Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
On Friday, October 20, 2017 10:46:07 PM CEST Bjorn Helgaas wrote:
> On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote:
> > Hi All,
> >
> > Well, this took more time than expected, as I tried to cover everything I had
> > in mind regarding PM flags for drivers.
>
> For the parts that touch PCI,
>
> Acked-by: Bjorn Helgaas

Thank you!

> I doubt there'll be conflicts with changes in my tree, but let me know if
> you trip over any so I can watch for them when merging.

Well, if there are any conflicts, we'll see them in linux-next I guess. :-)

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote:
> Hi All,
>
> Well, this took more time than expected, as I tried to cover everything I had
> in mind regarding PM flags for drivers.

For the parts that touch PCI,

Acked-by: Bjorn Helgaas

I doubt there'll be conflicts with changes in my tree, but let me know if
you trip over any so I can watch for them when merging.

Bjorn
[PATCH] bug-hunting.rst: Fix an example and a typo in a Sphinx tag
- Use the same file name in the explanation and in the example
  (conex.c vs sonixj.c)
- Add a missing ':' in a :ref: tag, which leads to incorrect Sphinx output
- Add some missing ',' and ';'

Signed-off-by: Christophe JAILLET
---
 Documentation/admin-guide/bug-hunting.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index 08c4b1308189..f278b289e260 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -240,7 +240,7 @@ In order to report it upstream, you should identify the mailing list used for the development of the affected code. This can be done by using the ``get_maintainer.pl`` script.
-For example, if you find a bug at the gspca's conex.c file, you can get
+For example, if you find a bug at the gspca's sonixj.c file, you can get
 their maintainers with::
 
 	$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
@@ -257,7 +257,7 @@ Please notice that it will point to: Tejun and Bhaktipriya (in this specific case, none really envolved on the development of this file);
 - The driver maintainer (Hans Verkuil);
-- The subsystem maintainer (Mauro Carvalho Chehab)
+- The subsystem maintainer (Mauro Carvalho Chehab);
 - The driver and/or subsystem mailing list (linux-me...@vger.kernel.org);
 - the Linux Kernel mailing list (linux-ker...@vger.kernel.org).
@@ -274,14 +274,14 @@ Fixing the bug -- If you know programming, you could help us by not only reporting the bug,
-but also providing us with a solution. After all open source is about
+but also providing us with a solution. After all, open source is about
 sharing what you do and don't you want to be recognised for your genius? If you decide to take this way, once you have worked out a fix please submit it upstream. Please do read
-ref:`Documentation/process/submitting-patches.rst ` though
+:ref:`Documentation/process/submitting-patches.rst ` though
 to help your code get accepted.
--
2.14.1
[PATCH v1 1/2] thunderbolt: Make pathname to force_power shorter
WMI is a bus inside the kernel, so we may access the GUID via /sys/bus/wmi
instead of going through the /sys/devices path.

Signed-off-by: Andy Shevchenko
---
 Documentation/admin-guide/thunderbolt.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/thunderbolt.rst b/Documentation/admin-guide/thunderbolt.rst
index de50a8561774..9b55952039a6 100644
--- a/Documentation/admin-guide/thunderbolt.rst
+++ b/Documentation/admin-guide/thunderbolt.rst
@@ -230,7 +230,7 @@ If supported by your machine this will be exposed by the WMI bus with
 a sysfs attribute called "force_power".
 
 For example the intel-wmi-thunderbolt driver exposes this attribute in:
- /sys/devices/platform/PNP0C14:00/wmi_bus/wmi_bus-PNP0C14:00/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
+ /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
 
 To force the power to on, write 1 to this attribute file.
 To disable force power, write 0 to this attribute file.
--
2.14.2
[PATCH v1 2/2] thunderbolt: Additional step for built-in module to power on
The device will not appear until we rescan the bus.

Signed-off-by: Andy Shevchenko
---
 Documentation/admin-guide/thunderbolt.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/admin-guide/thunderbolt.rst b/Documentation/admin-guide/thunderbolt.rst
index 9b55952039a6..86987c566d6a 100644
--- a/Documentation/admin-guide/thunderbolt.rst
+++ b/Documentation/admin-guide/thunderbolt.rst
@@ -235,4 +235,9 @@ For example the intel-wmi-thunderbolt driver exposes this attribute in:
 To force the power to on, write 1 to this attribute file.
 To disable force power, write 0 to this attribute file.
 
+In some cases (usually when thunderbolt.ko is built-in) the additional
+step should be performed::
+
+    # echo 1 > /sys/bus/pci/rescan
+
 Note: it's currently not possible to query the force power state of a platform.
--
2.14.2
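The two documented steps (write 1 to force_power, then, where needed, write 1 to /sys/bus/pci/rescan) can also be driven from a small C program. What follows is a minimal sketch, assuming root privileges and that the machine exposes the same WMI GUID as the intel-wmi-thunderbolt example. The helper names write_sysfs_attr() and tb_force_power_on() are hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a short string to a sysfs attribute file.
 * Returns 0 on success or a negative errno value on failure. */
static int write_sysfs_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int err = 0;
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;                 /* short write */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}

/* Example paths: the GUID is the one from the intel-wmi-thunderbolt example
 * and may differ (or be absent) on other machines. */
#define TB_FORCE_POWER_ATTR \
    "/sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power"
#define PCI_RESCAN_ATTR "/sys/bus/pci/rescan"

/* Force the Thunderbolt controller on, then rescan the PCI bus so that the
 * controller shows up (needed in some cases, e.g. built-in thunderbolt.ko). */
static int tb_force_power_on(const char *force_power, const char *rescan)
{
    int err = write_sysfs_attr(force_power, "1");
    if (err)
        return err;
    return write_sysfs_attr(rescan, "1");
}
```

A caller running as root would invoke tb_force_power_on(TB_FORCE_POWER_ATTR, PCI_RESCAN_ATTR). The paths are passed in as parameters rather than hard-coded so that the sequence can be exercised against ordinary files.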
Re: [PATCH 1/3] printk: Introduce per-console loglevel setting
On 10/20/2017 01:05 AM, Petr Mladek wrote:
> On Thu 2017-10-19 16:40:45, Calvin Owens wrote:
>> On 09/28/2017 05:43 PM, Calvin Owens wrote:
>>> Not all consoles are created equal: depending on the actual hardware,
>>> the latency of a printk() call can vary dramatically. The worst examples
>>> are serial consoles, where it can spin for tens of milliseconds banging
>>> the UART to emit a message, which can cause application-level problems
>>> when the kernel spews onto the console.
>>
>> Any thoughts on this series? Happy to resend again, but if there are no
>> objections I'd love to see it merged sooner rather than later :)
>>
>> Happy to resend too, just let me know.
>
> There is no need to resend the patch. It is on my radar and I am going to
> look at it. Please, be patient, you hit conference, illness, after
> vacation season. We do not want to unnecessarily delay it but it is not
> a trivial change that might be accepted within minutes.

No worries, just wanted to make sure it hadn't been missed :)

Thanks,
Calvin

> Best Regards,
> Petr
[PATCH v9 10/10] sparc64: Add support for ADI (Application Data Integrity)
ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. The upper bits of the address contain the version
tag. On M7 processors, the upper four bits (bits 63-60) contain the
version tag. If a rogue app attempts to access ADI-enabled data pages,
its access is blocked and the processor generates an exception. Please
see Documentation/sparc/adi.txt for further details.

This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. ADI is not enabled by
default for any task. A task must explicitly enable ADI on a memory range
and set version tags for ADI to be effective for the task.

Signed-off-by: Khalid Aziz
Cc: Khalid Aziz
---
v9:
- Added code to migrate ADI tags to copy_highpage() to ensure tags get
  copied on page migration
- Improved code to detect underflow and overflow when allocating tag
  storage

v8:
- Added note to doc about non-faulting loads not triggering ADI tag
  mismatch and more details on special tag values of 0x0 and 0xf, as
  suggested by Anthony Yznaga
- Added an IPI on mprotect(...PROT_ADI...) call to set TSTATE.MCDE on
  threads running on other processors and restore of TSTATE.MCDE on
  context switch (suggested by David Miller)
- Removed restriction on enabling ADI on read-only memory (suggested by
  Anthony Yznaga)
- Changed kzalloc() for tag storage to use GFP_NOWAIT
- Added code to handle overflow and underflow when allocating tag
  storage, as suggested by Anthony Yznaga
- Replaced sun_m7_patch_1insn_range() with sun4v_patch_1insn_range(),
  which is functionally identical (suggested by Anthony Yznaga)
- Added membar after restoring ADI tags in copy_user_highpage(), as
  suggested by David Miller

v7:
- Enhanced arch_validate_prot() to enable ADI only on writable addresses
  backed by physical RAM
- Added support for saving/restoring ADI tags for each ADI block size
  address range on a page on swap in/out
- Added code to copy ADI tags on COW
- Updated values for auxiliary vectors to not conflict with values on
  other architectures to avoid conflict in glibc. glibc consolidates all
  auxiliary vectors into its headers and duplicate values in the
  consolidated header are problematic
- Disable same page merging on ADI-enabled pages since ADI tags may not
  match on pages with identical data
- Broke the patch up further into smaller patches

v6:
- Eliminated instructions to read and write PSTATE as well as MCDPER and
  PMCDPER on every access to userspace addresses by setting PSTATE and
  PMCDPER correctly upon entry into kernel. PSTATE.mcde and PMCDPER are
  set upon entry into kernel when running on an M7 processor. PSTATE.mcde
  being set only affects memory accesses that have TTE.mcd set. PMCDPER
  being set only affects writes to memory addresses that have TTE.mcd
  set. This ensures any faults caused by ADI tag mismatch on a write are
  exposed before kernel returns to userspace.

v5:
- Fixed indentation issues and instructions in assembly code
- Removed CONFIG_SPARC64 from mdesc.c
- Changed to maintain state of MCDPER register in thread info flags as
  opposed to in mm context. MCDPER is a per-thread state and belongs in
  thread info flag as opposed to mm context, which is shared across
  threads. Added comments to clarify this is a lazily maintained state
  and must be updated on context switch and copy_process()
- Updated code to use the new arch_do_swap_page() and arch_unmap_one()
  functions

v4:
- Broke patch up into smaller patches

v3:
- Removed CONFIG_SPARC_ADI
- Replaced prctl commands with mprotect
- Added auxiliary vectors for ADI parameters
- Enabled ADI for swappable pages

v2:
- Fixed a build error

 Documentation/sparc/adi.txt             | 278 +
 arch/sparc/include/asm/mman.h           |  84 -
 arch/sparc/include/asm/mmu_64.h         |  17 ++
 arch/sparc/include/asm/mmu_context_64.h |  50 ++
 arch/sparc/include/asm/page_64.h        |   6 +
 arch/sparc/include/asm/pgtable_64.h     |  46 +
 arch/sparc/include/asm/thread_info_64.h |   2 +-
 arch/sparc/include/asm/trap_block.h     |
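Because the version tag occupies only the top four bits of a 64-bit virtual address on M7, producing a versioned address and reading its tag back is plain bit manipulation. Below is a minimal userspace sketch, assuming the bits 63-60 layout described in the changelog. The helper names are hypothetical and are not part of the patch. Storing the tag on the memory itself (the stxa instruction with an MCD ASI) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* ADI version tag position on SPARC M7: VA bits 63-60. */
#define ADI_TAG_SHIFT 60
#define ADI_TAG_MASK  (UINT64_C(0xF) << ADI_TAG_SHIFT)

/* Hypothetical helper: return addr with its version tag replaced by tag. */
static inline uint64_t adi_set_version(uint64_t addr, unsigned int tag)
{
    assert(tag <= 0xF);                       /* tags are 4 bits wide */
    return (addr & ~ADI_TAG_MASK) | ((uint64_t)tag << ADI_TAG_SHIFT);
}

/* Hypothetical helper: read the version tag carried in a tagged address. */
static inline unsigned int adi_get_version(uint64_t addr)
{
    return (unsigned int)((addr & ADI_TAG_MASK) >> ADI_TAG_SHIFT);
}
```

For example, tagging 0x12345000 with version 0xA yields 0xA000000012345000. Retagging that address replaces the old nibble rather than OR-ing into it.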
[PATCH v9 00/10] Application Data Integrity feature introduced by SPARC M7
SPARC M7 processor adds additional metadata for memory address space
that can be used to secure access to regions of memory. This additional
metadata is implemented as a 4-bit tag attached to each cacheline-size
block of memory. A task can set a tag on any number of such blocks.
Access to such a block is granted only if the virtual address used to
access that block of memory has the tag encoded in the uppermost 4 bits
of the VA. Since SPARC processors do not implement all 64 bits of the VA,
the top 4 bits are available for ADI tags. Any mismatch between the tag
encoded in the VA and the tag set on the memory block results in a trap.
Tags are verified in the VA presented to the MMU, and tags are associated
with the physical page the VA maps on to. If a memory page is swapped out
and the page frame gets reused for another task, the tags are lost and
hence must be saved when swapping or migrating the page.

A userspace task enables ADI through mprotect(). This patch series adds
a page protection bit PROT_ADI and a corresponding VMA flag VM_SPARC_ADI.
VM_SPARC_ADI is used to trigger setting the TTE.mcd bit in the sparc pte
that enables ADI checking on the corresponding page. The MMU validates
the tag embedded in the VA for every page that has the TTE.mcd bit set in
its pte. After enabling ADI on a memory range, the userspace task can set
ADI version tags using the stxa instruction with ASI_MCD_PRIMARY or
ASI_MCD_ST_BLKINIT_PRIMARY ASI.

Once a userspace task calls mprotect() with PROT_ADI, the kernel takes
the following overall steps:

1. Find the VMAs covering the address range passed in to mprotect and
   set the VM_SPARC_ADI flag. If the address range covers a subset of a
   VMA, the VMA will be split.

2. When a page is allocated for a VA and the VMA covering this VA has the
   VM_SPARC_ADI flag set, set the TTE.mcd bit so the MMU will check the
   version tag.

3. Userspace can now set version tags on the memory it has enabled ADI
   on. Userspace accesses ADI-enabled memory using a virtual address that
   has the version tag embedded in the high bits. The MMU validates this
   version tag against the actual tag set on the memory. If the tag
   matches, the MMU performs the VA->PA translation and access is granted.
   If there is a mismatch, the hypervisor sends a data access exception or
   a precise memory corruption detected exception, depending upon whether
   precise exceptions are enabled or not (controlled by the MCDPER
   register). The kernel sends SIGSEGV to the task with the appropriate
   si_code.

4. If a page is being swapped out or migrated, the kernel must save any
   ADI tags set on the page. The kernel maintains a page worth of tag
   storage descriptors. Each descriptor points to a tag storage space and
   the address range it covers. If the page being swapped out or migrated
   has ADI enabled on it, the kernel finds a tag storage descriptor that
   covers the address range for the page, or allocates a new descriptor
   if none of the existing descriptors cover the address range. The
   kernel saves tags from the page into the tag storage space the
   descriptor points to.

5. When the page is swapped back in or reinstantiated after migration,
   the kernel restores the version tags on the new physical page by
   retrieving the original tags from the tag storage pointed to by a tag
   storage descriptor for the virtual address range of the new page.

A user task can disable ADI by calling mprotect() again on the memory
range with the PROT_ADI bit unset. The kernel clears the VM_SPARC_ADI
flag in VMAs, merges adjacent VMAs if necessary, and clears the TTE.mcd
bit in the corresponding ptes.

The IOMMU does not support ADI checking. Any version tags embedded in the
top bits of a VA meant for the IOMMU are cleared and replaced with sign
extension of the first non-version-tag bit (bit 59 for SPARC M7) for
IOMMU addresses.

This patch series adds support for this feature in 10 patches:

Patch 1/10
  Tag mismatch on access by a task results in a trap from the hypervisor
  as a data access exception or a precise memory corruption detected
  exception. As part of handling these exceptions, the kernel sends a
  SIGSEGV to the user process with a special si_code to indicate which
  fault occurred. This patch adds three new si_codes to differentiate
  between various mismatch errors.

Patch 2/10
  When a page is swapped or migrated, metadata associated with the page
  must be saved so it can be restored later. This patch adds a new
  function that saves/restores this metadata when updating the pte upon
  a swap/migration.

Patch 3/10
  SPARC M7 processor adds new fields to control registers to support the
  ADI feature. It also adds a new exception for precise traps on tag
  mismatch. This patch adds definitions for the new control register
  fields, new ASIs for ADI and an exception handler for the precise trap
  on tag mismatch.

Patch 4/10
  New hypervisor fault types were added by the SPARC M7 processor to
  support the ADI feature. This patch adds code to handle these fault
  types for the data access exception handler.

Patch 5/10
  When ADI is in use for a page and a tag mismatch occurs, the processor
  raises a "Memory corruption Detected" trap. This patch adds
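The cover letter's IOMMU rule (clear the version tag, then sign-extend from the first non-tag bit) comes down to a simple bit operation. Below is a minimal sketch, assuming the M7 layout where the tag starts at bit 60, so bit 59 is copied into bits 63-60. The function name and its tag_shift parameter are hypothetical and illustrative only. This is not the kernel's IOMMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: strip the ADI version tag (bits 63 down to tag_shift)
 * and fill those bits with copies of bit (tag_shift - 1), i.e. sign-extend
 * from the first non-tag bit. On SPARC M7, tag_shift is 60, so bit 59 is
 * replicated into bits 63-60. */
static inline uint64_t adi_untag_for_iommu(uint64_t addr, unsigned int tag_shift)
{
    assert(tag_shift >= 1 && tag_shift <= 63);  /* keep shifts well defined */

    uint64_t low_mask = (UINT64_C(1) << tag_shift) - 1;  /* untagged bits */
    uint64_t low = addr & low_mask;
    uint64_t sign = (low >> (tag_shift - 1)) & 1;        /* first non-tag bit */

    return sign ? (low | ~low_mask) : low;
}
```

With tag_shift = 60, the address 0xA000000012345000 becomes 0x0000000012345000 because bit 59 is clear. The address 0x5800000000001000 becomes 0xF800000000001000 because bit 59 is set.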
Re: [PATCH 7/8] Documentation: fix selftests related file refs
On Thu, Oct 12, 2017 at 03:24:10PM -0500, Tom Saeger wrote:
> Make refs to selftests files valid including:
> - watchdog-test.c
> - dnotify_test.c
>
> Signed-off-by: Tom Saeger
> ---
>  Documentation/filesystems/dnotify.txt    | 2 +-
>  Documentation/watchdog/hpwdt.txt         | 2 +-
>  Documentation/watchdog/pcwd-watchdog.txt | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dnotify.txt b/Documentation/filesystems/dnotify.txt
> index 6baf88f46859..15156883d321 100644
> --- a/Documentation/filesystems/dnotify.txt
> +++ b/Documentation/filesystems/dnotify.txt
> @@ -62,7 +62,7 @@ disabled, fcntl(fd, F_NOTIFY, ...) will return -EINVAL.
>
>  Example
>  ---
> -See Documentation/filesystems/dnotify_test.c for an example.
> +See tools/testing/selftests/filesystems/dnotify_test.c for an example.
>
>  NOTE
>
> diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
> index 7a9f635d0258..6d866c537127 100644
> --- a/Documentation/watchdog/hpwdt.txt
> +++ b/Documentation/watchdog/hpwdt.txt
> @@ -15,7 +15,7 @@ Last reviewed: 05/20/2016
>
>   Watchdog functionality is enabled like any other common watchdog driver. That
>   is, an application needs to be started that kicks off the watchdog timer. A
> - basic application exists in the Documentation/watchdog/src directory called
> + basic application exists in tools/testing/selftests/watchdog/ named
>   watchdog-test.c. Simply compile the C file and kick it off. If the system
>   gets into a bad state and hangs, the HPE ProLiant iLO timer register will
>   not be updated in a timely fashion and a hardware system reset (also known as

Taking over hpwdt for Jimmy Vance.

Signed-off-by: Jerry Hoemann

> diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt
> index 4f68052395c0..b8e60a441a43 100644
> --- a/Documentation/watchdog/pcwd-watchdog.txt
> +++ b/Documentation/watchdog/pcwd-watchdog.txt
> @@ -25,7 +25,7 @@ Last reviewed: 10/05/2007
>
>   If you want to write a program to be compatible with the PC Watchdog
>   driver, simply use of modify the watchdog test program:
> - Documentation/watchdog/src/watchdog-test.c
> + tools/testing/selftests/watchdog/watchdog-test.c
>
>
>   Other IOCTL functions include:
> --
> 2.14.2

--
Jerry Hoemann
Software Engineer
Hewlett Packard Enterprise
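hpwdt.txt describes "an application ... that kicks off the watchdog timer." With the generic Linux watchdog API, such an application opens the watchdog device and periodically issues WDIOC_KEEPALIVE. Below is a minimal sketch of that loop, assuming <linux/watchdog.h> is available. The function name and its parameters are illustrative only. The selftest at tools/testing/selftests/watchdog/watchdog-test.c is still the program the documentation points to.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Minimal keep-alive loop in the spirit of watchdog-test.c (not a copy).
 * Opening the device arms the timer; each WDIOC_KEEPALIVE restarts it.
 * Returns 0 on success or a negative errno value. */
static int watchdog_ping(const char *dev, unsigned int pings,
                         unsigned int interval_s)
{
    assert(dev != NULL);

    int fd = open(dev, O_WRONLY);
    if (fd < 0)
        return -errno;

    int err = 0;
    for (unsigned int i = 0; i < pings; i++) {
        int dummy = 0;
        if (ioctl(fd, WDIOC_KEEPALIVE, &dummy) < 0) {
            err = -errno;
            break;
        }
        if (i + 1 < pings)
            sleep(interval_s);
    }

    /* "Magic close": drivers that support WDIOF_MAGICCLOSE disarm the timer
     * on close() only if 'V' was written just before (and nowayout is off).
     * Otherwise the system is reset once the timeout expires. */
    if (write(fd, "V", 1) < 0 && err == 0)
        err = -errno;
    close(fd);
    return err;
}
```

For example, watchdog_ping("/dev/watchdog", 10, 1) pings ten times, one second apart, and then disarms the timer if the driver allows it. Run it only on a test machine, because the system will reset if the magic close is not honored.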
Re: rcu kernel-doc issues (4.14-rc1)
On 10/20/17 09:42, Paul E. McKenney wrote:
> On Wed, Oct 18, 2017 at 10:36:47AM -0600, Jonathan Corbet wrote:
>> On Wed, 18 Oct 2017 09:27:01 -0700
>> "Paul E. McKenney" wrote:
>>
>>> On a related topic... Is there anything that test-builds docbook prior
>>> to patches hitting mainline? My experience indicates that the answer is
>>> "no".
>>
>> The zero-day robot is said to be testing for new doc-build errors, but I
>> haven't actually seen much of that.
>
> Well, on the good side, Linus did take the fixes. I will leave it
> to you guys to sort things as needed with Fengguang. ;-)

Thanks.

--
~Randy
Re: rcu kernel-doc issues (4.14-rc1)
On Wed, Oct 18, 2017 at 10:36:47AM -0600, Jonathan Corbet wrote:
> On Wed, 18 Oct 2017 09:27:01 -0700
> "Paul E. McKenney" wrote:
>
> > On a related topic... Is there anything that test-builds docbook prior
> > to patches hitting mainline? My experience indicates that the answer is
> > "no".
>
> The zero-day robot is said to be testing for new doc-build errors, but I
> haven't actually seen much of that.

Well, on the good side, Linus did take the fixes. I will leave it
to you guys to sort things as needed with Fengguang. ;-)

							Thanx, Paul
Re: [PATCH v7 1/4] arm64: kvm: route synchronous external abort exceptions to EL2
Hi james,

Thanks for the mail and sorry for my late response.

2017-10-19 1:21 GMT+08:00, James Morse:
> Hi Dongjiu Geng,
>
> On 17/10/17 15:14, Dongjiu Geng wrote:
>> ARMv8.2 adds a new bit HCR_EL2.TEA which controls to
>> route synchronous external aborts to EL2, and adds a
>> trap control bit HCR_EL2.TERR which controls to
>> trap all Non-secure EL1&0 error record accesses to EL2.
>
> The bulk of this patch is about trap-and-emulating these ERR registers, but
> that's not reflected in the title:
>> KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA
>
>
>> This patch enables the two bits for the guest OS.
>> when an synchronous abort is generated in the guest OS,
>> it will trap to EL3 firmware, EL3 firmware will check the
>
> *buzz*
> This depends on SCR_EL3.EA, which this patch doesn't touch and the normal-world
> can't even know about. This is what your system does, the commit message should
> be about the change to Linux.
>
> (I've said this before)

Thanks for the point out, I make this series in a hurry(you are waiting this patch), forget to check again your comments before.

>
>
>> HCR_EL2.TEA value to decide to jump to hypervisor or host
>> OS. Enabling HCR_EL2.TERR makes error record access
>> from guest trap to EL2.
>>
>> Add some minimal emulation for RAS-Error-Record registers.
>> In the emulation, ERRIDR_EL1 and ERRSELR_EL1 are zero.
>> Then, the others ERX* registers are RAZ/WI.
>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index fe39e68..47983db 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>>  	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
>>  	if (is_kernel_in_hyp_mode())
>>  		vcpu->arch.hcr_el2 |= HCR_E2H;
>> +	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
>
> This ARM64_HAS_RAS_EXTN isn't in mainline, nor is it added by your series. I
> know where it comes from, but other reviewers may not. If you have dependencies
> on another series, please call them out in the cover letter.

yes, thanks for the point out.

>
> This is the first cpus_have_const_cap() user in this header file, it probably needs:
> #include

OK

>
>
>> +		/* route synchronous external abort exceptions to EL2 */
>> +		vcpu->arch.hcr_el2 |= HCR_TEA;
>> +		/* trap error record accesses */
>> +		vcpu->arch.hcr_el2 |= HCR_TERR;
>> +	}
>> +
>>  	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>>  		vcpu->arch.hcr_el2 &= ~HCR_RW;
>>  }
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index d686300..af55b3bc 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -105,6 +105,8 @@ enum vcpu_sysreg {
>>  	TTBR1_EL1,	/* Translation Table Base Register 1 */
>>  	TCR_EL1,	/* Translation Control Register */
>>  	ESR_EL1,	/* Exception Syndrome Register */
>
>> +	ERRIDR_EL1,	/* Error Record ID Register */
>
> Page 39 of [0]: 'ERRIDR_EL1 is a 64-bit read-only ...'.

yes, it is read-only.

>
>
>> +	ERRSELR_EL1,	/* Error Record Select Register */
>
> We're emulating these as RAZ/WI, do we really need to allocate
> vcpu->arch.ctxt.sys_regs space for them? If we always return 0 for ERRIDR, then
> we don't need to keep ERRSELR as 'the value read back [..] is UNKNOWN'.

https://lists.cs.columbia.edu/pipermail/kvmarm/2017-September/027176.html
" 'If ERRSELR_EL1.SEL is [>=] ERRIDR_EL1.NUM' that makes the ERX* registers RAZ/WI"

This is because I want to make above simulation as you suggested, if want to make above simulation, it needs set "vcpu->arch.ctxt.sys_regs" to 0, instead of reading from system register. so need a space to store it

>
> I think we only need space for these once their value needs to be migrated,
> user-space doesn't need to know they exist until then.
>
>
>>  	AFSR0_EL1,	/* Auxiliary Fault Status Register 0 */
>>  	AFSR1_EL1,	/* Auxiliary Fault Status Register 1 */
>>  	FAR_EL1,	/* Fault Address Register */
>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 2e070d3..a74617b 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -775,6 +775,36 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>  	return true;
>>  }
>>
>> +static bool access_error_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>> +			     const struct sys_reg_desc *r)
>> +{
>> +	/* accessing ERRIDR_EL1 */
>> +	if (r->CRm == 3 && r->Op2 == 0) {
>> +		if (p->is_write)
>> +			vcpu_sys_reg(vcpu, ERRIDR_EL1) = 0;
>
> As above, this register is read-only.

yes, it is my mistake.

>
>
>> +		return trap_raz_wi(vcpu, p, r);
>
> If we can
Re: [PATCH v2 1/2] mm, thp: introduce dedicated transparent huge page allocation interfaces
On Fri, 20 Oct 2017, changbin...@intel.com wrote:

> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 269b5df..2a960fc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -501,6 +501,43 @@ void prep_transhuge_page(struct page *page)
>  	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
>  }
>
> +struct page *alloc_transhuge_page_vma(gfp_t gfp_mask,
> +		struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct page *page;
> +
> +	page = alloc_pages_vma(gfp_mask | __GFP_COMP, HPAGE_PMD_ORDER,
> +			       vma, addr, numa_node_id(), true);
> +	if (unlikely(!page))
> +		return NULL;
> +	prep_transhuge_page(page);
> +	return page;
> +}
> +
> +struct page *alloc_transhuge_page_nodemask(gfp_t gfp_mask,
> +		int preferred_nid, nodemask_t *nmask)
> +{
> +	struct page *page;
> +
> +	page = __alloc_pages_nodemask(gfp_mask | __GFP_COMP, HPAGE_PMD_ORDER,
> +				      preferred_nid, nmask);
> +	if (unlikely(!page))
> +		return NULL;
> +	prep_transhuge_page(page);
> +	return page;
> +}
> +
> +struct page *alloc_transhuge_page(gfp_t gfp_mask)
> +{
> +	struct page *page;
> +
> +	page = alloc_pages(gfp_mask | __GFP_COMP, HPAGE_PMD_ORDER);
> +	if (unlikely(!page))
> +		return NULL;
> +	prep_transhuge_page(page);
> +	return page;
> +}
> +

These look pretty similar to the code used for huge pages (aside from the
call to prep_transhuge_page()). Maybe we can have common allocation
primitives for huge pages?
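One way to read the reviewer's suggestion is to move the tail that all three variants share ("return NULL on failure, otherwise prep_transhuge_page()") into one helper. Each variant would then wrap only its own allocator call. The sketch below is a userspace mock. struct page, the allocator stub and the name transhuge_prep_or_null() are stand-ins, not kernel APIs. The reviewer may have intended a different structure, for example sharing code with the existing huge page allocation paths.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types, for illustration only. */
enum { NO_DTOR, TRANSHUGE_DTOR };
struct page { int dtor; };

/* Stand-in for prep_transhuge_page(): record the THP destructor. */
static void prep_transhuge_page(struct page *page)
{
    assert(page != NULL);
    page->dtor = TRANSHUGE_DTOR;
}

/* Hypothetical common tail shared by all alloc_transhuge_page*() variants:
 * a failed allocation passes through as NULL, otherwise the page is
 * prepared as a transparent huge page. */
static struct page *transhuge_prep_or_null(struct page *page)
{
    if (!page)
        return NULL;
    prep_transhuge_page(page);
    return page;
}

/* Stand-in allocator; `fail` simulates an allocation failure. */
static struct page *mock_alloc_pages(int fail)
{
    return fail ? NULL : calloc(1, sizeof(struct page));
}

/* Each variant then reduces to one line around its own allocator. */
static struct page *alloc_transhuge_page_mock(int fail)
{
    return transhuge_prep_or_null(mock_alloc_pages(fail));
}
```

In the kernel, the vma, nodemask and plain variants would keep their different allocator calls. Only the post-allocation step would be shared.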
Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
On Friday, October 20, 2017 1:35:27 PM CEST Greg Kroah-Hartman wrote:
> On Fri, Oct 20, 2017 at 01:11:22PM +0200, Rafael J. Wysocki wrote:
> > On Thursday, October 19, 2017 9:33:15 AM CEST Greg Kroah-Hartman wrote:
> > > On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki
> > > >
> > > > The motivation for this change is to provide a way to work around
> > > > a problem with the direct-complete mechanism used for avoiding
> > > > system suspend/resume handling for devices in runtime suspend.
> > > >
> > > > The problem is that some middle layer code (the PCI bus type and
> > > > the ACPI PM domain in particular) returns positive values from its
> > > > system suspend ->prepare callbacks regardless of whether the driver's
> > > > ->prepare returns a positive value or 0, which effectively prevents
> > > > drivers from being able to control the direct-complete feature.
> > > > Some drivers need that control, however, and the PCI bus type has
> > > > grown its own flag to deal with this issue, but since it is not
> > > > limited to PCI, it is better to address it by adding driver flags at
> > > > the core level.
> > > >
> > > > To that end, add a driver_flags field to struct dev_pm_info for flags
> > > > that can be set by device drivers at the probe time to inform the PM
> > > > core and/or bus types, PM domains and so on on the capabilities and/or
> > > > preferences of device drivers. Also add two static inline helpers
> > > > for setting that field and testing it against a given set of flags
> > > > and make the driver core clear it automatically on driver remove
> > > > and probe failures.
> > > >
> > > > Define and document two PM driver flags related to the direct-
> > > > complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> > > > respectively, to indicate to the PM core that the direct-complete
> > > > mechanism should never be used for the device and to inform the
> > > > middle layer code (bus types, PM domains etc) that it can only
> > > > request the PM core to use the direct-complete mechanism for
> > > > the device (by returning a positive value from its ->prepare
> > > > callback) if it also has been requested by the driver.
> > > >
> > > > While at it, make the core check pm_runtime_suspended() when
> > > > setting power.direct_complete so that it doesn't need to be
> > > > checked by ->prepare callbacks.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki
> > >
> > > Acked-by: Greg Kroah-Hartman
> >
> > Thanks!
> >
> > Does it also apply to the other patches in the series?
> >
> > I'd like to queue up the core patches for 4.15 as they are specifically
> > designed to only affect the drivers that actually set the flags, so there
> > shouldn't be any regression resulting from them, and I'd like to be
> > able to start using the flags in drivers going forward.
>
> Yes, sorry, I thought I acked them, but you are right, I didn't:
>
> Acked-by: Greg Kroah-Hartman
>
> for all of them please.

Thanks!
Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
On Fri, Oct 20, 2017 at 01:11:22PM +0200, Rafael J. Wysocki wrote:
> On Thursday, October 19, 2017 9:33:15 AM CEST Greg Kroah-Hartman wrote:
> > On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki
> > >
> > > The motivation for this change is to provide a way to work around
> > > a problem with the direct-complete mechanism used for avoiding
> > > system suspend/resume handling for devices in runtime suspend.
> > >
> > > The problem is that some middle layer code (the PCI bus type and
> > > the ACPI PM domain in particular) returns positive values from its
> > > system suspend ->prepare callbacks regardless of whether the driver's
> > > ->prepare returns a positive value or 0, which effectively prevents
> > > drivers from being able to control the direct-complete feature.
> > > Some drivers need that control, however, and the PCI bus type has
> > > grown its own flag to deal with this issue, but since it is not
> > > limited to PCI, it is better to address it by adding driver flags at
> > > the core level.
> > >
> > > To that end, add a driver_flags field to struct dev_pm_info for flags
> > > that can be set by device drivers at the probe time to inform the PM
> > > core and/or bus types, PM domains and so on on the capabilities and/or
> > > preferences of device drivers. Also add two static inline helpers
> > > for setting that field and testing it against a given set of flags
> > > and make the driver core clear it automatically on driver remove
> > > and probe failures.
> > > > > > Define and document two PM driver flags related to the direct- > > > complete feature: NEVER_SKIP and SMART_PREPARE that can be used, > > > respectively, to indicate to the PM core that the direct-complete > > > mechanism should never be used for the device and to inform the > > > middle layer code (bus types, PM domains etc) that it can only > > > request the PM core to use the direct-complete mechanism for > > > the device (by returning a positive value from its ->prepare > > > callback) if it also has been requested by the driver. > > > > > > While at it, make the core check pm_runtime_suspended() when > > > setting power.direct_complete so that it doesn't need to be > > > checked by ->prepare callbacks. > > > > > > Signed-off-by: Rafael J. Wysocki > > > > Acked-by: Greg Kroah-Hartman > > Thanks! > > Does it also apply to the other patches in the series? > > I'd like to queue up the core patches for 4.15 as they are specifically > designed to only affect the drivers that actually set the flags, so there > shouldn't be any regression resulting from them, and I'd like to be > able to start using the flags in drivers going forward. Yes, sorry, I thought I acked them, but you are right, I didn't: Acked-by: Greg Kroah-Hartman for all of them please. thanks, greg k-h
Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
On Thursday, October 19, 2017 9:33:15 AM CEST Greg Kroah-Hartman wrote: > On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki> > > > The motivation for this change is to provide a way to work around > > a problem with the direct-complete mechanism used for avoiding > > system suspend/resume handling for devices in runtime suspend. > > > > The problem is that some middle layer code (the PCI bus type and > > the ACPI PM domain in particular) returns positive values from its > > system suspend ->prepare callbacks regardless of whether the driver's > > ->prepare returns a positive value or 0, which effectively prevents > > drivers from being able to control the direct-complete feature. > > Some drivers need that control, however, and the PCI bus type has > > grown its own flag to deal with this issue, but since it is not > > limited to PCI, it is better to address it by adding driver flags at > > the core level. > > > > To that end, add a driver_flags field to struct dev_pm_info for flags > > that can be set by device drivers at the probe time to inform the PM > > core and/or bus types, PM domains and so on on the capabilities and/or > > preferences of device drivers. Also add two static inline helpers > > for setting that field and testing it against a given set of flags > > and make the driver core clear it automatically on driver remove > > and probe failures. > > > > Define and document two PM driver flags related to the direct- > > complete feature: NEVER_SKIP and SMART_PREPARE that can be used, > > respectively, to indicate to the PM core that the direct-complete > > mechanism should never be used for the device and to inform the > > middle layer code (bus types, PM domains etc) that it can only > > request the PM core to use the direct-complete mechanism for > > the device (by returning a positive value from its ->prepare > > callback) if it also has been requested by the driver. 
> > > > While at it, make the core check pm_runtime_suspended() when > > setting power.direct_complete so that it doesn't need to be > > checked by ->prepare callbacks. > > > > Signed-off-by: Rafael J. Wysocki > > Acked-by: Greg Kroah-Hartman Thanks! Does it also apply to the other patches in the series? I'd like to queue up the core patches for 4.15 as they are specifically designed to only affect the drivers that actually set the flags, so there shouldn't be any regression resulting from them, and I'd like to be able to start using the flags in drivers going forward. Thanks, Rafael
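The direct-complete decision described in this changelog can be modeled in plain C. This is a minimal user-space sketch, not the kernel code: the `sketch_*` names and `SKETCH_FLAG_*` values are hypothetical stand-ins for the two driver flags, the runtime-suspend check is a plain boolean, and an error from ->prepare simply disables direct-complete here instead of aborting the suspend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NEVER_SKIP and SMART_PREPARE driver flags. */
#define SKETCH_FLAG_NEVER_SKIP    (1u << 0)
#define SKETCH_FLAG_SMART_PREPARE (1u << 1)

/* Minimal model of the per-device state the changelog talks about. */
struct sketch_dev {
	unsigned int driver_flags; /* set by the driver at probe time */
	bool runtime_suspended;    /* stands in for pm_runtime_suspended() */
	int driver_prepare_ret;    /* value returned by the driver's ->prepare */
};

/* Set the flags field (the core would clear it on remove/probe failure). */
void sketch_set_driver_flags(struct sketch_dev *dev, unsigned int flags)
{
	assert(dev != NULL);
	dev->driver_flags = flags;
}

/* True if any of the given flags is set. */
bool sketch_test_driver_flags(const struct sketch_dev *dev, unsigned int flags)
{
	assert(dev != NULL);
	return (dev->driver_flags & flags) != 0;
}

/*
 * Middle-layer ->prepare. Without SMART_PREPARE it returns a positive
 * value regardless of the driver (the problem the patch addresses);
 * with it, a positive value is returned only if the driver returned one.
 */
int sketch_middle_layer_prepare(const struct sketch_dev *dev)
{
	int ret = dev->driver_prepare_ret;

	if (ret < 0)
		return ret; /* propagate errors */
	if (sketch_test_driver_flags(dev, SKETCH_FLAG_SMART_PREPARE))
		return ret; /* respect the driver's choice */
	return 1;           /* legacy behaviour: always ask for direct-complete */
}

/* PM core: decide whether direct-complete may be used for this device. */
bool sketch_direct_complete(const struct sketch_dev *dev)
{
	if (sketch_middle_layer_prepare(dev) <= 0)
		return false;
	if (sketch_test_driver_flags(dev, SKETCH_FLAG_NEVER_SKIP))
		return false; /* driver vetoes direct-complete outright */
	/* The core, not each ->prepare callback, checks runtime-suspend state. */
	return dev->runtime_suspended;
}
```

With no flags set, the middle layer's unconditional positive return wins; SMART_PREPARE hands the choice back to the driver, and NEVER_SKIP overrides everything else.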
Re: [PATCH 0/2] Make squashfs fragments' cache size more configurable
Hi Phillip, Thank you for your fast reply. On Fri, Oct 20, 2017 at 2:18 PM, Phillip Lougher wrote: > On Thu, Oct 19, 2017 at 12:50 AM, Qixuan Wu wrote: >> Hi All, >> >> Currently, squashfs fragments' cache size is only determined by >> config option CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE. Users have >> no way to change the value when they get the binary kernel. > Thank-you for the patches, but they're both pointless and dangerous. > Let's be clear here you're trying to change an "expert only" kernel > configuration option into a user changeable option. This is stupid > because it is not meant for non-experts to change for good reason. > > The fragment cache size isn't some small tweak to the operation of > Squashfs, it fundamentally affects both the performance and memory > overhead of Squashfs. As such right from its introduction in 2003, it > has been an "expert only" configuration option at build time. Even > then it is made clear that the default has been carefully chosen, and > it should only be changed in exceptional circumstances. This > basically means don't change the default unless you really know what > you're doing, and this means tracing of Squashfs against your use-case > to determine caching behaviour. There is absolutely no other reason > why you'd want to change the default. This also means it should be > restricted to kernel configuration time only. > > Let's be clear again, very few people should ever want to change the > default, and for the "experts" that do want to do so, they can do so > when configuring the kernel. If you're not in a position to change it > at kernel configuration time then by definition you're not an expert, > and you shouldn't be able to change it anyway and certainly not as a > user. > > There is absolutely no use-case here to make this a user changeable > option. I can see no upsides in doing this, only downsides. 
> > Frankly if you need to change this value at module insert time then > there is something wrong with your system or build process. If you > want this because you want to build the kernel/modules once, and then > post-facto configure them for various products then it is your build > process that is broken. If you want this because you want to > dynamically change Squashfs memory usage/caching behaviour post kernel > configuration time it suggests you're trying to adapt Squashfs's > footprint based on available memory. This is an abuse of the option > as it's only meant to be used after detailed tracing/analysis and > certainly not used to accommodate unforeseen dynamic low memory > situations, and if that's the reason for needing this option, you > should be looking to solve it elsewhere. > > Ultimately this has been an "expert" kernel configuration only option > since its introduction in 2003, and I never been asked to change it, > and I believe this is because people recognise it as such. I suspect > you're trying to change this for fundamentally bogus reasons. > Moreover Squashfs is used in many different use-cases and > distributions, and I'm not going to make this a user-changeable option > allowing users to insert the Squashfs module in such a way that will > break its performance. > > So NACK. I need to apologize for not describing the scenario clearly enough. In our company, we may have a somewhat different kernel distribution model: we can only release a single kernel binary to multiple customers. One of our customers has a very strict kernel boot time requirement of 2~3s, including rootfs (squashfs) decompression. We found that changing the value from 3 to 8 saves some hundreds of milliseconds, and the total boot time then meets the requirement. But we were afraid of impacting other customers, so we used a kernel boot parameter. For the module interface, there is currently no use case.
Maybe I'm wrong, but in my opinion a kernel boot parameter is much the same as a config option, a module parameter, or a /proc or sysfs entry: it gives the administrator the chance to change some kernel variables for a particular scenario when they cannot change the config option. The administrator is usually root, not a normal user, so a value set through a kernel boot parameter comes with enough understanding, testing and analysis. For example, kernel boot parameters such as crashkernel=size[KMG] and default_hugepagesz= do the same kind of job. So before setting the value to 8, our customer's administrator understood the meaning and the memory overhead of changing the fragment cache size. Frankly, I admit our scenario is perhaps a bit special to embedded systems, and the feature is not as useful for others as it is for us. So it may be a bit of over-design, and I can understand your concerns about accepting the feature. Anyway, thanks for your reply and your opinion. Thanks Qixuan > Phillip Lougher (Squashfs maintainer)
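The memory side of this trade-off can be put in numbers. A minimal sketch, assuming each fragment-cache entry buffers one full filesystem block (128 KiB is the mksquashfs default block size; 1 MiB is the maximum) and ignoring bookkeeping overhead; `fragment_cache_bytes` is a hypothetical helper, not a Squashfs function. Under those assumptions, going from 3 entries to 8 raises the buffer memory per mounted filesystem from 384 KiB to 1 MiB at 128 KiB blocks, or from 3 MiB to 8 MiB at 1 MiB blocks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough fragment-cache buffer memory for one mounted filesystem,
 * assuming each cache entry holds one block of block_size bytes.
 * Per-entry bookkeeping structures are ignored.
 */
uint64_t fragment_cache_bytes(unsigned int entries, uint32_t block_size)
{
	assert(block_size != 0);
	return (uint64_t)entries * block_size;
}
```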
Re: [RFC PATCH 3/5] gpio: gpiolib: Add chardev support for maintaining GPIO values on reset
On Fri, 2017-10-20 at 09:27 +0200, Linus Walleij wrote: > I paged Bartosz and Michael on this, they are experts on the use cases for > the character device and their opinions are likely more valuable than mine. > > > On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jefferywrote: > > Similar to devicetree support, add flags and mappings to expose reset > > tolerance configuration through the chardev interface. > > > > Signed-off-by: Andrew Jeffery > > (...) > > > +* Unconditionally configure reset tolerance, as it's > > possible > > +* that the tolerance flag itself becomes tolerant to > > resets. > > +* Thus it could remain set from a previous environment, but > > +* the current environment may not expect it so. > > +*/ > > + ret = gpiod_set_reset_tolerant(desc, > > + !!(lflags & > > GPIOHANDLE_REQUEST_RESET_TOLERANT)); > > + if (ret < 0) > > + goto out_free_descs; > > First, as noted in the first patch, IMO we should just go for persistance, > i.e. you want to flag to the system to keep the line persistent in any case, > no matter if the system goes to sleep or resets. > > So the usecase is going to be a control system or similar, a makerspace > project, an industrial product of some kind, driving GPIO from userspace. > > I don't see it as helpful to give userspace control over whether the line > is persistent or not. It is more reasonable to assume persistance for > userspace use cases, don't you think? Whether the system goes to sleep > or the gpiochip resets should not make a door suddenly close or the > lights in the christmas tree go out, right? I think if the gpiochip supports > persistance of any kind, we should try to use it and not have userspace > provide flags for that. Right. I guess the counter argument to your examples is if the gpio is controlling any active process that we don't want to continue if we've lost the capacity to monitor some other inputs (some kind of dead-man's switch). But maybe the argument is that should be implemented in the kernel anyway? 
Andrew
[PATCH v2 1/2] mm, thp: introduce dedicated transparent huge page allocation interfaces
From: Changbin DuThis patch introduced 4 new interfaces to allocate a prepared transparent huge page. The aim is to remove duplicated code and simplify transparent huge page allocation. These are similar to alloc_hugepage_xxx which are for hugetlbfs pages. - alloc_transhuge_page_vma - alloc_transhuge_page_nodemask - alloc_transhuge_page_node - alloc_transhuge_page These interfaces implicitly add __GFP_COMP gfp mask which is the minimum flags used for huge page allocation. More flags leave to the callers. This patch does below changes: - define alloc_transhuge_page_xxx interfaces - apply them to all existing code - declare prep_transhuge_page as static since no others use it - remove alloc_hugepage_vma definition since it no longer has users Signed-off-by: Changbin Du --- v2: Anshuman Khandu: - Remove redundant 'VM_BUG_ON(!(gfp_mask & __GFP_COMP))'. Andrew Morton: - Fix build error if thp is disabled. --- include/linux/gfp.h | 4 include/linux/huge_mm.h | 18 -- include/linux/migrate.h | 14 +- mm/huge_memory.c| 48 +--- mm/khugepaged.c | 11 ++- mm/mempolicy.c | 14 +++--- mm/migrate.c| 14 -- mm/shmem.c | 6 ++ 8 files changed, 73 insertions(+), 56 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f780718..855c72e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -507,15 +507,11 @@ alloc_pages(gfp_t gfp_mask, unsigned int order) extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, struct vm_area_struct *vma, unsigned long addr, int node, bool hugepage); -#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ - alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) #else #define alloc_pages(gfp_mask, order) \ alloc_pages_node(numa_node_id(), gfp_mask, order) #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ alloc_pages(gfp_mask, order) -#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ - alloc_pages(gfp_mask, order) #endif #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) #define 
alloc_page_vma(gfp_mask, vma, addr)\ diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 14bc21c..184eb38 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -130,9 +130,20 @@ extern unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); -extern void prep_transhuge_page(struct page *page); extern void free_transhuge_page(struct page *page); +extern struct page *alloc_transhuge_page_vma(gfp_t gfp_mask, + struct vm_area_struct *vma, unsigned long addr); +extern struct page *alloc_transhuge_page_nodemask(gfp_t gfp_mask, + int preferred_nid, nodemask_t *nmask); + +static inline struct page *alloc_transhuge_page_node(int nid, gfp_t gfp_mask) +{ + return alloc_transhuge_page_nodemask(gfp_mask, nid, NULL); +} + +extern struct page *alloc_transhuge_page(gfp_t gfp_mask); + bool can_split_huge_page(struct page *page, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); static inline int split_huge_page(struct page *page) @@ -260,7 +271,10 @@ static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma) return false; } -static inline void prep_transhuge_page(struct page *page) {} +#define alloc_transhuge_page_vma(gfp_mask, vma, addr) NULL +#define alloc_transhuge_page_nodemask(gfp_mask, preferred_nid, nmask) NULL +#define alloc_transhuge_page_node(nid, gfp_maskg) NULL +#define alloc_transhuge_page(gfp_mask) NULL #define transparent_hugepage_flags 0UL diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 643c7ae..70a00f3 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -42,19 +42,15 @@ static inline struct page *new_page_nodemask(struct page *page, return alloc_huge_page_nodemask(page_hstate(compound_head(page)), preferred_nid, nodemask); - if (thp_migration_supported() && PageTransHuge(page)) { - order = HPAGE_PMD_ORDER; - gfp_mask |= GFP_TRANSHUGE; - } - if 
(PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE)) gfp_mask |= __GFP_HIGHMEM; - new_page = __alloc_pages_nodemask(gfp_mask, order, + if (thp_migration_supported() && PageTransHuge(page)) + return alloc_transhuge_page_nodemask(gfp_mask | GFP_TRANSHUGE, + preferred_nid, nodemask); + else + return __alloc_pages_nodemask(gfp_mask, order,
[PATCH v2 0/2] mm, thp: introduce dedicated transparent huge page allocation interfaces
From: Changbin Du The first patch introduces new interfaces; the second one removes naming confusion. The aim is to simplify transparent huge page allocation and remove duplicated code. V2: - Coding improvement. - Fix build error if thp is disabled. Changbin Du (2): mm, thp: introduce dedicated transparent huge page allocation interfaces mm: rename page dtor functions to {compound,huge,transhuge}_page__dtor Documentation/vm/hugetlbfs_reserv.txt | 4 +-- include/linux/gfp.h | 4 --- include/linux/huge_mm.h | 20 -- include/linux/hugetlb.h | 2 +- include/linux/migrate.h | 14 -- include/linux/mm.h| 8 +++--- mm/huge_memory.c | 52 +-- mm/hugetlb.c | 14 +- mm/khugepaged.c | 11 ++-- mm/mempolicy.c| 14 ++ mm/migrate.c | 14 +++--- mm/page_alloc.c | 10 +++ mm/shmem.c| 6 ++-- mm/swap.c | 2 +- mm/userfaultfd.c | 2 +- 15 files changed, 97 insertions(+), 80 deletions(-) -- 2.7.4
[PATCH v2 2/2] mm: rename page dtor functions to {compound,huge,transhuge}_page__dtor
From: Changbin DuThe current name free_{huge,transhuge}_page are paired with alloc_{huge,transhuge}_page functions, but the actual page free function is still free_page() which will indirectly call free_{huge,transhuge}_page. So this patch removes this confusion by renaming all the compound page dtors to {compound,huge,transhuge}_page__dtor. And since we already have a typedef compound_page_dtor, rename it to compound_page_dtor_t to avoid name conflict. Signed-off-by: Changbin Du --- v2: Improve commit message. --- Documentation/vm/hugetlbfs_reserv.txt | 4 ++-- include/linux/huge_mm.h | 2 +- include/linux/hugetlb.h | 2 +- include/linux/mm.h| 8 mm/huge_memory.c | 4 ++-- mm/hugetlb.c | 14 +++--- mm/page_alloc.c | 10 +- mm/swap.c | 2 +- mm/userfaultfd.c | 2 +- 9 files changed, 24 insertions(+), 24 deletions(-) diff --git a/Documentation/vm/hugetlbfs_reserv.txt b/Documentation/vm/hugetlbfs_reserv.txt index 9aca09a..b3ffa3e 100644 --- a/Documentation/vm/hugetlbfs_reserv.txt +++ b/Documentation/vm/hugetlbfs_reserv.txt @@ -238,7 +238,7 @@ to the global reservation count (resv_huge_pages). Freeing Huge Pages -- -Huge page freeing is performed by the routine free_huge_page(). This routine +Huge page freeing is performed by the routine huge_page_dtor(). This routine is the destructor for hugetlbfs compound pages. As a result, it is only passed a pointer to the page struct. When a huge page is freed, reservation accounting may need to be performed. This would be the case if the page was @@ -468,7 +468,7 @@ However, there are several instances where errors are encountered after a huge page is allocated but before it is instantiated. In this case, the page allocation has consumed the reservation and made the appropriate subpool, reservation map and global count adjustments. 
If the page is freed at this -time (before instantiation and clearing of PagePrivate), then free_huge_page +time (before instantiation and clearing of PagePrivate), then huge_page_dtor will increment the global reservation count. However, the reservation map indicates the reservation was consumed. This resulting inconsistent state will cause the 'leak' of a reserved huge page. The global reserve count will diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 184eb38..bd05bc7 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -130,7 +130,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); -extern void free_transhuge_page(struct page *page); +extern void transhuge_page_dtor(struct page *page); extern struct page *alloc_transhuge_page_vma(gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 8bbbd37..24492c5 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -118,7 +118,7 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end, long freed); bool isolate_huge_page(struct page *page, struct list_head *list); void putback_active_hugepage(struct page *page); -void free_huge_page(struct page *page); +void huge_page_dtor(struct page *page); void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, diff --git a/include/linux/mm.h b/include/linux/mm.h index 065d99d..adfa906 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -616,7 +616,7 @@ void split_page(struct page *page, unsigned int order); * prototype for that function and accessor functions. * These are _only_ valid on the head of a compound page. 
*/ -typedef void compound_page_dtor(struct page *); +typedef void compound_page_dtor_t(struct page *); /* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */ enum compound_dtor_id { @@ -630,7 +630,7 @@ enum compound_dtor_id { #endif NR_COMPOUND_DTORS, }; -extern compound_page_dtor * const compound_page_dtors[]; +extern compound_page_dtor_t * const compound_page_dtors[]; static inline void set_compound_page_dtor(struct page *page, enum compound_dtor_id compound_dtor) @@ -639,7 +639,7 @@ static inline void set_compound_page_dtor(struct page *page, page[1].compound_dtor = compound_dtor; } -static inline compound_page_dtor *get_compound_page_dtor(struct page *page) +static inline compound_page_dtor_t *get_compound_page_dtor(struct page *page) { VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
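The rename keeps the existing pattern intact: a destructor table indexed by an id stored in the compound page, with the generic free path dispatching through it, so callers never invoke a dtor directly. A minimal user-space model of that pattern follows; all `sketch_*` names are hypothetical, and only the `_t` typedef convention mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a compound page head; only the dtor id matters here. */
struct sketch_page {
	unsigned char compound_dtor; /* index into sketch_dtors[] */
	int freed_by;                /* records which dtor ran (demo only) */
};

/* Function type; the _t suffix keeps it distinct from function names. */
typedef void compound_page_dtor_t(struct sketch_page *);

/* Keep in sync with sketch_dtors[] below. */
enum sketch_dtor_id {
	SKETCH_NULL_DTOR,
	SKETCH_COMPOUND_DTOR,
	SKETCH_HUGE_DTOR,
	SKETCH_NR_DTORS,
};

static void sketch_compound_page_dtor(struct sketch_page *p)
{
	p->freed_by = SKETCH_COMPOUND_DTOR;
}

static void sketch_huge_page_dtor(struct sketch_page *p)
{
	p->freed_by = SKETCH_HUGE_DTOR;
}

compound_page_dtor_t *const sketch_dtors[SKETCH_NR_DTORS] = {
	[SKETCH_NULL_DTOR] = NULL,
	[SKETCH_COMPOUND_DTOR] = sketch_compound_page_dtor,
	[SKETCH_HUGE_DTOR] = sketch_huge_page_dtor,
};

void sketch_set_dtor(struct sketch_page *page, enum sketch_dtor_id id)
{
	assert(id < SKETCH_NR_DTORS);
	page->compound_dtor = (unsigned char)id;
}

/* Generic free path: look up the page's dtor and call it if present. */
void sketch_free_compound(struct sketch_page *page)
{
	compound_page_dtor_t *dtor;

	assert(page->compound_dtor < SKETCH_NR_DTORS);
	dtor = sketch_dtors[page->compound_dtor];
	if (dtor)
		dtor(page);
}
```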
Re: [PATCH 1/2] mm, thp: introduce dedicated transparent huge page allocation interfaces
Hi Hocko, On Thu, Oct 19, 2017 at 02:49:31PM +0200, Michal Hocko wrote: > On Wed 18-10-17 19:00:26, Du, Changbin wrote: > > Hi Hocko, > > > > On Tue, Oct 17, 2017 at 12:20:52PM +0200, Michal Hocko wrote: > > > [CC Kirill] > > > > > > On Mon 16-10-17 17:19:16, changbin...@intel.com wrote: > > > > From: Changbin Du> > > > > > > > This patch introduced 4 new interfaces to allocate a prepared > > > > transparent huge page. > > > > - alloc_transhuge_page_vma > > > > - alloc_transhuge_page_nodemask > > > > - alloc_transhuge_page_node > > > > - alloc_transhuge_page > > > > > > > > The aim is to remove duplicated code and simplify transparent > > > > huge page allocation. These are similar to alloc_hugepage_xxx > > > > which are for hugetlbfs pages. This patch does below changes: > > > > - define alloc_transhuge_page_xxx interfaces > > > > - apply them to all existing code > > > > - declare prep_transhuge_page as static since no others use it > > > > - remove alloc_hugepage_vma definition since it no longer has users > > > > > > So what exactly is the advantage of the new API? The diffstat doesn't > > > sound very convincing to me. > > > > > The caller only need one step to allocate thp. Several LOCs removed for all > > the > > caller side with this change. So it's little more convinent. > > Yeah, but the overall result is more code. So I am not really convinced. Yes, but some of the code is only there to keep the compiler happy (declarations). These are simple, lightweight wrappers, like many other functions in the kernel. At least code readability is improved: the two-step allocation is merged into one, so duplicated logic is removed. > -- > Michal Hocko > SUSE Labs -- Thanks, Changbin Du
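The argument for the new helpers is that callers make one call instead of repeating an allocate-then-prep_transhuge_page() pair with its own NULL check. A hypothetical user-space sketch of that wrapper shape, with calloc() standing in for the page allocator and a boolean standing in for the THP preparation step; none of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical huge page model; "prepared" stands in for THP setup. */
struct sketch_thp {
	bool prepared;
};

/* Step 1 (stand-in for the raw page allocation). */
static struct sketch_thp *sketch_raw_alloc(void)
{
	return calloc(1, sizeof(struct sketch_thp));
}

/* Step 2 (stand-in for prep_transhuge_page()). */
static void sketch_prep(struct sketch_thp *page)
{
	page->prepared = true;
}

/*
 * One-step helper: allocate and, only on success, prepare. Callers no
 * longer duplicate the pair or the NULL check between the two steps.
 */
struct sketch_thp *sketch_alloc_transhuge(void)
{
	struct sketch_thp *page = sketch_raw_alloc();

	if (page)
		sketch_prep(page);
	return page;
}
```

Whether this is worth the extra declarations is exactly the judgment call debated in the thread; the sketch only shows the shape of the wrapper.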
Re: [RFC PATCH 1/5] gpio: gpiolib: Add core support for maintaining GPIO values on reset
On Fri, 2017-10-20 at 09:43 +0200, Linus Walleij wrote: > On Fri, Oct 20, 2017 at 9:17 AM, Linus Walleij> wrote: > > > > On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote: > > > > > GPIO state reset tolerance is implemented in gpiolib through the > > > addition of a new pinconf parameter. With that, some renaming of helpers > > > is done to clarify the scope of the already existing > > > gpiochip_line_is_persistent(), as it's now ambiguous as to whether that > > > means on suspend, reset or both. > > > > Isn't it most reasonable to say persistance covers both cases, reset > > and/or sleep? This seems a bit like overdefined. > > I should also add: right now persistance is defined in negative terms, > you can supply the flag "may lose value", which means the subsystem > by default, and driver by default, will try to keep values persistent across > sleep. > > Then it is possible to opt in for not doing so. (Usually to save power I > think.) > > I think that especially for userspace use cases, saving power should > not really be the concern, but correct me if I'm wrong. I am thinking > of a box with a DC plug wired up to a factory line here. > > What we have in the Arizona driver is an opt-in where the DT can > say "don't preserve the value this line during system sleep" i.e. "lay lose > value" and we can extend that flag to mean "don't preserve this line > during reset either" but by default assume that we should. Yeah, the preserve polarity was another thing I debated given the current example with the Arizona driver. Not preserving is the default for the Aspeed hardware, so that ended up influencing my choice. Not that implementation details should necessarily influence interface design, but it was at least more than a coin toss. I don't have anything specific against preserving by default, just my gut instinct and the hardware went the other way. As long as we expose the option to opt out, which the additions for the Arizona already do. 
Cheers, Andrew
Re: [RFC PATCH 1/5] gpio: gpiolib: Add core support for maintaining GPIO values on reset
On Fri, 2017-10-20 at 09:17 +0200, Linus Walleij wrote: > > On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote: > > > GPIO state reset tolerance is implemented in gpiolib through the > > addition of a new pinconf parameter. With that, some renaming of helpers > > is done to clarify the scope of the already existing > > gpiochip_line_is_persistent(), as it's now ambiguous as to whether that > > means on suspend, reset or both. > > Isn't it most reasonable to say persistance covers both cases, reset > and/or sleep? This seems a bit like overdefined. I definitely had some internal debate about that. I erred on the side of avoiding any change in expectations for the Arizona. If you consider that overdefined then I'm happy to go the other way. > > So can we say that is this flag is set, the hardware and driver should > do its best to preserve the value across any system disruptions. > > We can change the wording of course, patches welcome for that. Yep. > > But do we really need to distinguish the cases of disruption and > whether we cover up for them or not? > > I would say we can deal with that the day we have a system with > two register bits (or similar) where you can select to preserve across > sleep, reset, one or the other, AND there is also a usecase such that > a user wants to preserve the value across reset but not suspend or > vice versa. > > I suspect that will not happen. A very reasonable approach. Cheers for the feedback. Andrew
Re: [PATCH 1/3] printk: Introduce per-console loglevel setting
On Thu 2017-10-19 16:40:45, Calvin Owens wrote: > On 09/28/2017 05:43 PM, Calvin Owens wrote: > >Not all consoles are created equal: depending on the actual hardware, > >the latency of a printk() call can vary dramatically. The worst examples > >are serial consoles, where it can spin for tens of milliseconds banging > >the UART to emit a message, which can cause application-level problems > >when the kernel spews onto the console. > > Any thoughts on this series? Happy to resend again, but if there are no > objections I'd love to see it merged sooner rather than later :) > > Happy to resend too, just let me know. There is no need to resend the patch. It is on my radar and I am going to look at it. Please be patient; you hit the conference, illness, and after-vacation season. We do not want to delay it unnecessarily, but it is not a trivial change that can be accepted within minutes. Best Regards, Petr
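The idea behind the series, as the quoted cover letter describes it, is to let a slow serial console skip lower-priority messages while faster consoles keep them. A minimal sketch, assuming each console simply carries its own threshold compared the same way as the global console_loglevel (a message is shown when its level is numerically lower); the struct and function names are illustrative, not the series' actual API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical console descriptor with its own threshold. */
struct sketch_console {
	const char *name;
	int loglevel; /* messages with level < loglevel are emitted */
};

/*
 * Same comparison as the global console_loglevel: lower numbers are
 * more severe (0 = KERN_EMERG ... 7 = KERN_DEBUG).
 */
bool sketch_console_wants(const struct sketch_console *con, int msg_level)
{
	assert(msg_level >= 0 && msg_level <= 7);
	return msg_level < con->loglevel;
}
```

For example, a serial console with a threshold of 4 would receive only emergency through error messages, while a network console with a threshold of 8 would receive everything.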
Re: [RFC PATCH 1/5] gpio: gpiolib: Add core support for maintaining GPIO values on reset
On Fri, Oct 20, 2017 at 9:17 AM, Linus Walleij wrote: > On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote: > >> GPIO state reset tolerance is implemented in gpiolib through the >> addition of a new pinconf parameter. With that, some renaming of helpers >> is done to clarify the scope of the already existing >> gpiochip_line_is_persistent(), as it's now ambiguous as to whether that >> means on suspend, reset or both. > > Isn't it most reasonable to say persistance covers both cases, reset > and/or sleep? This seems a bit like overdefined. I should also add: right now persistance is defined in negative terms: you can supply the flag "may lose value", which means the subsystem by default, and driver by default, will try to keep values persistent across sleep. Then it is possible to opt in for not doing so. (Usually to save power I think.) I think that especially for userspace use cases, saving power should not really be the concern, but correct me if I'm wrong. I am thinking of a box with a DC plug wired up to a factory line here. What we have in the Arizona driver is an opt-in where the DT can say "don't preserve the value of this line during system sleep", i.e. "may lose value", and we can extend that flag to mean "don't preserve this line during reset either" but by default assume that we should. Yours, Linus Walleij
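The policy proposed here is that persistence is the default, with a single "may lose value" opt-out covering both sleep and reset. It can be sketched as below. The flag and structure names are hypothetical; the existing DT flag is OF_GPIO_SLEEP_MAY_LOSE_VALUE, and extending it to reset is only a proposal in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-out flag; persistence needs no flag of its own. */
#define SKETCH_GPIO_MAY_LOSE_VALUE (1u << 0)

/* Hypothetical chip capabilities. */
struct sketch_gpiochip {
	bool can_persist_sleep;
	bool can_persist_reset;
};

/* Keep the value across sleep if the chip can and the line didn't opt out. */
bool sketch_keep_across_sleep(const struct sketch_gpiochip *chip, unsigned int flags)
{
	assert(chip != NULL);
	return chip->can_persist_sleep && !(flags & SKETCH_GPIO_MAY_LOSE_VALUE);
}

/* The same opt-out flag also covers reset, as suggested above. */
bool sketch_keep_across_reset(const struct sketch_gpiochip *chip, unsigned int flags)
{
	assert(chip != NULL);
	return chip->can_persist_reset && !(flags & SKETCH_GPIO_MAY_LOSE_VALUE);
}
```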
Re: [RFC PATCH 4/5] gpio: gpiolib: Add sysfs support for maintaining GPIO values on reset
On Fri, 2017-10-20 at 09:29 +0200, Linus Walleij wrote: > On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery> wrote: > > > Expose a new 'maintain' sysfs attribute to control both suspend and > > reset tolerance. > > > > Signed-off-by: Andrew Jeffery > > NAK. You will find the actual ABI documentation in > Documentation/ABI/obsolete/sysfs-gpio Right, I did a quick grep to find an attribute description in order to judge what documentation to change. Unfortunately my grep didn't pick up this file. > that's why. This is being phased out and should not be extended. > Everyone should use the character device, especially for new > functionality. Yeah, I expected this (and the NAK) would be the response but figured I should ask the question. Thanks, Andrew
Re: [RFC PATCH 4/5] gpio: gpiolib: Add sysfs support for maintaining GPIO values on reset
On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote: > Expose a new 'maintain' sysfs attribute to control both suspend and > reset tolerance. > > Signed-off-by: Andrew Jeffery NAK. You will find the actual ABI documentation in Documentation/ABI/obsolete/sysfs-gpio that's why. This is being phased out and should not be extended. Everyone should use the character device, especially for new functionality. Yours, Linus Walleij
Re: [RFC PATCH 2/5] gpio: gpiolib: Add OF support for maintaining GPIO values on reset
On Fri, 2017-10-20 at 09:18 +0200, Linus Walleij wrote: > On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery> wrote: > > > @@ -32,6 +32,7 @@ enum of_gpio_flags { > > OF_GPIO_SINGLE_ENDED = 0x2, > > OF_GPIO_OPEN_DRAIN = 0x4, > > OF_GPIO_SLEEP_MAY_LOSE_VALUE = 0x8, > > + OF_GPIO_RESET_TOLERANT = 0x16, > > Now you're mixing up decimal and hex. Ugh. Whoops.
Re: [RFC PATCH 3/5] gpio: gpiolib: Add chardev support for maintaining GPIO values on reset
I paged Bartosz and Michael on this; they are experts on the use cases for the character device, and their opinions are likely more valuable than mine.

On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote:
> Similar to devicetree support, add flags and mappings to expose reset
> tolerance configuration through the chardev interface.
>
> Signed-off-by: Andrew Jeffery
(...)
> +        * Unconditionally configure reset tolerance, as it's possible
> +        * that the tolerance flag itself becomes tolerant to resets.
> +        * Thus it could remain set from a previous environment, but
> +        * the current environment may not expect it so.
> +        */
> +       ret = gpiod_set_reset_tolerant(desc,
> +                       !!(lflags & GPIOHANDLE_REQUEST_RESET_TOLERANT));
> +       if (ret < 0)
> +               goto out_free_descs;

First, as noted in the first patch, IMO we should just go for persistence, i.e. you want to flag to the system to keep the line persistent in any case, no matter if the system goes to sleep or resets.

So the use case is going to be a control system or similar, a makerspace project, an industrial product of some kind, driving GPIO from userspace.

I don't see it as helpful to give userspace control over whether the line is persistent or not. It is more reasonable to assume persistence for userspace use cases, don't you think? Whether the system goes to sleep or the gpiochip resets should not make a door suddenly close or the lights on the Christmas tree go out, right?

I think if the gpiochip supports persistence of any kind, we should try to use it and not have userspace provide flags for that.

Yours,
Linus Walleij
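The comment in the quoted hunk argues for writing the reset-tolerance setting unconditionally. The reasoning is easier to see with a toy model. The struct and both functions below are hypothetical and do not correspond to gpiolib or any real driver; the only assumption carried over from the comment is that the "reset tolerant" bit itself survives a reset.

```c
#include <stdbool.h>

/* Hypothetical controller model: the reset-tolerance bit lives in a
 * register that is itself preserved across a controller reset. */
struct model_gpio_ctrl {
    bool reset_tolerant;
};

/* Variant that only ever sets the bit when asked. A value left over from
 * a previous environment (firmware, an earlier boot) is silently kept. */
static void configure_set_only(struct model_gpio_ctrl *c, bool requested)
{
    if (requested)
        c->reset_tolerant = true;
}

/* Variant matching the quoted hunk: always write the requested state,
 * so a stale bit is cleared when the new consumer did not ask for it. */
static void configure_unconditionally(struct model_gpio_ctrl *c, bool requested)
{
    c->reset_tolerant = requested;
}
```

Starting from a controller whose bit was left set, `configure_set_only(&c, false)` leaves it set, while `configure_unconditionally(&c, false)` clears it, which is the "current environment may not expect it" case the comment describes.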
Re: [RFC PATCH 2/5] gpio: gpiolib: Add OF support for maintaining GPIO values on reset
On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote:
> @@ -32,6 +32,7 @@ enum of_gpio_flags {
>         OF_GPIO_SINGLE_ENDED = 0x2,
>         OF_GPIO_OPEN_DRAIN = 0x4,
>         OF_GPIO_SLEEP_MAY_LOSE_VALUE = 0x8,
> +       OF_GPIO_RESET_TOLERANT = 0x16,

Now you're mixing up decimal and hex.

Anyway, I do not think this is necessary.

Yours,
Linus Walleij
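To make the decimal/hex mix-up concrete: each of these flags is meant to be a single bit, so the next one after 0x8 is 0x10 (decimal 16). The value 0x16 is decimal 22, i.e. 0x10 | 0x4 | 0x2, so it would also set `OF_GPIO_SINGLE_ENDED` and `OF_GPIO_OPEN_DRAIN`. The check below uses the flag values from the quoted hunk plus `OF_GPIO_ACTIVE_LOW = 0x1`, which precedes them in the same enum; the helper names are made up for illustration.

```c
/* Flag values from enum of_gpio_flags as shown in the quoted hunk. */
enum {
    OF_GPIO_ACTIVE_LOW           = 0x1,
    OF_GPIO_SINGLE_ENDED         = 0x2,
    OF_GPIO_OPEN_DRAIN           = 0x4,
    OF_GPIO_SLEEP_MAY_LOSE_VALUE = 0x8,
};

#define RFC_RESET_TOLERANT 0x16u /* as written in the RFC: 0x10|0x04|0x02 */
#define NEXT_FREE_FLAG     0x10u /* decimal 16, the next unused bit */

/* Hypothetical helper: existing flag bits a candidate value collides with. */
static unsigned int colliding_flags(unsigned int candidate)
{
    const unsigned int used = OF_GPIO_ACTIVE_LOW | OF_GPIO_SINGLE_ENDED |
                              OF_GPIO_OPEN_DRAIN | OF_GPIO_SLEEP_MAY_LOSE_VALUE;
    return candidate & used;
}

/* A usable flag must be exactly one bit, i.e. a power of two. */
static int is_single_bit(unsigned int v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

`colliding_flags(0x16)` yields 0x6 (single-ended plus open-drain), while `colliding_flags(0x10)` yields 0.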
Re: [RFC PATCH 1/5] gpio: gpiolib: Add core support for maintaining GPIO values on reset
On Fri, Oct 20, 2017 at 5:37 AM, Andrew Jeffery wrote:
> GPIO state reset tolerance is implemented in gpiolib through the
> addition of a new pinconf parameter. With that, some renaming of helpers
> is done to clarify the scope of the already existing
> gpiochip_line_is_persistent(), as it's now ambiguous as to whether that
> means on suspend, reset or both.

Isn't it most reasonable to say persistence covers both cases, reset and/or sleep? This seems a bit overdefined.

So can we say that if this flag is set, the hardware and driver should do their best to preserve the value across any system disruptions. We can change the wording of course; patches welcome for that.

But do we really need to distinguish the cases of disruption and whether we cover up for them or not? I would say we can deal with that the day we have a system with two register bits (or similar) where you can select to preserve across sleep, reset, one or the other, AND there is also a use case where a user wants to preserve the value across reset but not suspend, or vice versa. I suspect that will not happen.

Yours,
Linus Walleij
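One way to read the "single persistence flag" proposal: consumers ask only for "persistent", and the driver enables whatever kinds of retention its hardware offers. The sketch below is purely illustrative; the capability and state structs and `apply_persistence()` are hypothetical and are not part of gpiolib (the existing `gpiochip_line_is_persistent()` is not modelled here).

```c
#include <stdbool.h>

/* Hypothetical retention capabilities a GPIO controller might have. */
struct gpio_persist_caps {
    bool can_retain_on_sleep;
    bool can_retain_on_reset;
};

/* What the driver actually enabled for a line. */
struct gpio_persist_state {
    bool retain_on_sleep;
    bool retain_on_reset;
};

/*
 * A single "persistent" request with no per-disruption flags: enable every
 * kind of retention the hardware supports ("do its best"). Returns true if
 * at least one kind of retention was enabled.
 */
static bool apply_persistence(const struct gpio_persist_caps *caps,
                              bool persistent,
                              struct gpio_persist_state *out)
{
    out->retain_on_sleep = persistent && caps->can_retain_on_sleep;
    out->retain_on_reset = persistent && caps->can_retain_on_reset;
    return out->retain_on_sleep || out->retain_on_reset;
}
```

The design choice mirrors the argument above: separate sleep and reset knobs would only be needed if some hardware exposed both bits and a real user wanted one without the other.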
Re: [PATCH 0/2] Make squashfs fragments' cache size more configurable
On Thu, Oct 19, 2017 at 12:50 AM, Qixuan Wu wrote:
> Hi All,
>
> Currently, squashfs fragments' cache size is only determined by
> config option CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE. Users have
> no way to change the value when they get the binary kernel.

Thank you for the patches, but they're both pointless and dangerous.

Let's be clear here: you're trying to change an "expert only" kernel configuration option into a user-changeable option. This is stupid, because it is not meant for non-experts to change, for good reason.

The fragment cache size isn't some small tweak to the operation of Squashfs; it fundamentally affects both the performance and the memory overhead of Squashfs. As such, right from its introduction in 2003, it has been an "expert only" configuration option at build time. Even then it is made clear that the default has been carefully chosen, and it should only be changed in exceptional circumstances.

This basically means don't change the default unless you really know what you're doing, and this means tracing Squashfs against your use case to determine caching behaviour. There is absolutely no other reason why you'd want to change the default. This also means it should be restricted to kernel configuration time only.

Let's be clear again: very few people should ever want to change the default, and the "experts" who do want to can do so when configuring the kernel. If you're not in a position to change it at kernel configuration time, then by definition you're not an expert, and you shouldn't be able to change it anyway, and certainly not as a user.

There is absolutely no use case here to make this a user-changeable option. I can see no upsides in doing this, only downsides.

Frankly, if you need to change this value at module insert time, then there is something wrong with your system or build process.
If you want this because you want to build the kernel/modules once and then post-facto configure them for various products, then it is your build process that is broken.

If you want this because you want to dynamically change Squashfs memory usage/caching behaviour after kernel configuration time, it suggests you're trying to adapt Squashfs's footprint based on available memory. This is an abuse of the option, as it's only meant to be used after detailed tracing/analysis, and certainly not to accommodate unforeseen dynamic low-memory situations; if that's the reason for needing this option, you should be looking to solve it elsewhere.

Ultimately, this has been an "expert" kernel-configuration-only option since its introduction in 2003, and I have never been asked to change it; I believe this is because people recognise it as such. I suspect you're trying to change this for fundamentally bogus reasons.

Moreover, Squashfs is used in many different use cases and distributions, and I'm not going to make this a user-changeable option allowing users to insert the Squashfs module in such a way that will break its performance.

So NACK.

Phillip Lougher (Squashfs maintainer)

> Now make it be configured when booting or inserting module.
> Actually, it's better that a config option in a number format
> in .config file can be reconfigured during booting or inserting
> module.
>
> Thanks
> Qixuan
>
> Qixuan Wu (2):
>   Squashfs: Let the number of fragments cached configurable
>   Documentation/kernel-parameters.txt: Add kernel parameter of squashfs
> fragments' cache size
>
>  Documentation/admin-guide/kernel-parameters.txt |  7
>  fs/squashfs/super.c                             | 43 -
>  2 files changed, 49 insertions(+), 1 deletion(-)
>
> --
> 2.7.4
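To put a number on "fundamentally affects ... memory overhead": as far as I can tell from fs/squashfs/super.c, each mounted Squashfs filesystem gets its own fragment cache of `CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE` entries, each sized to that filesystem's block size (chosen at mksquashfs time; 128 KiB by default, up to 1 MiB). The sketch does the arithmetic under that assumption and ignores small per-entry bookkeeping; `fragment_cache_bytes()` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* Approximate fragment-cache memory for one mounted Squashfs filesystem,
 * assuming `entries` cache slots of `block_size` bytes each. */
static size_t fragment_cache_bytes(unsigned int entries, size_t block_size)
{
    return (size_t)entries * block_size;
}
```

With the Kconfig default of 3 entries, a 128 KiB-block image costs 3 x 128 KiB = 384 KiB per mount and a 1 MiB-block image 3 MiB; raising the setting to 16 with 1 MiB blocks costs 16 MiB per mount, which is why the value is meant to be chosen deliberately at build time.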
Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
[...]
> In this regard, as we consider genpd being a trivial PM domain, those
> examples you bring up above are to me also examples of trivial PM
> domains. Especially because they don't deal with wakeups, as that is
> taken care of by the drivers, right!?

Not directly. For example, the omap device framework has a noirq callback
implemented which forcibly disables all devices which are not PM runtime
suspended. While doing this it calls the driver's PM .runtime_suspend(),
which may return a non-zero value, and in this case the device will be left
enabled (powered) at suspend for wakeup purposes (see _od_suspend_noirq()).
>>>
>>> Yeah, I had that feeling that omap has some trickiness going on. :-)
>>>
>>> I'm sure that can be fixed in the omap PM domain, although
>>
>> ...slipped with my fingers.. here is the rest of the reply...
>>
>> ..of course that requires us to use another way for drivers to signal
>> to the omap PM domain that it needs to stay powered so as to deal with
>> wakeup.
>>
>> I can have a look at that more closely, to see if it makes sense to change.
>>
>
> Also, an additional note here: some IPs are reused between
> OMAP/Davinci/Keystone. The OMAP PM domain has some code running at noirq
> time to deal with devices left in PM runtime enabled state (OMAP PM
> runtime centric), while Davinci/Keystone doesn't (clock_ops.c), so the
> pm_runtime_force_* API is actually a possibility now to make the same
> driver work on all these platforms.

That sounds great!

Also, in the end it would be nice to also convert the OMAP PM domain to
genpd. I think most of the needed infrastructure is already there to do
that.

Kind regards
Uffe
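The omap noirq behaviour described above (force off every device that is still runtime-active, unless its `.runtime_suspend()` fails, in which case leave it powered for wakeup) can be modelled in a few lines. This is a simplified, hypothetical model written from the description in this thread; `struct model_dev` and `model_suspend_noirq()` are not the real omap_device code and skip details such as locking and callback ordering.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device model; not the real omap_device code. */
struct model_dev {
    bool runtime_suspended;                        /* already off via runtime PM */
    bool powered;
    int (*runtime_suspend)(struct model_dev *dev); /* driver callback */
};

/*
 * Noirq step as described in the thread: every device that is still
 * runtime-active is forcibly suspended, unless its runtime_suspend callback
 * returns non-zero, in which case it stays powered (e.g. as a wakeup source).
 * Returns the number of devices left powered by a failing callback.
 */
static size_t model_suspend_noirq(struct model_dev *devs, size_t n)
{
    size_t left_on = 0;

    for (size_t i = 0; i < n; i++) {
        struct model_dev *d = &devs[i];

        if (d->runtime_suspended)
            continue;                  /* nothing to do */
        if (d->runtime_suspend && d->runtime_suspend(d) != 0) {
            left_on++;                 /* driver refused: keep powered */
            continue;
        }
        d->powered = false;
        d->runtime_suspended = true;
    }
    return left_on;
}
```

The non-zero return value is the implicit "keep me powered for wakeup" signal; Uffe's point is that a genpd-based OMAP domain would need an explicit mechanism for drivers to express the same thing.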