Re: [Ksummit-discuss] bug-introducing patches
On Tue, 8 May 2018, Sasha Levin wrote:
> There's no one, for example, who picked up vanilla v4.16 and plans to keep using it for a year.

Actually, at a prior job I would do almost exactly that. I never intended to go a year without updating, but it would happen if nothing came up that was related to the hardware/features I was running. So "no one uses the Linus kernel" is false.
Re: Reg : Spectre & Meltdown
the 4.4.112 patches that Greg just posted include a bunch of work for these vulnerabilities. Who knows what has been backported to the kernel he is running. k
Re: Reg : Spectre & Meltdown
you are running a RedHat kernel, you will have to ask them about what they have included in it. k
Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit
I somewhat hate to ask this, but for those of us following along at home, what does this add to the overhead? I remember an estimate from the middle of last week that put retpoline at replacing a 3-clock 'ret' with 30 clocks of eye-bleed code.
Re: Avoid speculative indirect calls in kernel
The point is that in many cases, if someone exploits the "trusted" process, they already have everything that the machine is able to do anyway.
Re: Avoid speculative indirect calls in kernel
On Wed, 3 Jan 2018, Andi Kleen wrote:
>> Why is this all done without any configuration options?
>
> I was thinking of a config option, but I was struggling with a name.
> CONFIG_INSECURE_KERNEL, CONFIG_LEAK_MEMORY?

CONFIG_BUGGY_INTEL_CACHE (or similar): something that indicates that this is to support the Intel CPUs that have this bug in them. We've had such CPU-specific support options in the past. Some people will need the speed more than the protection, and some people will be running on CPUs that don't need this. Why is this needed? Because of an Intel bug, so name it accordingly.

David Lang
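For illustration only, a knob along the lines being argued for might look like the Kconfig sketch below. The option name and help text are invented for this example; they are not what was proposed or merged:

```
# Hypothetical Kconfig fragment -- name and wording invented for illustration.
config MITIGATE_CPU_SPECULATION_BUG
	bool "Work around speculative-execution side channels on affected CPUs"
	depends on X86
	default y
	help
	  Build the kernel with mitigations (such as retpolines) for CPUs
	  that speculatively execute indirect branches in an exploitable
	  way.  Say N only if you need the speed more than the protection,
	  or if your CPU is not affected.
```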
Re: Yes you have standing to sue GRSecurity
On Sat, 29 Jul 2017, Paul G. Allen wrote:

It's not even clear that there is infringement. The GPL merely requires that people who have been distributed copies of GPL'ed code must not be restricted from further redistribution of the code. It does not require that someone who is distributing it must make it available on a public FTP/HTTP server. What I have seen reported is that they are adding additional restrictions: if any of their customers redistribute the source, their contract with grsecurity is terminated.

If there is something to this (that GRSecurity is somehow in violation of the GPL), then it would probably be a very good idea for someone (the community, Red Hat, etc.) to protect the kernel. From my understanding, at least in America, protections under any license or contract (especially dealing with copyright and trademark infringement) are only enforceable as long as the party with the rights enforces the license/contract/agreement.

You are thinking of trademarks: they must be defended or you lose them. Contracts and licenses do not need to be defended at every chance on pain of losing them.

There is also something in law called "setting a precedent", and if the violation of the Linux license agreement is left unchecked, then quite possibly a precedent could be set to allow an entire upstream kernel to be co-opted.

This is a potential problem.

David Lang
Re: [copyleft-next] Re: Kernel modules under new copyleft licence : (was Re: [PATCH v2] module.h: add copyleft-next >= 0.3.1 as GPL compatible)
On Fri, 19 May 2017, Luis R. Rodriguez wrote:
> On Thu, May 18, 2017 at 06:12:05PM -0400, Theodore Ts'o wrote:
>> Sorry, I guess I wasn't clear enough. So there are two major cases, with three sub-cases for each.
>>
>> 1) The driver is dual-licensed GPLv2 and copyleft-next
>>    1A) The developer only wants to use the driver, without making any changes to it.
>>    1B) The developer wants to make changes to the driver, and distribute source and binaries.
>>    1C) The developer wants to make changes to the driver, and contribute the changes back to upstream.
>>
>> 2) The driver is solely licensed under copyleft-next
>>    2A) The developer only wants to use the driver, without making any changes to it.
>>    2B) The developer wants to make changes to the driver, and distribute source and binaries.
>>    2C) The developer wants to make changes to the driver, and contribute the changes back to upstream.
>>
>> In cases 1A and 1B, I claim that no additional lawyer ink is required.
>
> I really cannot see how you might have an attorney who wants ink on 2A but not 1A. I really cannot see how you might have an attorney who wants ink on 2B but not 1B.

If something is under multiple licenses, and one of them is a license that is known, you can just use that license and not worry (or even think) about what other licenses are available. But if it's a new license, then it needs to be analyzed, and that takes lawyer ink. That's why 1A and 1B are OK: you can ignore copyleft-next and just use GPLv2.

David Lang
Re: Apparent backward time travel in timestamps on file creation
On Thu, 30 Mar 2017, David Howells wrote:
> Linus Torvalds <torva...@linux-foundation.org> wrote:
>> The error bar can be huge, for the simple reason that the filesystem you are testing may not be sharing a clock with the CPU at _all_. IOW, think network filesystems.
>
> Can't I just not do the tests when the filesystem is a network fs? I don't think it should be a problem for disk filesystems on network-attached storage.

It's not trivial to detect whether a filesystem is local or network: you would have to make calls to figure out what filesystem you are on, then have a list defining what's local and what's remote, and that list would become out of date as new filesystems are added.

David Lang
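To make the difficulty concrete, here is a minimal, illustrative Python sketch (mine, not code from the thread) of the userspace version of such a check, assuming Linux and /proc/self/mounts. The hard-coded set of network filesystem types is exactly the stale list being objected to; it is incomplete the day it is written.

```python
import os

# Known network filesystem types -- incomplete by construction, which is
# precisely the maintenance problem described in the thread.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "ceph", "9p",
                    "afs", "fuse.sshfs", "glusterfs"}

def fstype_is_network(fstype: str) -> bool:
    """Classify a filesystem type string as network-backed (best effort)."""
    return fstype in NETWORK_FS_TYPES

def fstype_of(path: str) -> str:
    """Return the fs type of the mount containing `path` (Linux only)."""
    path = os.path.realpath(path)
    best_mnt, best_type = "", ""
    with open("/proc/self/mounts") as mounts:
        for line in mounts:
            _dev, mnt, fstype = line.split()[:3]
            # Longest matching mount-point prefix wins.
            if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) \
                    and len(mnt) > len(best_mnt):
                best_mnt, best_type = mnt, fstype
    return best_type
```

Note the two failure modes called out above: the type list goes stale, and a "local" type such as ext4 can still sit on network-attached block storage, which this heuristic cannot see.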
Re: [Cluster-devel] [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"
On Fri, 27 Jan 2017, Christoph Hellwig wrote:
> On Fri, Jan 27, 2017 at 11:40:42AM -0500, Theodore Ts'o wrote:
>> The reason why I'm nervous is that nojournal mode is not a common configuration, and "wait until production systems start failing" is not a strategy that I or many SRE-types find comforting.
>
> What does SRE stand for?

Site Reliability Engineer, a mix of operations and engineering (DevOps++).

David Lang
Re: Regression - SATA disks behind USB ones on v4.8-rc1, breaking boot. [Re: Who reordered my disks (probably v4.8-rc1 problem)]
On Sun, 14 Aug 2016, Tom Yan wrote:
> On 14 August 2016 at 18:07, Tom Yan <tom.t...@gmail.com> wrote:
>> On 14 August 2016 at 18:01, Pavel Machek <pa...@ucw.cz> wrote:
>>> Since SATA support was merged, certainly since v2.4, and from way before /dev/disk/by-id existed.
>>
>> I have no idea how "SATA before USB" had been done in the past (if it was ever a thing in the kernel), but that has not been the case since at least v3.0 AFAIR.
>>
>>> People may not run udev, and you can't use /dev/disk/by-id on the kernel command line.
>>
>> No, but you can always use root=PARTUUID=, that's built into the kernel. (root=UUID= requires udev or so, though.)
>
> Silly me. root=UUID= has nothing to do with udev, but `blkid` in util-linux. At least that's how it's done in Arch/mkinitcpio.

The rule is "don't break working systems", not "we are allowed to break systems, see, it says here not to depend on this". Drive ordering has been stable since the 0.1 kernel [1]. It takes a lot longer to detect USB drives, so why in the world would they be detected before hard-wired drives? I expect that Linus' response is going to be very quotable.

David Lang

[1] given stable hardware and no new drivers becoming involved
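For reference, the distinction being worked out above looks like this on the kernel command line. The identifiers below are made up for illustration; substitute the output of `blkid` for your own disk:

```
# Partition UUID: resolved by the kernel itself, no initramfs required.
root=PARTUUID=0ea92f4b-02

# Filesystem UUID: not understood by the kernel; an initramfs resolves it
# with blkid/udev before mounting root.
root=UUID=2f1c64c2-4d8e-4c2e-9b3a-8f0e7a3b5c11
```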
Re: Variant symlink filesystem
On Sat, 12 Mar 2016, Cole wrote:
> On 12 March 2016 at 00:24, Al Viro <v...@zeniv.linux.org.uk> wrote:
>> On Sat, Mar 12, 2016 at 12:03:11AM +0200, Cole wrote:
>>> This was one of the first solutions we looked at, along with using various namespaces. However, we would like to be able to have multiple terminal sessions open, and to have each session using a different mount point, or be able to use the other terminal's mount point, i.e. switching the mount point to that of the other terminals. We would also like the shell to be able to make use of these, and use shell commands such as 'ls'. When we originally looked at namespaces and containers, we could not find a solution to achieve the above. Is this possible using namespaces?
>>
>> I'd try to look at setns(2) if you want processes joining existing namespaces. I'm afraid that I'll need to get some sleep before I'll be up to asking the right questions for figuring out what requirements you have and what's the best way to do it; after a while coffee stops being efficient and I'm already several hours past that ;-/
>
> Sure, not a problem; when you have time to reply I will gladly welcome any feedback. As for the usage, I'll explain it a bit so that you have something to work from when you get a chance to read it. The problem we encountered with namespaces when we looked at them more than a year ago was how to get the shell to join them, and also how to move the shell in one terminal session into a namespace that another shell is currently in. We wanted a solution that doesn't require modifying existing programs to make them namespace-aware. However, as I said, this was more than a year ago, and we could easily have misunderstood something, or not understood the full functionality available. If you say this is possible without modifying programs such as bash, could you please point me in the direction of the documentation describing this, and I will try to educate myself.

Looking at the setns() function, it seems you could have a setuid helper program that you run in one session, which changes the namespace and then invokes a bash shell in that namespace in which you run unmodified programs. It seems like there should be a way for a root program to change the namespace of another, but I'm not finding it at the moment. There is the nsenter program, which will run a program inside an existing namespace. It sounds like you need something similar that implements some permission checking (for example, only letting you join namespaces of other processes owned by the same user), but you should be able to build proof-of-concept scripts with nsenter.

David Lang
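As a concrete, entirely illustrative sketch of the setuid-helper idea (assuming Linux): the helper opens /proc/&lt;pid&gt;/ns/mnt and calls setns(2) before exec'ing an unmodified shell. The function names here are my own, and a real helper would also implement the same-user permission check mentioned above, since joining a mount namespace requires CAP_SYS_ADMIN.

```python
import ctypes
import os
import sys

def ns_path(pid: int, nstype: str = "mnt") -> str:
    """Path of a process's namespace handle under /proc."""
    return f"/proc/{pid}/ns/{nstype}"

def join_mount_ns(pid: int) -> None:
    """Join `pid`'s mount namespace via setns(2); needs CAP_SYS_ADMIN."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(ns_path(pid), os.O_RDONLY)
    try:
        # A second argument of 0 lets the kernel accept any namespace type.
        if libc.setns(fd, 0) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)

if __name__ == "__main__" and len(sys.argv) > 1:
    join_mount_ns(int(sys.argv[1]))   # e.g. the PID of another shell
    os.execvp("bash", ["bash"])       # unmodified shell, now in that namespace
```

Functionally this is what `nsenter --mount --target <pid> bash` does; the missing piece in both cases is the policy layer deciding who may join whose namespace.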
Re: Variant symlink filesystem
On Sat, 12 Mar 2016, Cole wrote:
> On 11 March 2016 at 23:51, Al Viro <v...@zeniv.linux.org.uk> wrote:
>> On Fri, Mar 11, 2016 at 10:52:52PM +0200, Cole wrote:
>>> The implementation doesn't necessarily have to continue to work with env variables. On FreeBSD, the variant symlinks function by using variables stored in kernel memory, with a hierarchical lookup starting with user-defined values and terminating with global entries. I am not aware of such functionality existing on Linux, but if someone could point me at something similar, I would much prefer to use that, as there are issues with variables that are exported or modified during process execution.
>>
>> Put your processes into a separate namespace and use mount --bind in it...
>
> This was one of the first solutions we looked at, along with using various namespaces. However, we would like to be able to have multiple terminal sessions open, and to have each session using a different mount point, or be able to use the other terminal's mount point, i.e. switching the mount point to that of the other terminals. We would also like the shell to be able to make use of these, and use shell commands such as 'ls'.

You should be able to have multiple sessions using the same namespace. There is the lwn.net series on namespaces at https://lwn.net/Articles/531114/; from what I'm looking at, this should be possible with the right mount options. It's not as trivial as setting an environment variable, but if it's all scripted, that shouldn't matter to the user. You would need to use the setns() call to have one session join an existing namespace rather than creating a new one. Now, changing namespaces does require CAP_SYS_ADMIN, so if you are not running things as root, you may need to create a small daemon running as root that reassigns your different sessions from one namespace to another.

David Lang

> When we originally looked at namespaces and containers, we could not find a solution to achieve the above.
>
> Is this possible using namespaces?
>
> Regards
> /Cole
Re: Variant symlink filesystem
On Fri, 11 Mar 2016, Cole wrote:
> On 11 March 2016 at 22:24, Richard Weinberger <rich...@nod.at> wrote:
>> On 11.03.2016 at 21:22, Cole wrote:
>>> If I remember correctly, when we were testing the fuse version, we hard-coded the path to see if that solved the problem, and the difference between the env-lookup code and the hard-coded path was almost the same, but substantially slower than the native file system.
>>
>> And where exactly was the performance problem? Anyway, if you submit your filesystem, also provide a decent use case for it. :-)
>
> Thank you, I will do so. One example use case could be to allow multiple package repositories to exist on a single computer, all in different locations but with a fixed path so as not to break the package manager; the correct repository is then selected based on an ENV variable. That way each user could have their own packages installed, separate from the system packages, and no collisions would occur.

Why would this not be a case to use filesystem namespaces and bind mounts?

David Lang
Re: [PATCH 00/42] ACPICA: 20151218 Release
What is ACPICA, and why should we care about divergence between it and the Linux upstream? Where is it to be found? This may be common knowledge to many people, but it should probably be documented in the patch bundle and its explanation.

David Lang

On Tue, 29 Dec 2015, Lv Zheng wrote:
> Date: Tue, 29 Dec 2015 13:52:19 +0800
> From: Lv Zheng
> To: Rafael J. Wysocki, Len Brown
> Cc: Lv Zheng, Lv Zheng, linux-kernel@vger.kernel.org, linux-a...@vger.kernel.org
> Subject: [PATCH 00/42] ACPICA: 20151218 Release
>
> The 20151218 ACPICA kernel-resident subsystem updates are linuxized based on the linux-pm/linux-next branch. The patchset has passed the following build/boot tests.
>
> Build tests are performed as follows:
> 1. i386 + allyes
> 2. i386 + allno
> 3. i386 + default + ACPI_DEBUGGER=y
> 4. i386 + default + ACPI_DEBUGGER=n + ACPI_DEBUG=y
> 5. i386 + default + ACPI_DEBUG=n + ACPI=y
> 6. i386 + default + ACPI=n
> 7. x86_64 + allyes
> 8. x86_64 + allno
> 9. x86_64 + default + ACPI_DEBUGGER=y
> 10. x86_64 + default + ACPI_DEBUGGER=n + ACPI_DEBUG=y
> 11. x86_64 + default + ACPI_DEBUG=n + ACPI=y
> 12. x86_64 + default + ACPI=n
>
> Boot tests are performed as follows:
> 1. i386 + default + ACPI_DEBUGGER=y
> 2. x86_64 + default + ACPI_DEBUGGER=y
>
> Where:
> 1. i386: machine named as "Dell Inspiron Mini 1010"
> 2. x86_64: machine named as "HP Compaq 8200 Elite SFF PC"
> 3. default: kernel configuration with the following items enabled:
>    All hardware drivers related to the machines of i386/x86_64
>    All "drivers/acpi" configurations
>    All "drivers/platform" drivers
>    All other drivers that link the APIs provided by the ACPICA subsystem
>
> The divergences checking result:
> Before applying (20150930 Release): 517 lines
> After applying (20151218 Release): 506 lines
>
> Bob Moore (25):
>   ACPICA: exmutex: General cleanup, restructured some code
>   ACPICA: Core: Major update for code formatting, no functional changes
>   ACPICA: Split interpreter tracing functions to a new file
>   ACPICA: acpiexec: Add support for AML files containing multiple tables
>   ACPICA: Disassembler/tools: Support for multiple ACPI tables in one file
>   ACPICA: iasl/acpiexec: Update input file handling and verification
>   ACPICA: Revert "acpi_get_object_info: Add support for ACPI 5.0 _SUB method."
>   ACPICA: Add comment explaining _SUB removal
>   ACPICA: acpiexec/acpinames: Update for error checking macros
>   ACPICA: Concatenate operator: Add extensions to support all ACPI objects
>   ACPICA: Debug Object: Cleanup output
>   ACPICA: Debug object: Fix output for a NULL object
>   ACPICA: Update for output of the Debug Object
>   ACPICA: getopt: Comment update, no functional change
>   ACPICA: Add new exception code, AE_IO_ERROR
>   ACPICA: iasl/Disassembler: Support ASL ElseIf operator
>   ACPICA: Parser: Add constants for internal namepath function
>   ACPICA: Parser: Fix for SuperName method invocation
>   ACPICA: Update parameter type for ObjectType operator
>   ACPICA: Update internal #defines for ObjectType operator. No functional change
>   ACPICA: Update for CondRefOf and RefOf operators
>   ACPICA: Cleanup code related to the per-table module level improvement
>   ACPICA: Add "root node" case to the ACPI name repair code
>   ACPICA: Add per-table execution of module-level code
>   ACPICA: Update version to 20151218
>
> Colin Ian King (1):
>   ACPICA: Tools: Add spacing and missing options in acpibin tool
>
> David E. Box (1):
>   ACPICA: Fix SyncLevel support interaction with method auto-serialization
>
> LABBE Corentin (1):
>   ACPICA: Add "const" to some functions that return fixed strings
>
> Lv Zheng (12):
>   ACPICA: Linuxize: reduce divergences for 20151218 release
>   ACPICA: Namespace: Fix wrong error log
>   ACPICA: Debugger: reduce old external path format
>   ACPICA: Namespace: Add scope information to the simple object repair mechanism
>   ACPICA: Namespace: Add String -> ObjectReference conversion support
>   ACPICA: Events: Deploys acpi_ev_find_region_handler()
>   ACPICA: Events: Uses common_notify for address space handlers
>   ACPICA: Utilities: Reorder initialization code
>   ACPICA: Events: Fix an issue that region object is re-attached to another scope when it is already attached
>   ACPICA: Events: Split acpi_ev_associate_reg_method() from region initialization code
>   ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages
>   ACPICA: Events: Introduce ACPI_REG_DISCONNECT invocation to acpi_ev_execute_reg_methods()
>
> Markus Elfring (1):
>   ACPICA: Debugger: Remove some unecessary NULL checks
>
> Prarit Bhargava (1):
>   ACPICA: acpi_get_sleep_type_data: Reduce warnings
>
> drivers/acpi/acpica/Makefile   |  4 +-
> drivers/acpi/acpica/acapps.h   | 58 +-
> drivers/acpi/acpica/acdebug.h  |  5 +-
> drivers/acpi/acpica/acevents.h | 11 +-
> drivers/acpi/acpica/acglobal.h |  3 +-
> drivers/acpi/acpica/aclocal.h
Re: kdbus: to merge or not to merge?
On Sun, 9 Aug 2015, Greg Kroah-Hartman wrote: The issue is with userspace clients opting in to receive all NameOwnerChanged messages on the bus, which is not a good idea as they constantly get woken up and process them, which is why the CPU was pegged. This issue should now be fixed in Rawhide for some of the packages we found that were doing this. Maintainers of other packages have been informed. End result, no one has ever really tested sending "bad" messages to the current system as all existing dbus users try to be "good actors", thanks to Andy's testing, these apps should all now become much more robust. Does it require elevated privileges to opt in to receive all NameOwnerChanged messages on the bus? Is it the default unless the apps opt for something more restrictive? Or is it somewhere in between? I was under the impression that the days of writing system-level stuff that assumes that all userspace apps are going to 'play nice' went out a decade or more ago. It's fine if the userspace app can kill itself, or possibly even the user it's running as, but being able to kill apps running as other users, let alone the whole system, is a problem nowadays. It may be possible in a default system, but this is why cgroups and namespaces were created: to give the system admin the ability to limit the resources that any one app can consume. Introducing a new mechanism that allows one user to consume resources allocated to another and kill the system, without providing a kernel-level mechanism to limit the damage (as opposed to fixing individual apps), seems rather short-sighted at best. David Lang
Re: [FYI] tux3: Core changes
On Fri, 31 Jul 2015, Daniel Phillips wrote: On Friday, July 31, 2015 11:29:51 AM PDT, David Lang wrote: We, the Linux Community have less tolerance for losing people's data and preventing them from operating than we used to when it was all tinkerer's personal data and secondary systems. So rather than pushing optimizations out to everyone and seeing what breaks, we now do more testing and checking for failures before pushing things out. By the way, I am curious about whose data you think will get lost as a result of pushing out Tux3 with a possible theoretical bug in a wildly improbable scenario that has not actually been described with sufficient specificity to falsify, let alone demonstrated. you weren't asking about any particular feature of Tux3, you were asking if we were still willing to push out stuff that breaks for users and fix it later. Especially for filesystems that can lose the data of whoever is using it, the answer seems to be a clear no. there may be bugs in what's pushed out that we don't know about. But we don't push out potential data corruption bugs that we do know about (or think we do), so if you think this should be pushed out with this known corner case that's not handled properly, you have to convince people that it's _so_ improbable that they shouldn't care about it. David Lang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [FYI] tux3: Core changes
On Fri, 31 Jul 2015, Daniel Phillips wrote: Subject: Re: [FYI] tux3: Core changes On Friday, July 31, 2015 8:37:35 AM PDT, Raymond Jennings wrote: Returning ENOSPC when you have free space you can't yet prove is safer than not returning it and risking a data loss when you get hit by a write/commit storm. :) Remember when delayed allocation was scary and unproven, because proving that ENOSPC will always be returned when needed is extremely difficult? But the performance advantage was compelling, so we just worked at it until it worked. There were times when it didn't work properly, but the code was in the tree so it got fixed. It's like that now with page forking - a new technique with compelling advantages, and some challenges. In the past, we (the Linux community) would rise to the challenge and err on the side of pushing optimizations in early. That was our mojo, and that is how Linux became the dominant operating system it is today. Do we, the Linux community, still have that mojo? We, the Linux Community have less tolerance for losing people's data and preventing them from operating than we used to when it was all tinkerer's personal data and secondary systems. So rather than pushing optimizations out to everyone and seeing what breaks, we now do more testing and checking for failures before pushing things out. This means that when something new is introduced, we default to the safe, slightly slower way initially (there will be enough other bugs to deal with in any case), and then as we gain experience from the tinkerers enabling the performance optimizations, we make those optimizations reliable and only then push them out to all users. If you define this as "losing our mojo", then yes we have. But most people see the pace of development as still being high, just with more testing and polishing before it gets out to users. 
David Lang
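[The write-time reservation idea behind delayed allocation's ENOSPC guarantee, as discussed in the post above, can be sketched as a toy model. This is an illustrative sketch only, with hypothetical names; it is not actual filesystem code.]

```python
# Toy model of write-time space reservation (hypothetical, not real
# filesystem code): delayed allocation defers choosing blocks, but the
# free-space counter must still be debited at write() time so that
# ENOSPC is reported up front rather than at commit time, when the
# application can no longer react to it.

class ToyFS:
    def __init__(self, free_blocks):
        self.free = free_blocks   # blocks not yet promised to anyone
        self.reserved = 0         # blocks promised to dirty pages

    def write(self, nblocks):
        """Reserve space for dirty data; fail with ENOSPC immediately."""
        if nblocks > self.free:
            raise OSError(28, "No space left on device")  # errno 28 == ENOSPC
        self.free -= nblocks
        self.reserved += nblocks

    def commit(self):
        """Allocate the reserved blocks for real; cannot fail for space."""
        allocated = self.reserved
        self.reserved = 0
        return allocated

fs = ToyFS(free_blocks=10)
fs.write(6)
try:
    fs.write(6)               # only 4 blocks remain: refused now,
except OSError as e:          # not during a later write/commit storm
    assert e.errno == 28
assert fs.commit() == 6       # the earlier reservation always succeeds
```

The design choice being debated is exactly this: reserving conservatively at write time may refuse space the filesystem could eventually prove free, but it makes the ENOSPC guarantee easy to reason about.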
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Greg KH wrote: On Wed, Jun 24, 2015 at 10:39:52AM -0700, David Lang wrote: On Wed, 24 Jun 2015, Ingo Molnar wrote: And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works. counterexamples, devfs, tux Don't knock devfs. It created a lot of things that we take for granted now with our development model. Off the top of my head, here's a short list: - it showed that we can't arbitrarily make user/kernel api changes without working with people outside of the kernel developer community, and expect people to follow them - the idea was sound, but the implementation was not, it had unfixable problems, so to fix those problems, we came up with better, kernel-wide solutions, forcing us to unify all device/driver subsystems. - we were forced to try to document our user/kernel apis better, hence Documentation/ABI/ was created - to remove devfs, we had to create a structure of _how_ to remove features. It took me 2-3 years to be able to finally delete the devfs code, as the infrastructure and feedback loops were just not in place before then to allow that to happen. So I would strongly argue that merging devfs was a good thing, it spurred a lot of us to get the job done correctly. Without it, we would have never seen the need, or had the knowledge of what needed to be done. I don't disagree with you, but it was definitely a case of adding something that was later regretted and removed. A lot was learned in the process, but that wasn't the issue I was referring to. I don't want kdbus to end up the same way. The more I think back to those discussions, the more parallels I see between the two. 
David Lang
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Martin Steigerwald wrote: Am Mittwoch, 24. Juni 2015, 10:39:52 schrieb David Lang: On Wed, 24 Jun 2015, Ingo Molnar wrote: And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works. counterexamples, devfs, tux What was tux? in-kernel webserver David Lang
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Ingo Molnar wrote: And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works. counterexamples, devfs, tux David Lang
Re: clustered MD
On Tue, 9 Jun 2015, David Teigland wrote: We do have a valid real world utility. It is to provide high-availability of RAID1 storage over the cluster. The distributed locking is required only during cases of error and superblock updates and is not required during normal operations, which makes it fast enough for usual case scenarios. That's the theory, how much evidence do you have of that in practice? What are the doubts you have about it? Before I begin reviewing the implementation, I'd like to better understand what it is about the existing raid1 that doesn't work correctly for what you'd like to do with it, i.e. I don't know what the problem is. As I understand things, the problem is providing RAID across multiple machines, not just across the disks in one machine. David Lang
Re: Device Tree Blob (DTB) licence
On Fri, 29 May 2015, Enrico Weigelt, metux IT consult wrote: Important Notice: This message may contain confidential or privileged information. It is intended only for the person it was addressed to. If you are not the intended recipient of this email you may not copy, forward, disclose or otherwise use it or any part of it in any form whatsoever. If you received this email in error please notify the sender by replying and delete this message and any attachments without retaining a copy. P.S. some of us actually care about licenses being appropriate to what they're applied to, and at least theoretically capable of being honored. Your email footer may be very slightly undermining your position here. This is just a dumb auto-generated footer, coming from my client's mail server over here ... I'm just too lazy for setting up an own MTA on my workstation. You can safely ignore that. Arguing license issues and at the same time claiming that you should ignore a legal statement like the footer is a bit odd. David Lang
Re: Device Tree Blob (DTB) licence
On Fri, 29 May 2015, Enrico Weigelt, metux IT consult wrote: And why should they fear "poisoning" ? Search for "GPL contamination", the problem is quite common, GPL can turn anything GPL-compatible into GPL. So for a non-GPL project it's very hard to adopt GPL code. Yes, that's the whole purpose of the GPL. The deal is pretty simple: if you take some GPL'ed software and change it, you'll have to publish your changes under the same rules. For entirely separate entities (eg. dedicated programs) that's not a big issue. And for libraries, we have LGPL. If the DTS license would be a problem, it would be worse w/ ACPI and any proprietary firmware/BIOSes. not true, with a proprietary bios it's a clear "pay this much money and don't worry about it" while with GPL there's a nagging fear that someone you never heard of may sue you a decade from now claiming you need to give them the source to your OS. Is having the DTB GPL so important that you would rather let things fall into the windows trap ("well it booted windows, so it must be right") instead of allowing a proprietary OS to use your description of the hardware? note, this whole discussion assumes that the DTB is even copyrightable. Since it's intended to be strictly a functional description of what the hardware is able to do, that could be questioned. David Lang
Re: [FYI] tux3: Core changes
On Mon, 25 May 2015, Daniel Phillips wrote: On Monday, May 25, 2015 11:04:39 PM PDT, David Lang wrote: if the page gets modified again, will that cause any issues? what if the page gets modified before the copy gets written out, so that there are two dirty copies of the page in the process of being written? David Lang How is the page going to get modified again? A forked page isn't mapped by a pte, so userspace can't modify it by mmap. The forked page is not in the page cache, so userspace can't modify it by posix file ops. So the writer would have to be in kernel. Tux3 knows what it is doing, so it won't modify the page. What kernel code besides Tux3 will modify the page? I'm assuming that Rik is talking about whatever has the reference to the page via one of the methods that he talked about. David Lang
Re: [FYI] tux3: Core changes
On Mon, 25 May 2015, Daniel Phillips wrote: On Monday, May 25, 2015 9:25:44 PM PDT, Rik van Riel wrote: On 05/21/2015 03:53 PM, Daniel Phillips wrote: On Wednesday, May 20, 2015 8:51:46 PM PDT, David Lang wrote: how do you prevent it from continuing to interact with the old version of the page and never see updates or have its changes reflected on the current page? Why would it do that, and what would be surprising about it? Did you have a specific case in mind? After a get_page(), page_cache_get(), or other equivalent function, a piece of code has the expectation that it can continue using that page until after it has released the reference count. This can be an arbitrarily long period of time. It is perfectly welcome to keep using that page as long as it wants, Tux3 does not care. When it lets go of the last reference (and Tux3 has finished with it) then the page is freeable. Did you have a more specific example where this would be an issue? Are you talking about kernel or userspace code? if the page gets modified again, will that cause any issues? what if the page gets modified before the copy gets written out, so that there are two dirty copies of the page in the process of being written? David Lang
Re: [FYI] tux3: Core changes
On Mon, 25 May 2015, Daniel Phillips wrote:

On Monday, May 25, 2015 11:04:39 PM PDT, David Lang wrote: if the page gets modified again, will that cause any issues? what if the page gets modified before the copy gets written out, so that there are two dirty copies of the page in the process of being written? David Lang

How is the page going to get modified again? A forked page isn't mapped by a pte, so userspace can't modify it by mmap. The forked page is not in the page cache, so userspace can't modify it by posix file ops. So the writer would have to be in kernel. Tux3 knows what it is doing, so it won't modify the page. What kernel code besides Tux3 will modify the page?

I'm assuming that Rik is talking about whatever has the reference to the page via one of the methods that he talked about.

David Lang
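Daniel's claim above can be sketched as follows. This is a hedged toy model, not the actual Tux3 code: `fork_for_writeback` and the dict-based `page_cache` are invented names. It shows the invariant being argued: the forked (back) copy leaves the page cache, so subsequent userspace writes land only on the front page, while writeback sees a stable snapshot.

```python
# Toy sketch of the page-fork step: the dirty page is cloned for
# writeback, the clone is invisible to userspace (no pte, not in the
# page cache), and the front page keeps taking new writes.
page_cache = {}                      # offset -> current front page

def fork_for_writeback(offset):
    front = page_cache[offset]       # stays visible to userspace
    back = dict(front)               # private clone: only the fs sees it
    front["dirty"] = False           # new writes re-dirty the front page
    return back

page_cache[0] = {"data": b"v1", "dirty": True}
back = fork_for_writeback(0)         # commit the current delta
page_cache[0]["data"] = b"v2"        # write arriving after the fork
page_cache[0]["dirty"] = True

assert back["data"] == b"v1"         # writeback sees the committed state
assert page_cache[0]["data"] == b"v2"  # userspace sees its new write
```

The "two dirty copies" worry then reduces to: the back copy is owned exclusively by the filesystem, and the front copy's dirtiness belongs to the next delta.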
Re: [FYI] tux3: Core changes
On Wed, 20 May 2015, Daniel Phillips wrote:

On 05/20/2015 03:51 PM, Daniel Phillips wrote:

On 05/20/2015 12:53 PM, Rik van Riel wrote: How does tux3 prevent a user of find_get_page() from reading from or writing into the pre-COW page, instead of the current page?

Careful control of the dirty bits (we have two of them, one each for front and back). That is what pagefork_for_blockdirty is about. Ah, and of course it does not matter if a reader is on the pre-cow page. It would be reading the earlier copy, which might no longer be the current copy, but it raced with the write so nobody should be surprised. That is a race even without page fork.

how do you prevent it from continuing to interact with the old version of the page and never see updates or have its changes reflected on the current page?

David Lang
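The "two dirty bits, one each for front and back" idea can be modeled in miniature. This is a hypothetical sketch of the bookkeeping, not the real pagefork_for_blockdirty logic, whose actual implementation is not reproduced here: one bit tracks dirtiness in the delta currently being built, the other tracks a forked copy still owed to disk.

```python
# Toy model (not Tux3 code) of per-page front/back dirty bits:
# front_dirty = modified in the delta now being built,
# back_dirty  = a forked copy still awaits writeback.
class ForkedPage:
    def __init__(self):
        self.front_dirty = False
        self.back_dirty = False

    def write(self):               # userspace write hits the front page
        self.front_dirty = True

    def commit_delta(self):        # fork: front dirtiness moves to back
        if self.front_dirty:
            self.back_dirty = True
            self.front_dirty = False

    def writeback_done(self):      # back copy reached disk
        self.back_dirty = False

p = ForkedPage()
p.write()
p.commit_delta()
p.write()                          # racing write lands in the next delta
assert p.back_dirty and p.front_dirty  # both states coexist safely
p.writeback_done()
assert p.front_dirty and not p.back_dirty
```

A write racing with a delta commit thus dirties the front page for the next delta rather than perturbing the copy already being written out.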
Re: [FYI] tux3: Core changes
On Wed, 20 May 2015, Daniel Phillips wrote:

On 05/20/2015 07:44 AM, Jan Kara wrote: Yeah, that's what I meant. If you create a function which manipulates page cache, you better make it work with other functions manipulating page cache. Otherwise it's a landmine waiting to be tripped by some unsuspecting developer. Sure you can document all the conditions under which the function is safe to use, but a function that has several paragraphs in front of it explaining when it is safe to use isn't a very good API...

Violent agreement, of course. To put it in concrete terms, each of the page fork support functions must be examined and determined sane. They are:

* cow_replace_page_cache
* cow_delete_from_page_cache
* cow_clone_page
* page_cow_one
* page_cow_file

Would it be useful to drill down into those, starting from the top of the list?

It's a little more than determining that these 5 functions are sane; it's making sure that if someone mixes the use of these functions with other existing functions, the result is sane. But it's probably a good starting point to look at each of these five functions in detail and consider how they work and could interact badly with other things touching the page cache.

David Lang
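To make the "mixing with existing functions" concern concrete, here is a hedged toy sketch of the property a helper like cow_replace_page_cache would need: the swap of the forked page into the mapping must be atomic with respect to lookups, so a concurrent find_get_page() sees either the old page or the new one, never a hole. The names `cow_replace`, `mapping`, and the dict-plus-lock representation are all illustrative; the real function's signature and locking are not reproduced here.

```python
# Toy illustration (not the real mm API) of the atomicity a page-cache
# replacement helper must provide against concurrent lookups.
import threading

mapping = {0: {"data": b"old"}}        # offset -> page, under one lock
mapping_lock = threading.Lock()

def cow_replace(offset, new_page):
    # Swap the forked page in atomically: no window where the offset
    # is unmapped, so lookups see old-or-new, never neither.
    with mapping_lock:
        old = mapping[offset]
        mapping[offset] = new_page
        return old

def find_get_page(offset):             # lookup takes the same lock
    with mapping_lock:
        return mapping.get(offset)

old = cow_replace(0, {"data": b"new"})
assert old["data"] == b"old"
assert find_get_page(0)["data"] == b"new"
```

Auditing each of the five functions then means checking that this kind of invariant holds not just against each other, but against every existing page-cache path that might run concurrently.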
Re: [FYI] tux3: Core changes
On Tue, 19 May 2015, Daniel Phillips wrote:

I understand that Tux3 may avoid these issues due to some other mechanisms it internally has, but if page forking should get into the mm subsystem, the above must work.

It does work, and by example, it does not need a lot of code to make it work, but the changes are not trivial. Tux3's delta writeback model will not suit everyone, so you can't just lift our code and add it to Ext4. Using it in Ext4 would require a per-inode writeback model, which looks practical to me but far from a weekend project. Maybe something to consider for Ext5. It is the job of new designs like Tux3 to chase after that final drop of performance, not our trusty Ext4 workhorse. Though stranger things have happened - as I recall, Ext4 had O(n) directory operations at one time. Fixing that was not easy, but we did it because we had to. Fixing Ext4's write performance is not urgent by comparison, and the barrier is high; you would want jbd3, for one thing. I think the meta-question you are asking is, where is the second user for this new CoW functionality? With a possible implication that if there is no second user then Tux3 cannot be merged. Is that the question?

I don't think they are asking for a second user. What they are saying is that for this functionality to be accepted in the mm subsystem, these problem cases need to work reliably, not just work for Tux3 because of your implementation. So for things that you don't use, you need to make it an error if they get used on a page that's been forked (or not be an error and 'do the right thing').

For cases where it doesn't matter because Tux3 controls the writeback, and it's undefined in general what happens if writeback is triggered twice on the same page, you will need to figure out how to either prevent the second writeback from triggering if there's one in process, or define how the two writebacks are going to happen so that you can't end up with them re-ordered by some other filesystem.

I think that that's what's meant by the top statement that I left in the quote. Even if your implementation details make it safe, these need to be safe even without your implementation details to be acceptable in the core kernel.

David Lang
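The double-writeback requirement raised here has a well-known shape: refuse to start a second writeback of a page while one is in flight. The sketch below is a toy analogue of the kernel's PageWriteback flag idea, not the actual mm interface; the class and method names are invented for illustration.

```python
# Toy sketch (not the mm API) of guarding against two concurrent
# writebacks of the same page: the second submission is rejected
# until the first completes, so they can never be re-ordered.
import threading

class PageState:
    def __init__(self):
        self.lock = threading.Lock()
        self.writeback = False

    def start_writeback(self):
        with self.lock:
            if self.writeback:     # one already in flight: refuse
                return False
            self.writeback = True
            return True

    def end_writeback(self):       # IO completion clears the flag
        with self.lock:
            self.writeback = False

p = PageState()
assert p.start_writeback() is True
assert p.start_writeback() is False  # second trigger is suppressed
p.end_writeback()
assert p.start_writeback() is True   # allowed again after completion
```

The alternative David mentions, defining an ordering for the two writebacks rather than suppressing the second, would replace the `return False` path with queueing behind the in-flight IO.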
Re: [FYI] tux3: Core changes
On Fri, 15 May 2015, Mel Gorman wrote:

On Fri, May 15, 2015 at 02:54:48AM -0700, Daniel Phillips wrote:

On 05/15/2015 01:09 AM, Mel Gorman wrote:

On Thu, May 14, 2015 at 11:06:22PM -0400, Rik van Riel wrote:

On 05/14/2015 08:06 PM, Daniel Phillips wrote: The issue is that things like ptrace, AIO, infiniband RDMA, and other direct memory access subsystems can take a reference to page A, which Tux3 clones into a new page B when the process writes it. However, while the process now points at page B, ptrace, AIO, infiniband, etc. will still be pointing at page A. This causes the process and the other subsystem to each look at a different page, instead of at shared state, causing ptrace to do nothing, AIO and RDMA data to be invisible (or corrupted), etc...

Is this a bit like page migration?

Yes. Page migration will fail if there is an "extra" reference to the page that is not accounted for by the migration code.

When I said it's not like page migration, I was referring to the fact that a COW on a pinned page for RDMA is a different problem to page migration. The COW of a pinned page can lead to lost writes or corruption depending on the ordering of events.

I see the lost writes case, but not the corruption case,

Data corruption can occur depending on the ordering of events and the application's expectations. If a process starts IO, RDMA pins the page for read, and forks are combined with writes from another thread, then when the IO completes the reads may not be visible. The application may take improper action at that point.

if tux3 forks the page and writes the copy while the original page is being modified by other things, this means that some of the changes won't be in the version written (and this could catch partial writes with 'interesting' results if the forking happens at the wrong time). But if the original page gets re-marked as needing to be written out when it's changed by one of the other things that are accessing it, there shouldn't be any long-term corruption. As far as short-term corruption goes, any time you have a page mmapped it could get written out at any time, with only some of the application changes applied to it, so this sort of corruption could happen anyway, couldn't it?

Users of RDMA are typically expected to use MADV_DONTFORK to avoid this class of problem. You can choose to not define this as data corruption because the kernel is not directly involved, and that's your call.

Do you mean corruption by changing a page already in writeout? If so, don't all filesystems have that problem?

No, the problem is different. Backing devices requiring stable pages will block the write until the IO is complete. For those that do not require stable pages it's ok to allow the write as long as the page is dirtied so that it'll be written out again and no data is lost.

so if tux3 is prevented from forking the page in cases where the write would be blocked, and will get forked again for follow-up writes if it's modified again otherwise, won't this be the same thing?

David Lang

If RDMA to a mmapped file races with write(2) to the same file, maybe it is reasonable and expected to lose some data. In the RDMA case, there is at least application awareness to work around the problems. Normally it's ok to have both mapped and write() access to data, although userspace might need a lock to co-ordinate updates and event ordering.
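Mel's two stable-page policies can be shown side by side in a toy form. This is an illustrative sketch, not kernel code: `try_write` and the dict-based page are invented names. On a backing device requiring stable pages the write must wait for the in-flight IO; otherwise the write may proceed as long as it re-dirties the page so a later writeback picks the change up.

```python
# Toy illustration (not kernel code) of the two stable-page policies:
# stable-pages devices block the write during IO; others allow it but
# must re-dirty the page so no data is lost.
def try_write(page, stable_pages_required):
    if page["under_io"] and stable_pages_required:
        return "blocked"           # wait_on_page_writeback() analogue
    page["dirty"] = True           # change will be written out again
    return "written"

page = {"under_io": True, "dirty": False}
assert try_write(page, stable_pages_required=True) == "blocked"
assert try_write(page, stable_pages_required=False) == "written"
assert page["dirty"] is True       # guarantees a follow-up writeback
```

David's question then amounts to: if the fork is deferred exactly where "blocked" would apply, and a fresh fork happens on the next modification otherwise, the two schemes converge.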
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote:

On 05/12/2015 02:30 PM, David Lang wrote:

On Tue, 12 May 2015, Daniel Phillips wrote: Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. Phoronix turns any correction or criticism into an attack.

Phoronix gets attacked in an unseemly way by a number of people in the developer community who should behave better. You are doing it yourself, seemingly oblivious to the valuable role that the publication plays in our community. Google for filesystem benchmarks. Where do you find them? Right. Not to mention the Xorg coverage, community issues, etc. etc. The last thing we need is a monoculture in Linux news, and we are dangerously close to that now.

It's on my 'sites to check daily' list, but they have also had some pretty nasty errors in their benchmarks, some of which have been pointed out repeatedly over the years (doing fsync-dependent workloads in situations where one FS actually honors the fsyncs and another doesn't is a classic).

So, how is "EXT4 is not as stable or as well tested as most people think" not a cheap shot? By my first-hand experience, that claim is absurd. Add to that the first-hand experience of roughly two billion other people. Seems to be a bit self-serving too, or was that just an accident?

I happen to think that it's correct. It's not that Ext4 isn't tested, but that people's expectations of how much it's been tested, and at what scale, don't match the reality.

You need to get out of the mindset that Ted and Dave are Enemies that you need to overcome, they are friendly competitors, not Enemies.

You are wrong about Dave. These are not the words of any friend: "I don't think I'm alone in my suspicion that there was something stinky about your numbers." -- Dave Chinner

you are looking for offense. That just means that something is wrong with them, not that they were deliberately falsified.

Basically allegations of cheating. And wrong. Maybe Dave just lives in his own dreamworld where everybody is out to get him, so he has to attack people he views as competitors first.

you are the one doing the attacking. Please stop. Take a break if needed, and then get back to producing software rather than complaining about how everyone is out to get you.

David Lang
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote:

On 05/12/2015 11:39 AM, David Lang wrote:

On Mon, 11 May 2015, Daniel Phillips wrote: ...it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3.

Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4"

umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't.

Perhaps you misunderstood. Linus decides what gets merged. Andrew decides. Greg decides. Dave Chinner does not decide, he just does his level best to create the impression that our project is unfit to merge. Any chance there might be an agenda? Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. Phoronix turns any correction or criticism into an attack.

You need to get out of the mindset that Ted and Dave are Enemies that you need to overcome, they are friendly competitors, not Enemies. They assume that you are working in good faith (but are inexperienced compared to them), and you need to assume that they are working in good faith. If they ever do resort to underhanded means to sabotage you, Linus and the other kernel developers will take action. But pointing out limits in your current implementation, problems in your benchmarks based on how they are run, and concepts that are going to be difficult to merge is not underhanded, it's exactly the type of assistance that you should be grateful for in friendly competition.

You were the one who started crowing about how badly XFS performed. Dave gave a long and detailed explanation about the reasons for the differences, and showed benchmarks on other hardware where XFS works very well. That's not an attack on EXT4 (or Tux3), it's an explanation.

The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that.

The linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself.

Nice idea, but it isn't working. Did you let the code talk to you? Right, you let the code talk to Dave Chinner, then you listen to what Dave Chinner has to say about it. Any chance that there might be some creative licence acting somewhere in that chain?

I have my own concerns about how things are going to work (I've voiced some of them), but no, I haven't tried running Tux3 because you say it's not ready yet.

There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing the benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram without ever touching the disk).

You know what to do about checking for faulty benchmarks.

That requires that the code be readily available, which last I heard, Tux3 wasn't. Has this been fixed?

So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them, and if you can't adjust and re-run the benchmarks, don't start attacking them as a result.

Ted and Dave failed to point out any actual problem with any benchmark. They invented issues with benchmarks and promoted those as FUD.

They pointed out problems with using ramdisk to simulate a SSD and huge differences between spinning rust and an SSD (or disk array). Those aren't FUD.

As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince. You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things).

Yet he clearly wrote "we" as if he believes he is part of it.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote:

On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote: I think Ted and I are on the same page here. "Competitive benchmarks" only matter to the people who are trying to sell something. You're trying to sell Tux3, but

By "same page", do you mean "transparently obvious about obstructing other projects"?

The "except page forking design" statement is your biggest hurdle for getting tux3 merged, not performance.

No, the "except page forking design" is because the design is already good and effective. The small adjustments needed in core are well worth merging because the benefits are proved by benchmarks. So benchmarks are key and will not stop just because you don't like the attention they bring to XFS issues.

Without page forking, tux3 cannot be merged at all. But it's not filesystem developers you need to convince about the merits of the page forking design and implementation - it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3.

Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4"

umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't.

The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that.

The linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself.

There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing the benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram without ever touching the disk).

So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them, and if you can't adjust and re-run the benchmarks, don't start attacking them as a result.

As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince. You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things).

IOWs, you need to focus on the important things needed to achieve your stated goal of getting tux3 merged. New filesystems should be faster than those based on 20-25 year old designs, so you don't need to waste time trying to convince people that tux3, when complete, will be fast.

You know that Tux3 is already fast. Not just that of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there.

We wouldn't expect anyone developing a new filesystem to believe any differently. If they didn't believe this, why would they be working on the filesystem instead of just using an existing filesystem? The ugly reality is that everyone's early versions of their new filesystem look really good. The problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you may not be right, and nobody will know until you are at a usable state and other people can start beating on it.

David Lang
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote: On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote: I think Ted and I are on the same page here. Competitive benchmarks only matter to the people who are trying to sell something. You're trying to sell Tux3, but By same page, do you mean transparently obvious about obstructing other projects? The except page forking design statement is your biggest hurdle for getting tux3 merged, not performance. No, the except page forking design is because the design is already good and effective. The small adjustments needed in core are well worth merging because the benefits are proved by benchmarks. So benchmarks are key and will not stop just because you don't like the attention they bring to XFS issues. Without page forking, tux3 cannot be merged at all. But it's not filesystem developers you need to convince about the merits of the page forking design and implementation - it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. Please do not say we when you know that I am just as much a we as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_itempx=MTA0NzM XFS Developer Takes Shots At Btrfs, EXT4 umm, Phoronix has no input on what gets merged into the kernel. they also hae a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't. The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that. The linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself. 
There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing the benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in 30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram without ever touching the disk) So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them and if you can't adjust and re-run the benchmarks, don't start attacking them as a result. As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince. You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things) IOWs, you need to focus on the important things needed to achieve your stated goal of getting tux3 merged. New filesystems should be faster than those based on 20-25 year old designs, so you don't need to waste time trying to convince people that tux3, when complete, will be fast. You know that Tux3 is already fast. Not just that of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there.
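The ReiserFS benchmark pitfall described above is easy to reproduce: a run that never forces data to stable storage can complete entirely in the page cache and measure RAM rather than the filesystem. A minimal sketch (the function and file names here are mine, purely for illustration, not anything from the thread):

```python
# Illustrative only: a "benchmark" that skips fsync() can finish without
# ever touching the disk, which is exactly the flaw described above.
import os
import time

def timed_write(path, nbytes, sync):
    """Write nbytes to path; if sync, force the data to stable storage."""
    buf = b"x" * 4096
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(nbytes // 4096):
            f.write(buf)
        if sync:
            f.flush()
            os.fsync(f.fileno())  # without this, dirty pages may sit in RAM
    return time.perf_counter() - t0
```

On a spinning disk the `sync=True` variant is typically orders of magnitude slower than the cached one; a filesystem tuned to delay flushing past the benchmark's runtime hides exactly that gap.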
We wouldn't expect anyone developing a new filesystem to believe any differently. If they didn't believe this, why would they be working on the filesystem instead of just using an existing filesystem? The ugly reality is that everyone's early versions of their new filesystem look really good. The problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you may not be right, and nobody will know until you get to a usable state and other people can start beating on it. David Lang -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote: On 05/12/2015 11:39 AM, David Lang wrote: On Mon, 11 May 2015, Daniel Phillips wrote: ...it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_itempx=MTA0NzM XFS Developer Takes Shots At Btrfs, EXT4 umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't. Perhaps you misunderstood. Linus decides what gets merged. Andrew decides. Greg decides. Dave Chinner does not decide, he just does his level best to create the impression that our project is unfit to merge. Any chance there might be an agenda? Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. Phoronix turns any correction or criticism into an attack. You need to get out of the mindset that Ted and Dave are Enemies that you need to overcome, they are friendly competitors, not Enemies. They assume that you are working in good faith (but are inexperienced compared to them), and you need to assume that they are working in good faith. If they ever do resort to underhanded means to sabotage you, Linus and the other kernel developers will take action. But pointing out limits in your current implementation, problems in your benchmarks based on how they are run, and concepts that are going to be difficult to merge is not underhanded, it's exactly the type of assistance that you should be grateful for in friendly competition. You were the one who started crowing about how badly XFS performed.
Dave gave a long and detailed explanation about the reasons for the differences, and showing benchmarks on other hardware that showed that XFS works very well there. That's not an attack on EXT4 (or Tux3), it's an explanation. The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that. The linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself. Nice idea, but it isn't working. Did you let the code talk to you? Right, you let the code talk to Dave Chinner, then you listen to what Dave Chinner has to say about it. Any chance that there might be some creative licence acting somewhere in that chain? I have my own concerns about how things are going to work (I've voiced some of them), but no, I haven't tried running Tux3 because you say it's not ready yet. There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing the benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in 30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram without ever touching the disk) You know what to do about checking for faulty benchmarks. That requires that the code be readily available, which last I heard, Tux3 wasn't. Has this been fixed?
So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them and if you can't adjust and re-run the benchmarks, don't start attacking them as a result. Ted and Dave failed to point out any actual problem with any benchmark. They invented issues with benchmarks and promoted those as FUD. They pointed out problems with using ramdisk to simulate a SSD and huge differences between spinning rust and an SSD (or disk array). Those aren't FUD. As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince. You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things) Yet he clearly wrote we as if he believes he is part of it. He is part of the group of people who use and work
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote: On 05/12/2015 02:30 PM, David Lang wrote: On Tue, 12 May 2015, Daniel Phillips wrote: Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. Phoronix turns any correction or criticism into an attack. Phoronix gets attacked in an unseemly way by a number of people in the developer community who should behave better. You are doing it yourself, seemingly oblivious to the valuable role that the publication plays in our community. Google for filesystem benchmarks. Where do you find them? Right. Not to mention the Xorg coverage, community issues, etc etc. The last thing we need is a monoculture in Linux news, and we are dangerously close to that now. It's on my 'sites to check daily' list, but they have also had some pretty nasty errors in their benchmarks, some of which have been pointed out repeatedly over the years (doing fsync dependent workloads in situations where one FS actually honors the fsyncs and another doesn't is a classic) So, how is "EXT4 is not as stable or as well tested as most people think" not a cheap shot? By my first hand experience, that claim is absurd. Add to that the first hand experience of roughly two billion other people. Seems to be a bit self serving too, or was that just an accident? I happen to think that it's correct. It's not that Ext4 isn't tested, but that people's expectations of how much it's been tested, and at what scale don't match the reality. You need to get out of the mindset that Ted and Dave are Enemies that you need to overcome, they are friendly competitors, not Enemies. You are wrong about Dave. These are not the words of any friend: I don't think I'm alone in my suspicion that there was something stinky about your numbers. -- Dave Chinner you are looking for offense.
That just means that something is wrong with them, not that they were deliberately falsified. Basically allegations of cheating. And wrong. Maybe Dave just lives in his own dreamworld where everybody is out to get him, so he has to attack people he views as competitors first. you are the one doing the attacking. Please stop. Take a break if needed, and then get back to producing software rather than complaining about how everyone is out to get you. David Lang
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote: On 05/11/2015 03:12 PM, Pavel Machek wrote: It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather trickle in over time (think a logserver that is putting files into a bunch of directories at a fairly modest rate per directory) And when you then decide that you have to move the directory/file info, doesn't that create a potentially large amount of unexpected IO that could end up interfering with what the user is trying to do?
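The allocation-goal heuristic being debated above can be sketched as a toy model: allocate linearly within the current region, and relocate the goal to the emptiest region once the current one gets crowded. All the names and thresholds here are mine, purely illustrative, not Tux3's actual design:

```python
# Toy model of the proposed heuristic: linear allocation within a region,
# with the goal jumping to the emptiest region past a fullness threshold.
REGION_SIZE = 1024   # blocks per region (arbitrary, for illustration)
CROWDED = 0.8        # fullness fraction that triggers relocating the goal

class Allocator:
    def __init__(self, nregions):
        self.used = [0] * nregions   # blocks used in each region
        self.goal = 0                # region we are currently filling

    def alloc(self, nblocks):
        if self.used[self.goal] + nblocks > REGION_SIZE * CROWDED:
            # Current region is crowded: jump to the emptiest region.
            self.goal = min(range(len(self.used)), key=self.used.__getitem__)
        self.used[self.goal] += nblocks
        return self.goal
```

The trickle-in objection maps directly onto this model: if files arrive slowly, the goal keeps bouncing to whichever region looks emptiest, scattering a single directory's files, and any later consolidation of that directory's blocks is exactly the unexpected background IO questioned above.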
David Lang
Re: how to have the kernel do udev's job and autoload the right modules ?
On Thu, 7 May 2015, Austin S Hemmelgarn wrote: On 2015-05-06 16:49, David Lang wrote: On Wed, 6 May 2015, linuxcbon linuxcbon wrote: On Wed, May 6, 2015 at 7:53 PM, David Lang wrote: It's perfectly legitimate to not want to use udev, but that doesn't mean that the kernel will (or should) do it for you. David Lang When I boot the kernel without modules, I don't have anything working except "minimal video". I think the kernel should give a minimal support for network, sound and video, even if 0 modules are loaded. I am just dreaming, You can do that, you just need to build in all the network and sound drivers (and pick which driver in the case of conflicts) There isn't such a thing as a 'generic' network or sound card. For video there is 'VGA video' which is used by default on x86 systems, but even that's a driver that could be disabled. To explain further, video has a standardized hardware level API (VGA and VBE) because it is considered critical system functionality (which is BS in my opinion, you can get by just fine with a serial console, but that's irrelevant to this discussion). Sound is traditionally not considered critical, and therefore doesn't have a standardized hardware API. Networking is (traditionally) only considered critical if the system is booting off the network, and therefore only has a standardized API (part of the PXE spec, known as UNDI) on some systems, and even then only when they are configured to netboot (and IIRC, also only when the processor is in real mode, just like for all other BIOS calls). I don't think that it has anything to do with critical system functionality, but rather just the legacy history of the PC clones. At one point VGA was the standard, and at that point the different video card manufacturers got into the game, but since they all had to boot the system, and the BIOS only knew how to talk to a VGA card, all the enhanced cards had to implement VGA so that DOS and the BIOS could function. 
That legacy has continued on the PC clone systems to today. Non PC clones didn't have such a standard, and they don't implement VGA on their video cards (unless it's a card ported from a PC) Network cards were never standardized, and were optional add-ons. They also weren't needed for the system to boot, so there was never any standard for newcomers to implement. David Lang
Re: how to have the kernel do udev's job and autoload the right modules ?
On Wed, 6 May 2015, linuxcbon linuxcbon wrote: On Wed, May 6, 2015 at 7:53 PM, David Lang wrote: It's perfectly legitimate to not want to use udev, but that doesn't mean that the kernel will (or should) do it for you. David Lang When I boot the kernel without modules, I don't have anything working except "minimal video". I think the kernel should give a minimal support for network, sound and video, even if 0 modules are loaded. I am just dreaming, You can do that, you just need to build in all the network and sound drivers (and pick which driver in the case of conflicts) There isn't such a thing as a 'generic' network or sound card. For video there is 'VGA video' which is used by default on x86 systems, but even that's a driver that could be disabled. David Lang
Re: how to have the kernel do udev's job and autoload the right modules ?
On Wed, 6 May 2015, linuxcbon linuxcbon wrote: On Wed, May 6, 2015 at 5:55 PM, Ken Moffat wrote: I suggest that you take the time to look at eudev and mdev, and think about how you can use the facilities they offer. I was wishing the kernel would offer some minimal support for network, sound and full screen video for my hw :(. But it seems I need to load modules to achieve this. And to load modules, it needs some kind of "hotplug" called udev or mdev. I've been building my own kernels for production systems for a long time. It is absolutely possible to have a kernel provide support for your hardware without modules. The problem is the question of how much hardware you want to support. Modules were created because compiling everything into the kernel at once has multiple problems 1. sometimes different drivers can handle the same hardware, and you can only use one driver for the hardware 2. sometimes different hardware conflicts in that drivers for one piece of hardware will think that they've found their hardware, and prevent the proper drivers from working (sometimes doing 'strange' things to the hardware in the process) 3. the resulting kernel is VERY large. Back in the day, the problem was that the kernel would no longer fit on a floppy. We don't have that limit, but we still don't want to waste time reading a huge amount of data into RAM (at which point it prevents the RAM from being used for other things) 4. boot time would be horrible as all the drivers try to detect their hardware and time out. so if you want to cover your hardware, you have two choices. 1. If you have a relatively small variation of hardware, just compile in all the drivers you need. This even works for most hotplugged items. 2. use modules If you use modules, then you need to have some way of loading them. It's a very bad idea to have this happen by magic, without any control over the policies (sometimes you don't want drivers to load just because hardware exists).
So you need to have a place to set the policy. Since the kernel provides mechanisms, not policy, the result is that the kernel tells userspace what it thinks it's found and it's up to userspace to then 'do the right thing' So if you don't want to use udev, then you need to have something that replaces it to load the right module with the right options. It's perfectly legitimate to not want to use udev, but that doesn't mean that the kernel will (or should) do it for you. David Lang
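The "kernel reports, userspace decides" split above is concrete enough to sketch. On Linux, each device directory in sysfs exposes a `modalias` attribute, and a coldplug script simply walks those and hands each alias to modprobe, which maps it to a driver through modules.alias. The sketch below (written in Python only for illustration; real replacements like busybox mdev do this in C or shell) is the mechanism with zero policy, which is exactly what udev rules then layer on top of:

```python
# Minimal sketch of udev's coldplug job: collect every modalias the
# kernel exposes under sysfs and (optionally) hand each one to modprobe.
# dry_run=True just gathers the aliases instead of loading anything.
import os
import subprocess

def coldplug(sysfs="/sys/devices", dry_run=True):
    aliases = []
    for root, dirs, files in os.walk(sysfs):
        if "modalias" in files:
            with open(os.path.join(root, "modalias")) as f:
                alias = f.read().strip()
            if alias:
                aliases.append(alias)
                if not dry_run:
                    # modprobe resolves the alias via modules.alias;
                    # -q keeps it quiet when no module matches.
                    subprocess.call(["modprobe", "-q", alias])
    return aliases
```

Everything beyond this loop, deciding which aliases NOT to load, renaming devices, setting permissions, is the policy layer that belongs in userspace.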
Re: A desktop environment[1] kernel wishlist
On Wed, 6 May 2015, Rafael J. Wysocki wrote: You are, of course, correct. Ultimately the only requirement we have is that there exists a way for userspace to determine if the system woke up because of a user-triggered event. The actual mechanism by which this determination is made isn't something I feel strongly about. The reason I had been focusing on exposing the actual wakeup event to userspace is because classifying wakeup events as user-triggered or not feels to me like a policy decision that should be left to userspace. If the kernel maintainers are ok with doing this work in the kernel instead and only exposing a binary yes/no bit to userspace for user-triggered wakeups, that's perfectly fine because it still meets our requirements. Well, please see the message I've just sent. All wakeup devices have a wakeup source object associated with them. In principle, we can expose a "priority" attribute from that for user space to set as it wants to. There may be two values of it, like "normal" and "high" for example. Then, what only remains is to introduce separate wakeup counts for the "high" priority and "normal" priority wakeup sources and teach the power manager to use them. That leaves no policy in the kernel, but it actually has a chance to work. how about instead of setting two states and defining that one must be a subset of the other you instead have the existing feed of events and then allow software that cares to define additional feeds that take the current feed and filter it. We allow bpf filters in the kernel, so use those to filter what events the additional feed is going to receive. 
remember that the interesting numbers in CS are 0, 1, and many, not 2 :-) don't limit things to two feeds with one always being a subset of the other, create a mechanism to allow an arbitrary number of feeds that can be filtered in different ways David Lang
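The "arbitrary number of filtered feeds" idea can be sketched as a toy model: one source feed of wakeup events, plus any number of derived feeds, each with its own filter predicate. In the kernel the predicate would be a bpf program as suggested above; here it is a plain Python callable, since this is only a sketch of the concept and all the names are mine:

```python
# Toy model: one event source, many derived feeds, each filtered
# independently (no fixed two-level normal/high priority scheme).
class EventSource:
    def __init__(self):
        self.feeds = []          # list of (predicate, collected events)

    def new_feed(self, predicate):
        """Register a derived feed; returns the list events land in."""
        events = []
        self.feeds.append((predicate, events))
        return events

    def publish(self, event):
        for predicate, events in self.feeds:
            if predicate(event):
                events.append(event)

src = EventSource()
all_events = src.new_feed(lambda e: True)                  # the existing feed
user_events = src.new_feed(lambda e: e["user_triggered"])  # one derived feed
src.publish({"dev": "rtc0", "user_triggered": False})
src.publish({"dev": "power_button", "user_triggered": True})
```

The point of the model is that "user-triggered" is just one predicate among many a power manager might register, rather than a binary bit baked into the kernel.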
Re: A desktop environment[1] kernel wishlist
On Wed, 6 May 2015, Rafael J. Wysocki wrote: You are, of course, correct. Ultimately the only requirement we have is that there exists a way for userspace to determine if the system woke up because of a user-triggered event. The actual mechanism by which this determination is made isn't something I feel strongly about. The reason I had been focusing on exposing the actual wakeup event to userspace is because classifying wakeup events as user-triggered or not feels to me like a policy decision that should be left to userspace. If the kernel maintainers are ok with doing this work in the kernel instead and only exposing a binary yes/no bit to userspace for user-triggered wakeups, that's perfectly fine because it still meets our requirements. Well, please see the message I've just sent. All wakeup devices have a wakeup source object associated with them. In principle, we can expose a priority attribute from that for user space to set as it wants to. There may be two values of it, like normal and high for example. Then, what only remains is to introduce separate wakeup counts for the high priority and normal priority wakeup sources and teach the power manager to use them. That leaves no policy in the kernel, but it actually has a chance to work. how about instead of setting two states and defining that one must be a subset of the other you instead have the existing feed of events and then allow software that cares to define additional feeds that take the current feed and filter it. We allow bpf filters in the kernel, so use those to filter what events the additional feed is going to receive. 
remember that the interesting numbers in CS are 0, 1, and many, not 2 :-) don't limit things to two feeds with one always being a subset of the other, create a mechanism to allow an arbitrary number of feeds that can be filtered in different ways David Lang -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
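The "arbitrary number of filtered feeds" idea above can be sketched in plain userspace code; the class below is a toy stand-in (all names hypothetical) where each subscriber's predicate plays the role of an attached bpf filter:

```python
class EventFeed:
    """Toy model of one event source fanned out into many filtered feeds."""

    def __init__(self):
        self.subscribers = []  # (predicate, inbox) pairs, one per derived feed

    def subscribe(self, predicate):
        # Each derived feed sees only the events its predicate accepts,
        # much as a bpf filter attached to the feed would.
        inbox = []
        self.subscribers.append((predicate, inbox))
        return inbox

    def publish(self, event):
        for predicate, inbox in self.subscribers:
            if predicate(event):
                inbox.append(event)


feed = EventFeed()
all_events = feed.subscribe(lambda e: True)                # unfiltered feed
user_only = feed.subscribe(lambda e: e["user_triggered"])  # filtered feed

feed.publish({"user_triggered": True, "src": "power-button"})
feed.publish({"user_triggered": False, "src": "rtc-timer"})
```

Note that any number of feeds can coexist, and no feed is required to be a subset of any other.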
Re: Tux3 Report: How fast can we fsync?
r directly or donated) David Lang
Re: [GIT PULL] kdbus for 4.1-rc1
On Thu, 30 Apr 2015, Dave Airlie wrote: On 30 April 2015 at 10:05, David Lang wrote: On Wed, 29 Apr 2015, Theodore Ts'o wrote: On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: If your customers want this feature, you're more than welcome to fork the kernel and support it yourself. Oh wait... Redhat does that already. So what's the problem? Just put it into RHEL (which I use, I admit, along with Debian/Mint) and be done with it.

Harald, if you make the RHEL initramfs harder to debug in the field, I will await the time when some Red Hat field engineers need to do the same sort of thing I have had to do in the field, and be amused when they want to shake you very warmly by the throat. :-) Seriously, keep things as simple as possible in the initramfs; don't use complicated bus protocols; that way lies madness. Enterprise systems aren't constantly booting (or they shouldn't be, if your kernels are sufficiently reliable :-), so trying to optimize away an extra 2 or 3 seconds of boot time really, REALLY isn't worth it.

I've had enterprise systems where I could hit power on two boxes and finish the OS install on one before the other had even finished POST and looked for the boot media. I did this 5 years ago, before the "let's speed up boot" push started. Admittedly, this wasn't a stock distro boot/install; it was my own optimized one, but it also wasn't as optimized and automated as it could have been (several points where the installer needed to pick items from a menu and enter values).

You guys might have missed this new industry trend, I think they call it virtualisation; I hear it's going to be big, you might want to look into it.

So what do you run your virtual machines on? You still have to put an OS on the hardware to support your VMs. Virtualization doesn't eliminate servers (as much as some cloud advocates like to claim it does). And virtualization has overhead, sometimes very significant overhead, so it's not always the right answer.
David Lang
Re: [GIT PULL] kdbus for 4.1-rc1
On Wed, 29 Apr 2015, Theodore Ts'o wrote: On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote: If your customers want this feature, you're more than welcome to fork the kernel and support it yourself. Oh wait... Redhat does that already. So what's the problem? Just put it into RHEL (which I use, I admit, along with Debian/Mint) and be done with it.

Harald, if you make the RHEL initramfs harder to debug in the field, I will await the time when some Red Hat field engineers need to do the same sort of thing I have had to do in the field, and be amused when they want to shake you very warmly by the throat. :-) Seriously, keep things as simple as possible in the initramfs; don't use complicated bus protocols; that way lies madness. Enterprise systems aren't constantly booting (or they shouldn't be, if your kernels are sufficiently reliable :-), so trying to optimize away an extra 2 or 3 seconds of boot time really, REALLY isn't worth it.

I've had enterprise systems where I could hit power on two boxes and finish the OS install on one before the other had even finished POST and looked for the boot media. I did this 5 years ago, before the "let's speed up boot" push started. Admittedly, this wasn't a stock distro boot/install; it was my own optimized one, but it also wasn't as optimized and automated as it could have been (several points where the installer needed to pick items from a menu and enter values).

David Lang
Re: [GIT PULL] kdbus for 4.1-rc1
On Wed, 29 Apr 2015, Andy Lutomirski wrote: On Wed, Apr 29, 2015 at 1:15 PM, David Lang wrote: On Wed, 29 Apr 2015, Andy Lutomirski wrote: On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn wrote: On 2015-04-29 14:54, Andy Lutomirski wrote: On Apr 29, 2015 5:48 AM, "Harald Hoyer" wrote: * Being in the kernel closes a lot of races which can't be fixed with the current userspace solutions. For example, with kdbus there is a way a client can disconnect from a bus, but do so only if no further messages are present in its queue, which is crucial for implementing race-free "exit-on-idle" services.

This can be implemented in userspace. Client to dbus daemon: may I exit now? Dbus daemon to client: yes (and no more messages) or no.

Depending on how this is implemented, there would be a potential issue if a message arrived for the client after the daemon told it it could exit, but before it finished shutdown, in which case the message might get lost.

Then implement it the right way? The client sends some kind of sequence number with its request.

So any app in the system can prevent any other app from exiting/restarting just by sending it the equivalent of a ping over dbus? Preventing an app from exiting because there are unhandled messages doesn't mean that those messages are going to be handled, just that they will get read and dropped on the floor by an app trying to exit. Sometimes you will just end up with a hung app that can't process messages and needs to be restarted, but can't be restarted because there are pending messages.

I think this consideration is more or less the same whether it's handled in the kernel or in userspace, though.

If the justification for why this needs to be in the kernel is that you can't reliably prevent apps from exiting if there are pending messages, then the answer that "preventing apps from exiting if there are pending messages isn't a sane thing to try to do" is a direct counter to that justification for including it in the kernel.
David Lang
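The userspace handshake discussed in this thread — a client asks the daemon for permission to exit, and permission is granted only if nothing raced into its queue — can be sketched like this (toy code with hypothetical names, not the real dbus protocol):

```python
import queue
import threading

class BusDaemon:
    """Toy router granting race-free 'exit-on-idle' via a counter handshake."""

    def __init__(self):
        self.lock = threading.Lock()
        self.queues = {}    # client name -> pending message queue
        self.enqueued = {}  # client name -> total messages ever sent to it

    def register(self, name):
        with self.lock:
            self.queues[name] = queue.Queue()
            self.enqueued[name] = 0

    def send(self, dest, msg):
        with self.lock:
            self.queues[dest].put(msg)
            self.enqueued[dest] += 1

    def may_exit(self, name, processed):
        # The client reports how many messages it has processed; exit is
        # granted only if that matches everything the daemon ever enqueued,
        # so a message racing in after the client went idle denies the exit.
        with self.lock:
            return processed == self.enqueued[name]

daemon = BusDaemon()
daemon.register("svc")
daemon.send("svc", "ping")
denied = daemon.may_exit("svc", 0)   # a message is pending: exit refused
daemon.queues["svc"].get()           # client drains its queue
granted = daemon.may_exit("svc", 1)  # all caught up: exit allowed
```

Note that nothing here stops a misbehaving sender from denying the exit forever by pinging the client repeatedly, which is exactly the objection raised above.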
Re: [GIT PULL] kdbus for 4.1-rc1
On Wed, 29 Apr 2015, Andy Lutomirski wrote: On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn wrote: On 2015-04-29 14:54, Andy Lutomirski wrote: On Apr 29, 2015 5:48 AM, "Harald Hoyer" wrote: * Being in the kernel closes a lot of races which can't be fixed with the current userspace solutions. For example, with kdbus there is a way a client can disconnect from a bus, but do so only if no further messages are present in its queue, which is crucial for implementing race-free "exit-on-idle" services.

This can be implemented in userspace. Client to dbus daemon: may I exit now? Dbus daemon to client: yes (and no more messages) or no.

Depending on how this is implemented, there would be a potential issue if a message arrived for the client after the daemon told it it could exit, but before it finished shutdown, in which case the message might get lost.

Then implement it the right way? The client sends some kind of sequence number with its request.

So any app in the system can prevent any other app from exiting/restarting just by sending it the equivalent of a ping over dbus? Preventing an app from exiting because there are unhandled messages doesn't mean that those messages are going to be handled, just that they will get read and dropped on the floor by an app trying to exit. Sometimes you will just end up with a hung app that can't process messages and needs to be restarted, but can't be restarted because there are pending messages. The problem with "guaranteed delivery" messages is that things _will_ go wrong that will cause the messages to not be received and processed.
At that point you have the choice of losing some messages or freezing your entire system (you can buffer them for some time, but eventually you will run out of buffer space). We see this all the time in the logging world: people configure their systems for reliable delivery of log messages to a remote machine, and then, when that remote machine goes down and can't receive messages (or a network issue blocks the traffic), the sending machine blocks and causes an outage.

Being too strict about guaranteeing delivery just doesn't work. You must have a mechanism to abort and throw away unprocessed messages. If this means disconnecting the receiver so that there are no missing messages to the receiver, that's a valid choice. But preventing a receiver from exiting because it hasn't processed a message is not a valid choice.

David Lang
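The "abort and throw away" policy described above can be sketched as a bounded outbox that drops the oldest unsent messages instead of blocking the producer (illustrative code, not any real syslog implementation):

```python
from collections import deque

class BoundedLogBuffer:
    """Bounded outbox: when the receiver stalls, the oldest messages are
    dropped (and counted) rather than freezing the sender."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # deque evicts the oldest on overflow
        self.dropped = 0

    def push(self, msg):
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1  # record the loss instead of blocking
        self.buf.append(msg)

    def drain(self):
        # Called when the receiver is reachable again.
        out = list(self.buf)
        self.buf.clear()
        return out

outbox = BoundedLogBuffer(capacity=3)
for i in range(5):          # receiver is down: 5 sends, room for only 3
    outbox.push(f"log line {i}")
survivors = outbox.drain()  # the 3 newest lines; 2 were dropped
```

The design choice here is drop-oldest with an explicit drop counter, so the sender never blocks and the loss is at least observable.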
Re: [GIT PULL] kdbus for 4.1-rc1
On Wed, 29 Apr 2015, Martin Steigerwald wrote: On Wednesday, 29 April 2015, 14:47:53, Harald Hoyer wrote: We really don't want the IPC mechanism to be in a state of flux. All tools would have to fall back to a non-standard mechanism in that case. If I have to pull a dbus daemon into the initramfs, we still have the chicken-and-egg problem of PID 1 talking to the logging daemon and starting dbus: systemd cannot talk to journald via dbus unless dbus-daemon is started, dbus cannot log anything on startup if journald is not running, etc.

Do I get this right, that it is basically a userspace *design* decision that you use as a reason to have kdbus inside the kernel? Is it really necessary to use DBUS for talking to journald? And does it really matter that much if messages from before dbus starts up do not appear in the log? /proc/kmsg is a ring buffer; it can still be copied over later.

I've been getting the early boot messages in my logs for decades (assuming the system doesn't fail before the syslog daemon is started). It sometimes has required setting a larger-than-default ring buffer in the kernel, but that's easy enough to do.

David Lang
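The "larger than default ring buffer" mentioned above is set with the `log_buf_len` kernel command-line parameter; a sketch of the workflow (the size and output path here are only examples):

```shell
# In the bootloader config, append to the kernel command line, e.g.:
#   log_buf_len=4M
# so early-boot messages aren't overwritten before syslog starts.

# Once userspace is up, the ring buffer can be copied out at any time:
dmesg > /var/log/boot-dmesg.log
```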
Re: [GIT PULL] kdbus for 4.1-rc1
On Tue, 28 Apr 2015, Havoc Pennington wrote: On Tue, Apr 28, 2015 at 1:19 PM, David Lang wrote: If the examples that are being used to show the performance advantage of kdbus vs normal dbus are doing the wrong thing, then we need to get some other examples available to people who don't live and breathe dbus that 'do things right', so that the kernel developers can see what you think is the real problem and how kdbus addresses it. So far, this 'wrong' example is the only thing that's been posted to show the performance advantage of kdbus.

I'm hopeful someone will do that. FWIW, I would be suspicious of a broken benchmark if it didn't show: * the bus daemon means an extra read/parse and marshal/write per message, so 4 vs. 2; * the existence of the bus daemon therefore makes a message send/receive take roughly twice as long. https://lwn.net/Articles/580194/ has a bit more elaboration about the number of copies, validations, and context switches in each case. From what I can tell, the core performance claim for kdbus is that for a userspace daemon to be a routing intermediary, it has to receive and re-send messages. If the baseline performance of IPC is the cost to send once and receive once, adding the daemon means there's twice as much to do (1 more receive, 1 more send). However fast you make send/receive, the daemon always means there are twice as many send/receives as there would be with no daemon.

There are twice as many context switches, nobody disputes that; the question is whether it matters. It doesn't matter if the message router is in kernel space or user space, it still needs to read/parse and marshal/write the data, so you aren't saving that time by being in the kernel.

If that isn't what a benchmark shows, then there's a mystery to explain...
(One disruption to the ratio, of course, could be if the clients use a much faster or slower dbus lib than the daemon.) As noted many times, this 2x penalty for the daemon was of course a conscious tradeoff; kdbus is trying to escape the tradeoff in order to extend the use of dbus to more use cases. Given the tradeoff, _existing_ uses of dbus seem to prefer the performance hit to the loss of useful semantics, but potential new users would like to, or need to, have both.

If there is a 2x performance improvement from being in the kernel, but a 100x performance improvement from fixing the userspace code, the effort should be spent on the userspace code, not on moving things into kernel space. Remember the Tux in-kernel webserver? It showed performance improvements from putting the http daemon in the kernel, and a lot of the arguments about it sound very similar (reduced context switches, etc.).

David Lang
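The "twice as many send/receives" point above is easy to demonstrate with a toy measurement. The sketch below (illustrative only; datagram socketpairs stand in for the dbus transport) compares a direct path against one that relays every message through a forwarder thread playing the bus daemon:

```python
import socket
import threading
import time

N = 2000
PAYLOAD = b"x" * 64

def direct():
    # Client and server share one socketpair: 1 send + 1 recv per message.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    t0 = time.perf_counter()
    for _ in range(N):
        a.send(PAYLOAD)
        b.recv(len(PAYLOAD))
    elapsed = time.perf_counter() - t0
    a.close(); b.close()
    return elapsed

def via_daemon():
    # The forwarder thread adds 1 extra recv + 1 extra send per message,
    # exactly the overhead a userspace routing daemon imposes.
    c, d_in = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    d_out, s = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    def forward():
        for _ in range(N):
            d_out.send(d_in.recv(len(PAYLOAD)))

    t = threading.Thread(target=forward)
    t.start()
    t0 = time.perf_counter()
    for _ in range(N):
        c.send(PAYLOAD)
        s.recv(len(PAYLOAD))
    elapsed = time.perf_counter() - t0
    t.join()
    for sk in (c, d_in, d_out, s):
        sk.close()
    return elapsed

t_direct, t_routed = direct(), via_daemon()
```

On an otherwise idle machine the routed path typically takes on the order of twice as long; the structural point of the thread is that optimizing the daemon's own processing cannot remove that extra hop.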
Re: [GIT PULL] kdbus for 4.1-rc1
On Tue, 28 Apr 2015, Havoc Pennington wrote: BTW, if I can make a suggestion, it's quite confusing to talk about "dbus" unqualified when we are talking about implementation issues, since it muddles bus daemon vs. clients, and also since there are lots of implementations of the client bindings: http://www.freedesktop.org/wiki/Software/DBusBindings/ For the bus daemon, though, the only two implementations I know of are the original one (which uses libdbus as its binding) and kdbus. I would expect there's no question the bus daemon can be faster, maybe say 1.5x raw sockets instead of 2.5x, or whatever; something on that order. We should probably simply stipulate this for discussion purposes: "someone could optimize the crap out of the bus daemon". The kdbus question is about whether to eliminate this daemon entirely.

As I'm seeing things, we aren't talking about 1.5x vs 2.5x, we're talking about 1000x. If the examples that are being used to show the performance advantage of kdbus vs normal dbus are doing the wrong thing, then we need to get some other examples available to people who don't live and breathe dbus that 'do things right', so that the kernel developers can see what you think is the real problem and how kdbus addresses it. So far, this 'wrong' example is the only thing that's been posted to show the performance advantage of kdbus.

David Lang
Re: [GIT PULL] kdbus for 4.1-rc1
On Mon, 27 Apr 2015, Lukasz Skalski wrote: On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote: On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote: On 04/24/2015 04:19 PM, Havoc Pennington wrote: On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski wrote: - client: http://fpaste.org/215156/

Cool - it might also be interesting to try this without blocking round trips, i.e. send requests as quickly as you can and collect replies asynchronously. That's how people ideally use dbus. It should certainly reduce the total benchmark time; I'm just wondering whether this usage increases or decreases the delta between the userspace daemon and kdbus.

No problem - I'll also prepare an asynchronous version.

That would be great to see as well. Many thanks for doing this work.

As proposed by Havoc and Greg, I've created a simple benchmark for asynchronous calls: - server: http://fpaste.org/215157/ (the same as in the previous test) - client: http://fpaste.org/215724/ (asynchronous version). For the asynchronous version of the client I had to decrease the number of calls to 128 (the synchronous version used twice as many calls), otherwise we could exceed the maximum number of pending replies per connection.

Aren't we being told that part of the reason for needing kdbus is that thousands, or tens of thousands, of messages are being spewed out? How does limiting it to 128 messages represent real life, if that is the case?

David Lang

The test results are as follows:

+--------------+----------------------+----------------------+
|              |     Elapsed time     |     Elapsed time     |
| Message size |   GLIB WITH NATIVE   |  GLIB + DBUS-DAEMON  |
|   [bytes]    |    KDBUS SUPPORT*    |                      |
+--------------+----------------------+----------------------+
|              | 1) 0.018639 s        | 1) 0.029947 s        |
|     1000     | 2) 0.017045 s        | 2) 0.032812 s        |
|              | 3) 0.017490 s        | 3) 0.029971 s        |
|              | 4) 0.018001 s        | 4) 0.026485 s        |
+--------------+----------------------+----------------------+
|              | 1) 0.019898 s        | 1) 0.040914 s        |
|      1       | 2) 0.022187 s        | 2) 0.033604 s        |
|              | 3) 0.020854 s        | 3) 0.037616 s        |
|              | 4) 0.020020 s        | 4) 0.033772 s        |
+--------------+----------------------+----------------------+

*all tests were performed without using the memfd mechanism.
And as I wrote in my previous mail, kdbus transport for GLib is not finished yet and there are still some places for improvements, so please do not treat these test results as final). greg k-h Cheers, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
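One answer to David's question is that an asynchronous client does not have to cap its *total* call count at the pending-replies limit, only the number in flight at once. A minimal sketch (not the fpaste benchmark; names and the simulated "call" are made up for illustration) of bounding in-flight requests while still issuing thousands of calls:

```python
import asyncio

MAX_PENDING = 128  # mirrors the bus's maximum pending replies per connection

async def call(i, sem, results):
    # Stand-in for one asynchronous method call; the sleep(0) models the
    # reply arriving later rather than in lockstep with the request.
    async with sem:
        await asyncio.sleep(0)
        results.append(i)

async def main(total_calls):
    sem = asyncio.Semaphore(MAX_PENDING)
    results = []
    # Fire all requests without waiting for each reply (async style),
    # but never allow more than MAX_PENDING to be outstanding at once.
    await asyncio.gather(*(call(i, sem, results) for i in range(total_calls)))
    return results

replies = asyncio.run(main(1000))
print(len(replies))  # all 1000 calls complete despite the 128-reply cap
```

With this pattern the benchmark could keep the realistic "tens of thousands of messages" workload and still respect the per-connection reply limit.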
Re: Trusted kernel patchset
On Mon, 16 Mar 2015, Matthew Garrett wrote:

On Mon, 2015-03-16 at 14:45 +, One Thousand Gnomes wrote: On Fri, 13 Mar 2015 11:38:16 -1000 Matthew Garrett wrote:

4) Used the word "measured"

Nothing is being measured. Nothing is being trusted either. It's simply ensuring you probably have the same holes as before. Also, the boot loader should be measuring the kernel before it runs it; that's how it knows the signature is correct.

That's one implementation. Another is the kernel being stored on non-volatile media.

Anything that encourages deploying systems that can't be upgraded to fix bugs that are discovered is a problem. This is an issue that the Internet of Things folks are just starting to notice, and it's only going to get worse before it gets better. How do you patch bugs on your non-volatile media? What keeps that mechanism from being abused?

David Lang
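For readers unfamiliar with the "measuring" terminology: a measured boot records a cryptographic digest of each component before executing it. A rough sketch of the measurement step, assuming a stand-in image file (real firmware extends the digest into a TPM PCR rather than just computing it):

```python
import hashlib
import tempfile

def measure_image(path, chunk_size=1 << 20):
    # Hash the image in chunks; this digest is the "measurement" a boot
    # loader would record (e.g. extend into a TPM PCR) before running
    # the kernel. Illustrative sketch only, not actual firmware code.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Demo with a stand-in "kernel image" written to a temp file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"not a real vmlinuz")
    image_path = f.name

measurement = measure_image(image_path)
print(measurement)
```

The distinction in the thread is that a signature check only proves who built the image, while a measurement records exactly which image ran; neither helps if the image on non-volatile media can never be replaced.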
Re: cgroup: status-quo and userland efforts
On Wed, 4 Mar 2015, Luke Kenneth Casson Leighton wrote:

and why he concludes that having a single hierarchy for all resource types.

correcting to add "is not always a good idea"

i think having a single hierarchy is fine *if* and only if it is possible to overlay something similar to SE/Linux policy files - enforced by the kernel *not* by userspace (sorry serge!) - such that through those policy files any type of hierarchy, be it single or multi layer, recursive or in fact absolutely anything, may be emulated and properly enforced.

The fundamental problem is that sometimes you have types of controls that are orthogonal to each other, and you either manage the two types of things in separate hierarchies, or you end up with one hierarchy that is a permutation of all the combinations of what would have been separate hierarchies.

David Lang
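The combinatorial blow-up David describes can be made concrete: with two orthogonal classifications, a unified hierarchy needs one node per *combination*, not per group. A small sketch (group names are invented for illustration):

```python
from itertools import product

# Two orthogonal classifications of the same processes: which team owns
# them, and what I/O priority they get. (Names are made up.)
cpu_groups = ["team-a", "team-b", "team-c"]
io_groups = ["io-high", "io-low"]

# Separate hierarchies: manage 3 + 2 = 5 cgroups.
separate = len(cpu_groups) + len(io_groups)

# One unified hierarchy needs a node for every combination: 3 * 2 = 6,
# and each further orthogonal axis multiplies rather than adds.
combined = [f"{c}/{i}" for c, i in product(cpu_groups, io_groups)]

print(separate)        # 5
print(len(combined))   # 6
print(combined)
```

With ten groups on each of three orthogonal axes, separate hierarchies mean 30 cgroups while a single hierarchy means 1000, which is the permutation problem in a nutshell.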
Re: cgroup: status-quo and userland efforts
On Tue, 3 Mar 2015, Luke Leighton wrote:

I wrote about that many times, but here are two of the problems.

* There's no way to designate a cgroup to a resource, because a cgroup is only defined by the combination of who's looking at it for which controller. That's how you end up with tagging the same resource multiple times for different controllers, and even then it's broken, as when you move resources from one cgroup to another, you can't tell what to do with other tags. While allowing an obscene level of flexibility, multiple hierarchies destroy a very fundamental concept that it *should* provide - that of a resource container. It can't, because a "cgroup" is undefined under multiple hierarchies.

ok, there is an alternative to hierarchies, which has precedent (and, importantly, a set of userspace management tools as well as existing code in the linux kernel), and it's the FLASK model, which you know as SE/Linux.

whilst the majority of people view management to be "hierarchical" (so there is a top dog or God process and everything trickles down from that), this is viewed as such an anathema in the security industry that someone came up with a formal specification for the real-world way in which permissions are managed, and it's called the FLASK model.

On this topic it's also worth reading Neil Brown's series of articles over at http://lwn.net/Articles/604609/ and why he concludes that having a single hierarchy for all resource types.

David Lang