Re: [Ksummit-discuss] bug-introducing patches

2018-05-08 Thread David Lang

On Tue, 8 May 2018, Sasha Levin wrote:


There's no one, for example, who picked up vanilla v4.16 and plans to
keep using it for a year.


Actually, at a prior job I would do almost exactly that.

I never intended to go a year without updating, but it would happen if nothing 
came up that was related to the hardware/features I was running.


so 'no one uses the Linus kernel' is false.


Re: Reg : Spectre & Meltdown

2018-01-15 Thread David Lang
the 4.4.112 patches that Greg just posted include a bunch of work for these 
vulnerabilities.


Who knows what has been backported to the kernel he is running.


Re: Reg : Spectre & Meltdown

2018-01-15 Thread David Lang
if you are running a Red Hat kernel, you will have to ask them about what they have 
included in it.


Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

2018-01-10 Thread David Lang
I somewhat hate to ask this, but for those of us following at home, what does 
this add to the overhead?


I am remembering an estimate from the middle of last week that put retpoline at 
replacing a 3-clock 'ret' with about 30 clocks of eye-bleed code.


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread David Lang
The point is that in many cases, if someone exploits the "trusted" process, they 
already have everything that the machine is able to do anyway.


Re: Avoid speculative indirect calls in kernel

2018-01-03 Thread David Lang

On Wed, 3 Jan 2018, Andi Kleen wrote:



Why is this all done without any configuration options?


I was thinking of a config option, but I was struggling with a name.

CONFIG_INSECURE_KERNEL, CONFIG_LEAK_MEMORY?


CONFIG_BUGGY_INTEL_CACHE (or similar)

something that indicates that this is to support the Intel CPUs that have this 
bug in them.


We've had such CPU specific support options in the past.

Some people will need the speed more than the protection, some people will be 
running on CPUs that don't need this.


Why is this needed? Because of an Intel bug, so name it accordingly.
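[For reference: the option that eventually landed upstream was named for the 
mitigation rather than for the vendor. A minimal Kconfig sketch of that shape, as 
it might appear in arch/x86/Kconfig -- wording approximate, not the verbatim 
upstream text:]

```kconfig
config RETPOLINE
	bool "Avoid speculative indirect branches in kernel"
	default y
	help
	  Compile the kernel with the retpoline compiler option to guard
	  against kernel-to-user data leaks by avoiding speculative
	  indirect branches. This adds some overhead to every indirect
	  call; CPUs that do not need the mitigation can still build
	  without it.
```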

David Lang


Re: Yes you have standing to sue GRSecurity

2017-07-30 Thread David Lang

On Sat, 29 Jul 2017, Paul G. Allen wrote:


It's not even clear that there is infringement.  The GPL merely
requires that people who have been distributed copies of GPL'ed code
must not be restricted from further redistribution of the code.  It
does not require that someone who is distributing it must make it
available on a public FTP/HTTP server.


what I have seen reported is that they are adding additional restrictions: if any 
of their customers redistributes the source, their contract with grsecurity is 
terminated.



If there is something to this (that GRSecurity is somehow in violation
of the GPL), then it would probably be a very good idea for someone
(the community, Red Hat, etc.) to protect the kernel. From my
understanding, at least in America, protections under any license or
contract (especially dealing with copyright and trademark
infringement) are only enforceable as long as the party with the
rights enforce the license/contract/agreement.


You are thinking of trademarks, which must be defended or you lose them. 
Contracts and licenses do not need to be defended at every opportunity to avoid 
losing them.



There is also something in law called "setting a precedent" and if the
violating of the Linux license agreement is left unchecked, then quite
possibly a precedent could be set to allow an entire upstream kernel
to be co-opted.


This is a potential problem.

David Lang


Re: [copyleft-next] Re: Kernel modules under new copyleft licence : (was Re: [PATCH v2] module.h: add copyleft-next >= 0.3.1 as GPL compatible)

2017-05-18 Thread David Lang

On Fri, 19 May 2017, Luis R. Rodriguez wrote:


On Thu, May 18, 2017 at 06:12:05PM -0400, Theodore Ts'o wrote:

Sorry, I guess I wasn't clear enough.  So there are two major cases,
with three sub-cases for each.

1)  The driver is dual-licensed GPLv2 and copyleft-next

   1A) The developer only wants to use the driver, without making
   any changes to it.

   1B) The developer wants to make changes to the driver, and
   distribute source and binaries

   1C) The developer wants to make changes to the driver, and
   contribute the changes back to upstream.

2)  The driver is solely licensed under copyleft-next

   2A) The developer only wants to use the driver, without making
   any changes to it.

   2B) The developer wants to make changes to the driver, and
   distribute source and binaries

   2C) The developer wants to make changes to the driver, and
   contribute the changes back to upstream.

In cases 1A and 1B, I claim that no additional lawyer ink is required,


I really cannot see how you might have an attorney who wants ink on 2A but not 
1A.
I really cannot see how you might have an attorney who wants ink on 2B but not 
1B.


If something is under multiple licenses, and one of them is a known license, you 
can just use that license and not worry (or even think) about what other 
licenses are available.


But if it's a new license, then it needs to be analyzed, and that takes lawyer 
ink.


That's why 1A and 1B are OK: you can ignore copyleft-next and just use GPLv2.

David Lang


Re: Apparent backward time travel in timestamps on file creation

2017-03-30 Thread David Lang

On Thu, 30 Mar 2017, David Howells wrote:


Linus Torvalds <torva...@linux-foundation.org> wrote:


The error bar can be huge, for the simple reason that the filesystem
you are testing may not be sharing a clock with the CPU at _all_.

IOW, think network filesystems.


Can't I just not do the tests when the filesystem is a network fs?  I don't
think it should be a problem for disk filesystems on network-attached storage.


it's not trivial to detect whether a filesystem is local or network-backed: you 
would have to make calls to figure out what filesystem you are on, then maintain 
a list defining what's local and what's remote, and that list would become out 
of date as new filesystems are added.
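To make that concrete, here is a rough userspace sketch (mine, not from the 
thread) that classifies the filesystem backing a path by reading /proc/mounts. 
Note the hard-coded set of "network" filesystem types: exactly the kind of list 
that goes stale as new filesystems appear.

```python
import os

# Filesystem types treated as "network"; this list is illustrative and is
# precisely the part that would need constant maintenance.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "9p", "ceph", "glusterfs",
              "fuse.sshfs"}

def mount_fstype(path):
    """Return the fstype of the mount containing path, per /proc/mounts."""
    path = os.path.realpath(path)
    best, best_type = "", None
    with open("/proc/mounts") as f:
        for line in f:
            _dev, mnt, fstype = line.split()[:3]
            # /proc/mounts octal-escapes spaces etc. in mount points (\040)
            mnt = mnt.encode().decode("unicode_escape")
            # keep the longest mount point that is a prefix of path
            if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) \
                    and len(mnt) >= len(best):
                best, best_type = mnt, fstype
    return best_type

def is_network_fs(path):
    return mount_fstype(path) in NETWORK_FS
```

A test that only wants to skip network filesystems could call 
`is_network_fs(".")` first, accepting that the classification is only as good as 
the list.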


David Lang


Re: [Cluster-devel] [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"

2017-01-28 Thread David Lang

On Fri, 27 Jan 2017, Christoph Hellwig wrote:


On Fri, Jan 27, 2017 at 11:40:42AM -0500, Theodore Ts'o wrote:

The reason why I'm nervous is that nojournal mode is not a common
configuration, and "wait until production systems start failing" is
not a strategy that I or many SRE-types find comforting.


What does SRE stand for?


Site Reliability Engineer, a mix of operations and engineering (DevOps++)

David Lang


Re: Regression - SATA disks behind USB ones on v4.8-rc1, breaking boot. [Re: Who reordered my disks (probably v4.8-rc1 problem)]

2016-08-14 Thread David Lang

On Sun, 14 Aug 2016, Tom Yan wrote:


On 14 August 2016 at 18:07, Tom Yan <tom.t...@gmail.com> wrote:

On 14 August 2016 at 18:01, Pavel Machek <pa...@ucw.cz> wrote:


Since SATA support was merged, certainly since v2.4, and from way
before /dev/disk/by-id existed.


I have no idea how "SATA before USB" had been done in the past (if it
was ever a thing in the kernel), but that has not been the case since
at least v3.0 AFAIR.



People may not run udev, and you can't use /dev/disk/by-id on kernel
command line.



No, but you can always use root=PARTUUID=, that's built into the
kernel. (root=UUID= requires udev or so though).


Silly me. root=UUID= has nothing to do with udev, but `blkid` in
util-linux. At least that's how it's done in Arch/mkinitcpio.



The rule is "don't break working systems", not "but we are allowed to break 
systems, see it says here not to depend on this"


Drive ordering has been stable since the 0.1 kernel [1]

It takes a lot longer to detect USB drives; why in the world would they be 
detected before hard-wired drives?


I expect that Linus' response is going to be very quotable.

David Lang


[1] given stable hardware and no new drivers becoming involved


Re: Variant symlink filesystem

2016-03-11 Thread David Lang

On Sat, 12 Mar 2016, Cole wrote:


On 12 March 2016 at 00:24, Al Viro <v...@zeniv.linux.org.uk> wrote:

On Sat, Mar 12, 2016 at 12:03:11AM +0200, Cole wrote:


This was one of the first solutions we looked at, and using various
namespaces. However we would like to be able to have multiple terminal
sessions open, and be able to have each session using a different
mount point, or be able to use the other terminals mount point, i.e.
switching the mount point to that of the other terminals. We would
also like the shell to be able to make use of these, and use shell
commands such as 'ls'.

When we originally looked at namespaces and containers, we could not
find a solution to achieve the above. Is this possible using
namespaces?


I'd try to look at setns(2) if you want processes joining existing namespaces.
I'm afraid that I'll need to get some sleep before I'll be up to asking
the right questions for figuring out what requirements do you have and
what's the best way to do it - after a while coffee stops being efficient
and I'm already several hours past that ;-/



Sure, not a problem, when you have time to reply I will gladly welcome
any feed back.

As for the usage, I'll explain it a bit so that you have something to
work off of when you get a chance to read it.

The problem we encountered with namespaces when we looked at it more
than a year ago was 'how do you get the shell' to join them, or into
one. And also how do you move the shell in one terminal session into a
namespace that another shell is currently in. We wanted a solution
that doesn't require modifying existing programs to make them
namespace aware. However, as I said, this was more than a year ago
that we looked at it, and we could easily have misunderstood
something, or not understood the full functionality available. If you
say this is possible, without modifying programs such as bash, could
you please point me in the direction of the documentation describing
this, and I will try to educate myself.


looking at the setns() function, it seems like you could have a setuid helper 
program that you run in one session; it would change the namespace and then 
invoke a bash shell in that namespace, in which you then run unmodified programs.


it seems like there should be a way for a root program to change the namespace 
of another, but I'm not finding it at the moment.


There is the nsenter program that will run a program inside an existing 
namespace. It looks like you need something similar that implements some 
permission checking (only let you go into namespaces of other programs for the 
same user or similar), but you should be able to make proof-of-concept scripts 
with nsenter.
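As a tiny illustration of the permission-checking piece (mine, not part of 
nsenter): on Linux every namespace a process belongs to is exposed as a symlink 
under /proc/<pid>/ns/, so a helper can cheaply tell whether two processes 
already share a mount namespace, and compare process owners, before it ever 
calls setns().

```python
import os

def ns_id(pid="self", ns="mnt"):
    # /proc/<pid>/ns/<name> is a symlink whose target, e.g. "mnt:[4026531840]",
    # uniquely identifies the namespace the process belongs to.
    return os.readlink(f"/proc/{pid}/ns/{ns}")

def same_namespace(pid_a, pid_b, ns="mnt"):
    return ns_id(pid_a, ns) == ns_id(pid_b, ns)

def owner_uid(pid):
    # the /proc/<pid> directory is owned by the process's user
    return os.stat(f"/proc/{pid}").st_uid
```

A setuid helper could then refuse to join a target namespace unless 
owner_uid(target) matches the real UID of the caller -- roughly the policy 
described above.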


David Lang


Re: Variant symlink filesystem

2016-03-11 Thread David Lang

On Sat, 12 Mar 2016, Cole wrote:


On 11 March 2016 at 23:51, Al Viro <v...@zeniv.linux.org.uk> wrote:

On Fri, Mar 11, 2016 at 10:52:52PM +0200, Cole wrote:


The implementation doesn't necessarily have to continue to work with
env variables. On FreeBSD, the variant symlinks function by using
variables stored in kernel memory, and have a hierarchical lookup,
starting with user defined values and terminating with global entries.
I am not aware of such functionality existing on linux, but if someone
could point me at something similar to that, I would much prefer to
use that, as there are issues with variables that are exported or
modified during process execution.


Put your processes into a separate namespace and use mount --bind in it...


This was one of the first solutions we looked at, and using various
namespaces. However we would like to be able to have multiple terminal
sessions open, and be able to have each session using a different
mount point, or be able to use the other terminals mount point, i.e.
switching the mount point to that of the other terminals. We would
also like the shell to be able to make use of these, and use shell
commands such as 'ls'.


you should be able to have multiple sessions using the same namespace. There is 
the lwn.net series on namespaces at https://lwn.net/Articles/531114/


from what I'm looking at, this should be possible with the right mount options. 
It's not as trivial as setting an environment variable, but if it's all 
scripted, that shouldn't matter to the user.


you would need to use the setns() call to have one session join an existing 
namespace rather than creating a new one.


now, changing namespaces does require CAP_SYS_ADMIN, so if you are not running 
things as root, you may need to create a small daemon to run as root that 
reassigns your different sessions from one ns to another.


David Lang


When we originally looked at namespaces and containers, we could not
find a solution to achieve the above. Is this possible using
namespaces?

Regards
/Cole



Re: Variant symlink filesystem

2016-03-11 Thread David Lang

On Fri, 11 Mar 2016, Cole wrote:


On 11 March 2016 at 22:24, Richard Weinberger <rich...@nod.at> wrote:

Am 11.03.2016 um 21:22 schrieb Cole:

If I remember correctly, when we were testing the fuse version, we hard coded
the path to see if that solved the problem, and the difference between
the env lookup
code and the hard coded path was almost the same, but substantially slower than
the native file system.


And where exactly was the performance problem?

Anyway, if you submit your filesystem also provide a decent use case for it. :-)


Thank you, I will do so. One example of a use case could be to allow multiple
package repositories to exist on a single computer, all in different locations,
but with a fixed path so as not to break the package manager; the correct
repository is then selected based on an ENV variable. That way each user could
have their own packages installed, separate from the system packages, and no
collisions would occur.


why would this not be a case to use filesystem namespaces and bind mounts?

David Lang


Re: [PATCH 00/42] ACPICA: 20151218 Release

2016-01-02 Thread David Lang
what is ACPICA and why should we care about divergence between it and the linux 
upstream? Where is it to be found?


This may be common knowledge to many people, but it should probably be documented 
in the patch bundle and its explanation.


David Lang

On Tue, 29 Dec 2015, Lv Zheng wrote:


Date: Tue, 29 Dec 2015 13:52:19 +0800
From: Lv Zheng <lv.zh...@intel.com>
To: Rafael J. Wysocki <rafael.j.wyso...@intel.com>,
Len Brown <len.br...@intel.com>
Cc: Lv Zheng <lv.zh...@intel.com>, Lv Zheng <zeta...@gmail.com>,
linux-kernel@vger.kernel.org, linux-a...@vger.kernel.org
Subject: [PATCH 00/42] ACPICA: 20151218 Release

The 20151218 ACPICA kernel-resident subsystem updates are linuxized based
on the linux-pm/linux-next branch.

The patchset has passed the following build/boot tests.
Build tests are performed as follows:
1. i386 + allyes
2. i386 + allno
3. i386 + default + ACPI_DEBUGGER=y
4. i386 + default + ACPI_DEBUGGER=n + ACPI_DEBUG=y
5. i386 + default + ACPI_DEBUG=n + ACPI=y
6. i386 + default + ACPI=n
7. x86_64 + allyes
8. x86_64 + allno
9. x86_64 + default + ACPI_DEBUGGER=y
10.x86_64 + default + ACPI_DEBUGGER=n + ACPI_DEBUG=y
11.x86_64 + default + ACPI_DEBUG=n + ACPI=y
12.x86_64 + default + ACPI=n
Boot tests are performed as follows:
1. i386 + default + ACPI_DEBUGGER=y
2. x86_64 + default + ACPI_DEBUGGER=y
Where:
1. i386: machine named as "Dell Inspiron Mini 1010"
2. x86_64: machine named as "HP Compaq 8200 Elite SFF PC"
3. default: kernel configuration with following items enabled:
  All hardware drivers related to the machines of i386/x86_64
  All "drivers/acpi" configurations
  All "drivers/platform" drivers
  All other drivers that link the APIs provided by ACPICA subsystem

The divergences checking result:
Before applying (20150930 Release):
 517 lines
After applying (20151218 Release):
 506 lines

Bob Moore (25):
 ACPICA: exmutex: General cleanup, restructured some code
 ACPICA: Core: Major update for code formatting, no functional changes
 ACPICA: Split interpreter tracing functions to a new file
 ACPICA: acpiexec: Add support for AML files containing multiple
   tables
 ACPICA: Disassembler/tools: Support for multiple ACPI tables in one
   file
 ACPICA: iasl/acpiexec: Update input file handling and verification
Re: [PATCH 00/42] ACPICA: 20151218 Release

2016-01-02 Thread David Lang
What is ACPICA and why should we care about divergence between it and the Linux 
upstream? Where is it to be found?


This may be common knowledge to many people, but it should probably be documented 
in the patch bundle and its explanation.


David Lang

On Tue, 29 Dec 2015, Lv Zheng wrote:


Date: Tue, 29 Dec 2015 13:52:19 +0800
From: Lv Zheng <lv.zh...@intel.com>
To: Rafael J. Wysocki <rafael.j.wyso...@intel.com>,
Len Brown <len.br...@intel.com>
Cc: Lv Zheng <lv.zh...@intel.com>, Lv Zheng <zeta...@gmail.com>,
linux-kernel@vger.kernel.org, linux-a...@vger.kernel.org
Subject: [PATCH 00/42] ACPICA: 20151218 Release

The 20151218 ACPICA kernel-resident subsystem updates are linuxized based
on the linux-pm/linux-next branch.

The patchset has passed the following build/boot tests.
Build tests are performed as follows:
1. i386 + allyes
2. i386 + allno
3. i386 + default + ACPI_DEBUGGER=y
4. i386 + default + ACPI_DEBUGGER=n + ACPI_DEBUG=y
5. i386 + default + ACPI_DEBUG=n + ACPI=y
6. i386 + default + ACPI=n
7. x86_64 + allyes
8. x86_64 + allno
9. x86_64 + default + ACPI_DEBUGGER=y
10.x86_64 + default + ACPI_DEBUGGER=n + ACPI_DEBUG=y
11.x86_64 + default + ACPI_DEBUG=n + ACPI=y
12.x86_64 + default + ACPI=n
Boot tests are performed as follows:
1. i386 + default + ACPI_DEBUGGER=y
2. x86_64 + default + ACPI_DEBUGGER=y
Where:
1. i386: machine named as "Dell Inspiron Mini 1010"
2. x86_64: machine named as "HP Compaq 8200 Elite SFF PC"
3. default: kernel configuration with following items enabled:
  All hardware drivers related to the machines of i386/x86_64
  All "drivers/acpi" configurations
  All "drivers/platform" drivers
  All other drivers that link the APIs provided by ACPICA subsystem

The divergences checking result:
Before applying (20150930 Release):
 517 lines
After applying (20151218 Release):
 506 lines

Bob Moore (25):
 ACPICA: exmutex: General cleanup, restructured some code
 ACPICA: Core: Major update for code formatting, no functional changes
 ACPICA: Split interpreter tracing functions to a new file
 ACPICA: acpiexec: Add support for AML files containing multiple
   tables
 ACPICA: Disassembler/tools: Support for multiple ACPI tables in one
   file
 ACPICA: iasl/acpiexec: Update input file handling and verification
 ACPICA: Revert "acpi_get_object_info: Add support for ACPI 5.0 _SUB
   method."
 ACPICA: Add comment explaining _SUB removal
 ACPICA: acpiexec/acpinames: Update for error checking macros
 ACPICA: Concatenate operator: Add extensions to support all ACPI
   objects
 ACPICA: Debug Object: Cleanup output
 ACPICA: Debug object: Fix output for a NULL object
 ACPICA: Update for output of the Debug Object
 ACPICA: getopt: Comment update, no functional change
 ACPICA: Add new exception code, AE_IO_ERROR
 ACPICA: iasl/Disassembler: Support ASL ElseIf operator
 ACPICA: Parser: Add constants for internal namepath function
 ACPICA: Parser: Fix for SuperName method invocation
 ACPICA: Update parameter type for ObjectType operator
 ACPICA: Update internal #defines for ObjectType operator. No
   functional change
 ACPICA: Update for CondRefOf and RefOf operators
 ACPICA: Cleanup code related to the per-table module level
   improvement
 ACPICA: Add "root node" case to the ACPI name repair code
 ACPICA: Add per-table execution of module-level code
 ACPICA: Update version to 20151218

Colin Ian King (1):
 ACPICA: Tools: Add spacing and missing options in acpibin tool

David E. Box (1):
 ACPICA: Fix SyncLevel support interaction with method
   auto-serialization

LABBE Corentin (1):
 ACPICA: Add "const" to some functions that return fixed strings

Lv Zheng (12):
 ACPICA: Linuxize: reduce divergences for 20151218 release
 ACPICA: Namespace: Fix wrong error log
 ACPICA: Debugger: reduce old external path format
 ACPICA: Namespace: Add scope information to the simple object repair
   mechanism
 ACPICA: Namespace: Add String -> ObjectReference conversion support
 ACPICA: Events: Deploys acpi_ev_find_region_handler()
 ACPICA: Events: Uses common_notify for address space handlers
 ACPICA: Utilities: Reorder initialization code
 ACPICA: Events: Fix an issue that region object is re-attached to
   another scope when it is already attached
 ACPICA: Events: Split acpi_ev_associate_reg_method() from region
   initialization code
 ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no
   _REG evaluations can happen during OS early boot stages
 ACPICA: Events: Introduce ACPI_REG_DISCONNECT invocation to
   acpi_ev_execute_reg_methods()

Markus Elfring (1):
 ACPICA: Debugger: Remove some unecessary NULL checks

Prarit Bhargava (1):
 ACPICA: acpi_get_sleep_type_data: Reduce warnings

drivers/acpi/acpica/Makefile   |4 +-
drivers/acpi/acpica/acapps.h   |   58 +-
drivers/acpi/acpica/acdebug.h  |5 +-
drivers/acpi/acpica/acevents.h

Re: kdbus: to merge or not to merge?

2015-08-09 Thread David Lang

On Sun, 9 Aug 2015, Greg Kroah-Hartman wrote:


The issue is with userspace clients opting in to receive all
NameOwnerChanged messages on the bus, which is not a good idea as they
constantly get woken up and process them, which is why the CPU was
pegged.  This issue should now be fixed in Rawhide for some of the
packages we found that were doing this. Maintainers of other packages
have been informed.  End result, no one has ever really tested sending
"bad" messages to the current system as all existing dbus users try to
be "good actors", thanks to Andy's testing, these apps should all now
become much more robust.


Does it require elevated privileges to opt to receive all NameOwnerChanged 
messages on the bus? Is it the default unless the apps opt for something more 
restrictive? or is it somewhere in between?
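As a toy model of the behavior Greg describes (this is illustrative only, not the real libdbus or kdbus API; the class and rule names are invented): a broker wakes every client whose match rules select a signal, so a client that opts in to all NameOwnerChanged signals gets one wakeup per name change anywhere on the bus.

```python
# Toy model of D-Bus signal matching (not the real libdbus API): the
# broker wakes every client whose match rules select a signal.

class Client:
    def __init__(self, name, rules):
        self.name = name
        self.rules = rules          # list of dicts: key -> required value
        self.wakeups = 0

def matches(rule, signal):
    # A rule matches when every key it constrains has the required value.
    return all(signal.get(k) == v for k, v in rule.items())

def deliver(clients, signal):
    for c in clients:
        if any(matches(r, signal) for r in c.rules):
            c.wakeups += 1

# One client opts in to *all* NameOwnerChanged signals; another only
# watches the single name it actually cares about.
greedy = Client("greedy", [{"member": "NameOwnerChanged"}])
narrow = Client("narrow", [{"member": "NameOwnerChanged",
                            "arg0": "org.example.App"}])
clients = [greedy, narrow]

for i in range(1000):               # 1000 unrelated name changes on the bus
    deliver(clients, {"member": "NameOwnerChanged", "arg0": f"name{i}"})
deliver(clients, {"member": "NameOwnerChanged", "arg0": "org.example.App"})

assert greedy.wakeups == 1001       # woken for every change on the bus
assert narrow.wakeups == 1          # woken only for the name it watches
```

The blanket subscription is what pegs the CPU: the cost scales with total bus activity rather than with the events the client actually needs.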


I was under the impression that the days of writing system-level stuff that 
assumes that all userspace apps are going to 'play nice' went out a decade or 
more ago. It's fine if the userspace app can kill itself, or possibly even the 
user it's running as, but being able to kill apps running as other users, let 
alone the whole system is a problem nowadays.


It may be able to happen in a default system, but this is why cgroups and 
namespaces have been created, to give the system admin the ability to limit the 
resources that any one app can consume. Introducing a new mechanism that allows 
one user to consume resources allocated to another and kill the system without 
providing a kernel level mechanism to limit the damage (as opposed to fixing 
individual apps) seems rather short-sighted at best.


David Lang




Re: [FYI] tux3: Core changes

2015-07-31 Thread David Lang

On Fri, 31 Jul 2015, Daniel Phillips wrote:


On Friday, July 31, 2015 11:29:51 AM PDT, David Lang wrote:
We, the Linux Community have less tolerance for losing people's data and 
preventing them from operating than we used to when it was all tinkerer's 
personal data and secondary systems.


So rather than pushing optimizations out to everyone and seeing what 
breaks, we now do more testing and checking for failures before pushing 
things out.


By the way, I am curious about whose data you think will get lost
as a result of pushing out Tux3 with a possible theoretical bug
in a wildly improbable scenario that has not actually been
described with sufficient specificity to falsify, let alone
demonstrated.


you weren't asking about any particular feature of Tux, you were asking if we 
were still willing to push out stuff that breaks for users and fix it later.


Especially for filesystems that can lose the data of whoever is using it, the 
answer seems to be a clear no.


there may be bugs in what's pushed out that we don't know about. But we don't 
push out potential data corruption bugs that we do know about (or think we do)


so if you think this should be pushed out with this known corner case that's not 
handled properly, you have to convince people that it's _so_ improbable that 
they shouldn't care about it.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [FYI] tux3: Core changes

2015-07-31 Thread David Lang

On Fri, 31 Jul 2015, Daniel Phillips wrote:


Subject: Re: [FYI] tux3: Core changes

On Friday, July 31, 2015 8:37:35 AM PDT, Raymond Jennings wrote:

Returning ENOSPC when you have free space you can't yet prove is safer than
not returning it and risking a data loss when you get hit by a write/commit
storm. :)


Remember when delayed allocation was scary and unproven, because proving
that ENOSPC will always be returned when needed is extremely difficult?
But the performance advantage was compelling, so we just worked at it
until it worked. There were times when it didn't work properly, but the
code was in the tree so it got fixed.
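The conservative accounting being argued over can be sketched like this (an illustrative toy model, not tux3, ext4, or any real filesystem's code; the Volume class and block counts are invented): delayed allocation reserves a worst-case block count at write() time and returns ENOSPC early, then releases the surplus at commit once the real allocation is known.

```python
# Sketch of pessimistic ENOSPC accounting for delayed allocation
# (illustrative only): reserve the worst case up front, refund at commit.
import errno

class Volume:
    def __init__(self, free_blocks):
        self.free = free_blocks     # blocks not yet promised to anyone
        self.reserved = 0           # promised to dirty pages, not yet allocated

    def write(self, blocks_worst_case):
        # Refuse the write unless the worst case provably fits now.
        if blocks_worst_case > self.free:
            return -errno.ENOSPC
        self.free -= blocks_worst_case
        self.reserved += blocks_worst_case
        return 0

    def commit(self, reserved, actually_used):
        # At commit the real allocation is known; release the surplus.
        self.reserved -= reserved
        self.free += reserved - actually_used

v = Volume(free_blocks=100)
assert v.write(60) == 0
assert v.write(60) == -errno.ENOSPC   # might fit in practice, but not provably
v.commit(reserved=60, actually_used=40)
assert v.write(60) == 0               # surplus refunded at commit
```

The second write is the safety/performance trade-off in miniature: the space would likely have been there, but returning ENOSPC is safer than risking data loss in a write/commit storm.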

It's like that now with page forking - a new technique with compelling
advantages, and some challenges. In the past, we (the Linux community)
would rise to the challenge and err on the side of pushing optimizations
in early. That was our mojo, and that is how Linux became the dominant
operating system it is today. Do we, the Linux community, still have that
mojo?


We, the Linux Community have less tolerance for losing people's data and 
preventing them from operating than we used to when it was all tinkerers' 
personal data and secondary systems.


So rather than pushing optimizations out to everyone and seeing what breaks, we 
now do more testing and checking for failures before pushing things out.


This means that when something new is introduced, we default to the safe, 
slightly slower way initially (there will be enough other bugs to deal with in 
any case), and then as we gain experience from the tinkerers enabling the 
performance optimizations, we make those optimizations reliable and only then 
push them out to all users.


If you define this as "losing our mojo", then yes we have. But most people see 
the pace of development as still being high, just with more testing and 
polishing before it gets out to users.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: kdbus: to merge or not to merge?

2015-06-25 Thread David Lang

On Wed, 24 Jun 2015, Greg KH wrote:


On Wed, Jun 24, 2015 at 10:39:52AM -0700, David Lang wrote:

On Wed, 24 Jun 2015, Ingo Molnar wrote:


And the thing is, in hindsight, after such huge flamewars, years down the line,
almost never do I see the following question asked: 'what were we thinking 
merging
that crap??'. If any question arises it's usually along the lines of: 'what was
the big fuss about?'. So I think by and large the process works.


counterexamples, devfs, tux


Don't knock devfs.  It created a lot of things that we take for granted
now with our development model.  Off the top of my head, here's a short
list:
- it showed that we can't arbitrarily make user/kernel API
  changes without working with people outside of the kernel
  developer community, and expect people to follow them
- the idea was sound, but the implementation was not, it had
  unfixable problems, so to fix those problems, we came up with
  better, kernel-wide solutions, forcing us to unify all
  device/driver subsystems.
- we were forced to try to document our user/kernel apis better,
  hence Documentation/ABI/ was created
- to remove devfs, we had to create a structure of _how_ to
  remove features.  It took me 2-3 years to be able to finally
  delete the devfs code, as the infrastructure and feedback
  loops were just not in place before then to allow that to
  happen.

So I would strongly argue that merging devfs was a good thing, it
spurned a lot of us to get the job done correctly.  Without it, we would
have never seen the need, or had the knowledge of what needed to be
done.


I don't disagree with you, but it was definitely a case of adding something that 
was later regretted and removed. A lot was learned in the process, but that 
wasn't the issue I was referring to.


I don't want kdbus to end up the same way. The more I think back to those 
discussions, the more parallels I see between the two.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: kdbus: to merge or not to merge?

2015-06-24 Thread David Lang

On Wed, 24 Jun 2015, Martin Steigerwald wrote:


Am Mittwoch, 24. Juni 2015, 10:39:52 schrieb David Lang:

On Wed, 24 Jun 2015, Ingo Molnar wrote:

And the thing is, in hindsight, after such huge flamewars, years down
the line, almost never do I see the following question asked: 'what
were we thinking merging that crap??'. If any question arises it's
usually along the lines of: 'what was the big fuss about?'. So I think
by and large the process works.

counterexamples, devfs, tux


What was tux?


in-kernel webserver

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdbus: to merge or not to merge?

2015-06-24 Thread David Lang

On Wed, 24 Jun 2015, Ingo Molnar wrote:


And the thing is, in hindsight, after such huge flamewars, years down the line,
almost never do I see the following question asked: 'what were we thinking 
merging
that crap??'. If any question arises it's usually along the lines of: 'what was
the big fuss about?'. So I think by and large the process works.


counterexamples, devfs, tux

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: clustered MD

2015-06-09 Thread David Lang

On Tue, 9 Jun 2015, David Teigland wrote:


We do have a valid real world utility. It is to provide
high-availability of RAID1 storage  over the cluster. The
distributed locking is required only during cases of error and
superblock updates and is not required during normal operations,
which makes it fast enough for usual case scenarios.


That's the theory, how much evidence do you have of that in practice?


What are the doubts you have about it?


Before I begin reviewing the implementation, I'd like to better understand
what it is about the existing raid1 that doesn't work correctly for what
you'd like to do with it, i.e. I don't know what the problem is.


As I understand things, the problem is providing RAID across multiple machines, 
not just across the disks in one machine.
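The locking claim quoted above can be sketched as a toy model (hypothetical illustration, not the md-cluster code; a threading.Lock stands in for a real DLM lock): the cluster-wide lock is taken only on the superblock-update and error paths, so steady-state mirrored writes pay no distributed-locking cost.

```python
# Toy model of cluster-aware RAID1 locking (not real md-cluster code):
# normal writes hit both mirrors lock-free; only metadata updates
# synchronize cluster-wide.
import threading

class ClusterLock:                   # stand-in for a DLM lock
    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1

    def __exit__(self, *exc):
        self._lock.release()

class ClusteredMirror:
    def __init__(self):
        self.sb_lock = ClusterLock()
        self.mirrors = [dict(), dict()]

    def write(self, block, data):
        for m in self.mirrors:       # normal path: no cluster lock taken
            m[block] = data

    def update_superblock(self):
        with self.sb_lock:           # rare path: cluster-wide serialization
            pass

md = ClusteredMirror()
for b in range(1000):
    md.write(b, b"x")
md.update_superblock()
assert md.sb_lock.acquisitions == 1  # 1000 writes, one lock acquisition
```

Whether the real implementation keeps the lock off the fast path this cleanly is exactly the evidence being asked for in the thread.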


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: Device Tree Blob (DTB) licence

2015-05-29 Thread David Lang

On Fri, 29 May 2015, Enrico Weigelt, metux IT consult wrote:

Important Notice: This message may contain confidential or privileged 
information. It is intended only for the person it was addressed to. If you 
are not the intended recipient of this email you may not copy, forward, 
disclose or otherwise use it or any part of it in any form whatsoever. If 
you received this email in error please notify the sender by replying and 
delete this message and any attachments without retaining a copy.


P.S. some of us actually care about licenses being appropriate to what
they're applied to, and at least theoretically capable of being
honored. Your email footer may be very slightly undermining your
position here.


This is just a dumb auto-generated footer, coming from my client's
mail server over here ... I'm just too lazy for setting up an own
MTA on my workstation. You can safely ignore that.


Arguing license issues and at the same time claiming that you should ignore a 
legal statement like the footer is a bit odd.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Device Tree Blob (DTB) licence

2015-05-29 Thread David Lang

On Fri, 29 May 2015, Enrico Weigelt, metux IT consult wrote:


And why should they fear "poisoning" ?


Search for "GPL contamination", the problem is quite common, GPL
can turn anything GPL-compatible into GPL. So for a non-GPL project
it's very hard to adopt GPL code.


Yes, that's the whole purpose of the GPL. The deal is pretty simple:
if you take some GPL'ed software and change it, you'll have to publish
your changes under the same rules. For entirely separate entities
(eg. dedicated programs) that's not a big issue. And for libraries,
we have LGPL.

If the DTS license would be a problem, it would be worse w/ ACPI
and any proprietary firmware/BIOSes.


not true, with a proprietary bios it's a clear "pay this much money and don't 
worry about it" while with GPL there's a nagging fear that someone you never 
heard of may sue you a decade from now claiming you need to give them the source 
to your OS.


Is having the DTB GPL so important that you would rather let things fall into 
the windows trap ("well it booted windows, so it must be right") instead of 
allowing a proprietary OS to use your description of the hardware?


note, this whole discussion assumes that the DTB is even copyrightable. Since 
it's intended to be strictly a functional description of what the hardware is 
able to do, that could be questioned


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Device Tree Blob (DTB) licence

2015-05-29 Thread David Lang

On Fri, 29 May 2015, Enrico Weigelt, metux IT consult wrote:

Important Notice: This message may contain confidential or privileged 
information. It is intended only for the person it was addressed to. If you 
are not the intended recipient of this email you may not copy, forward, 
disclose or otherwise use it or any part of it in any form whatsoever. If 
you received this email in error please notify the sender by replying and 
delete this message and any attachments without retaining a copy.


P.S. some of us actually care about licenses being appropriate to what
they're applied to, and at least theoretically capable of being
honored. Your email footer may be very slightly undermining your
position here.


This is just a dumb auto-generated footer, coming from my client's
mail server over here ... I'm just too lazy for setting up an own
MTA on my workstation. You can safely ignore that.


Arguing license issues and at the same time claiming that you should ignore a 
legal statement like the footer is a bit odd.


David Lang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Device Tree Blob (DTB) licence

2015-05-29 Thread David Lang

On Fri, 29 May 2015, Enrico Weigelt, metux IT consult wrote:


And why should they fear poisoning ?


Search for GPL contamination, the problem is quite common, GPL
can turn anything GPL-compatible into GPL. So for a non-GPL project
it's very hard to adopt GPL code.


Yes, that's the whole purpose of the GPL. The deal is pretty simple:
if you take some GPL'ed software and change it, you'll have to publish
your changes under the same rules. For entirely separate entities
(eg. dedicated programs) that's not an big issue. And for libraries,
we have LGPL.

If the DTS license would be a problem, it would be worse w/ ACPI
and any proprietary firmware/BIOSes.


not true, with a proprietary bios it's a clear pay this much money and don't 
worry about it while with GPL there's a nagging fear that someone you never 
heard of may sue you a decade from now claiming you need to give them the source 
to your OS.


Is having the DTB GPL so impartant that you would rather let things fall into 
the windows trap (well it booted windows, so it must be right) instead of 
allowing a proprietary OS to use your description of the hardware?


note, this whole discussion assumes that the DTB is even copyrightable. Since 
it's intended to be strictly a functional description of what the hardware is 
able to do, that could be questioned.


David Lang


Re: [FYI] tux3: Core changes

2015-05-26 Thread David Lang

On Mon, 25 May 2015, Daniel Phillips wrote:


On Monday, May 25, 2015 11:04:39 PM PDT, David Lang wrote:
if the page gets modified again, will that cause any issues? what if the 
page gets modified before the copy gets written out, so that there are two 
dirty copies of the page in the process of being written?


David Lang


How is the page going to get modified again? A forked page isn't
mapped by a pte, so userspace can't modify it by mmap. The forked
page is not in the page cache, so userspace can't modify it by
posix file ops. So the writer would have to be in kernel. Tux3
knows what it is doing, so it won't modify the page. What kernel
code besides Tux3 will modify the page?


I'm assuming that Rik is talking about whatever has the reference to the page 
via one of the methods that he talked about.


David Lang


Re: [FYI] tux3: Core changes

2015-05-26 Thread David Lang

On Mon, 25 May 2015, Daniel Phillips wrote:


On Monday, May 25, 2015 9:25:44 PM PDT, Rik van Riel wrote:

On 05/21/2015 03:53 PM, Daniel Phillips wrote:

On Wednesday, May 20, 2015 8:51:46 PM PDT, David Lang wrote:

how do you prevent it from continuing to interact with the old version
of the page and never see updates or have its changes reflected on
the current page?


Why would it do that, and what would be surprising about it? Did
you have a specific case in mind?


After a get_page(), page_cache_get(), or other equivalent
function, a piece of code has the expectation that it can
continue using that page until after it has released the
reference count.

This can be an arbitrarily long period of time.


It is perfectly welcome to keep using that page as long as it
wants, Tux3 does not care. When it lets go of the last reference
(and Tux3 has finished with it) then the page is freeable. Did
you have a more specific example where this would be an issue?
Are you talking about kernel or userspace code?


if the page gets modified again, will that cause any issues? what if the page 
gets modified before the copy gets written out, so that there are two dirty 
copies of the page in the process of being written?


David Lang


Re: [FYI] tux3: Core changes

2015-05-20 Thread David Lang

On Wed, 20 May 2015, Daniel Phillips wrote:


On 05/20/2015 03:51 PM, Daniel Phillips wrote:

On 05/20/2015 12:53 PM, Rik van Riel wrote:

How does tux3 prevent a user of find_get_page() from reading from
or writing into the pre-COW page, instead of the current page?


Careful control of the dirty bits (we have two of them, one each
for front and back). That is what pagefork_for_blockdirty is about.


Ah, and of course it does not matter if a reader is on the
pre-cow page. It would be reading the earlier copy, which might
no longer be the current copy, but it raced with the write so
nobody should be surprised. That is a race even without page fork.


how do you prevent it from continuing to interact with the old version of the 
page and never see updates or have its changes reflected on the current page?


David Lang


Re: [FYI] tux3: Core changes

2015-05-20 Thread David Lang

On Wed, 20 May 2015, Daniel Phillips wrote:


On 05/20/2015 07:44 AM, Jan Kara wrote:

  Yeah, that's what I meant. If you create a function which manipulates
page cache, you better make it work with other functions manipulating page
cache. Otherwise it's a landmine waiting to be tripped by some unsuspecting
developer. Sure you can document all the conditions under which the
function is safe to use but a function that has several paragraphs in front
of it explaining when it is safe to use isn't very good API...


Violent agreement, of course. To put it in concrete terms, each of
the page fork support functions must be examined and determined
sane. They are:

* cow_replace_page_cache
* cow_delete_from_page_cache
* cow_clone_page
* page_cow_one
* page_cow_file

Would it be useful to drill down into those, starting from the top
of the list?


It's a little more than determining that these 5 functions are sane, it's making 
sure that if someone mixes the use of these functions with other existing 
functions that the result is sane.


but it's probably a good starting point to look at each of these five functions 
in detail and consider how they work and could interact badly with other things 
touching the page cache.


David Lang


Re: [FYI] tux3: Core changes

2015-05-19 Thread David Lang

On Tue, 19 May 2015, Daniel Phillips wrote:


I understand that Tux3 may avoid these issues due to some other mechanisms
it internally has but if page forking should get into mm subsystem, the
above must work.


It does work, and by example, it does not need a lot of code to make
it work, but the changes are not trivial. Tux3's delta writeback model
will not suit everyone, so you can't just lift our code and add it to
Ext4. Using it in Ext4 would require a per-inode writeback model, which
looks practical to me but far from a weekend project. Maybe something
to consider for Ext5.

It is the job of new designs like Tux3 to chase after that final drop
of performance, not our trusty Ext4 workhorse. Though stranger things
have happened - as I recall, Ext4 had O(n) directory operations at one
time. Fixing that was not easy, but we did it because we had to. Fixing
Ext4's write performance is not urgent by comparison, and the barrier
is high, you would want jbd3 for one thing.

I think the meta-question you are asking is, where is the second user
for this new CoW functionality? With a possible implication that if
there is no second user then Tux3 cannot be merged. Is that is the
question?


I don't think they are asking for a second user. What they are saying is that 
for this functionality to be accepted in the mm subsystem, these problem cases 
need to work reliably, not just work for Tux3 because of your implementation.


So for things that you don't use, you need to make it an error if they get used 
on a page that's been forked (or not be an error and 'do the right thing')


For cases where it doesn't matter because Tux3 controls the writeback, and it's 
undefined in general what happens if writeback is triggered twice on the same 
page, you will need to figure out how to either prevent the second writeback 
from triggering if there's one in process, or define how the two writebacks are 
going to happen so that you can't end up with them re-ordered by some other 
filesystem.


I think that that's what's meant by the top statement that I left in the quote. 
Even if your implementation details make it safe, these need to be safe even 
without your implementation details to be acceptable in the core kernel.


David Lang


Re: [FYI] tux3: Core changes

2015-05-16 Thread David Lang

On Fri, 15 May 2015, Mel Gorman wrote:


On Fri, May 15, 2015 at 02:54:48AM -0700, Daniel Phillips wrote:



On 05/15/2015 01:09 AM, Mel Gorman wrote:

On Thu, May 14, 2015 at 11:06:22PM -0400, Rik van Riel wrote:

On 05/14/2015 08:06 PM, Daniel Phillips wrote:

The issue is that things like ptrace, AIO, infiniband
RDMA, and other direct memory access subsystems can take
a reference to page A, which Tux3 clones into a new page B
when the process writes it.

However, while the process now points at page B, ptrace,
AIO, infiniband, etc will still be pointing at page A.

This causes the process and the other subsystem to each
look at a different page, instead of at shared state,
causing ptrace to do nothing, AIO and RDMA data to be
invisible (or corrupted), etc...


Is this a bit like page migration?


Yes. Page migration will fail if there is an "extra"
reference to the page that is not accounted for by
the migration code.


When I said it's not like page migration, I was referring to the fact
that a COW on a pinned page for RDMA is a different problem to page
migration. The COW of a pinned page can lead to lost writes or
corruption depending on the ordering of events.


I see the lost writes case, but not the corruption case,


Data corruption can occur depending on the ordering of events and the
application's expectations. If a process starts IO, RDMA pins the page
for read and forks are combined with writes from another thread then when
the IO completes the reads may not be visible. The application may take
improper action at that point.


if tux3 forks the page and writes the copy while the original page is being 
modified by other things, this means that some of the changes won't be in the 
version written (and this could catch partial writes with 'interesting' results 
if the forking happens at the wrong time)


But if the original page gets re-marked as needing to be written out when it's 
changed by one of the other things that are accessing it, there shouldn't be any 
long-term corruption.


As far as short-term corruption goes, any time you have a page mmapped it could 
get written out at any time, with only some of the application changes applied 
to it, so this sort of corruption could happen anyway couldn't it?



Users of RDMA are typically expected to use MADV_DONTFORK to avoid this
class of problem.

You can choose to not define this as data corruption because the kernel
is not directly involved and that's your call.


Do you
mean corruption by changing a page already in writeout? If so,
don't all filesystems have that problem?



No, the problem is different. Backing devices requiring stable pages will
block the write until the IO is complete. For those that do not require
stable pages it's ok to allow the write as long as the page is dirtied so
that it'll be written out again and no data is lost.


so if tux3 is prevented from forking the page in cases where the write would be 
blocked, and will get forked again for follow-up writes if it's modified again 
otherwise, won't this be the same thing?


David Lang


If RDMA to a mmapped file races with write(2) to the same file,
maybe it is reasonable and expected to lose some data.



In the RDMA case, there is at least application awareness to work around
the problems. Normally it's ok to have both mapped and write() access
to data although userspace might need a lock to co-ordinate updates and
event ordering.





Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

2015-05-12 Thread David Lang

On Tue, 12 May 2015, Daniel Phillips wrote:


On 05/12/2015 02:30 PM, David Lang wrote:

On Tue, 12 May 2015, Daniel Phillips wrote:

Phoronix published a headline that identifies Dave Chinner as
someone who takes shots at other projects. Seems pretty much on
the money to me, and it ought to be obvious why he does it.


Phoronix turns any correction or criticism into an attack.


Phoronix gets attacked in an unseemly way by a number of people
in the developer community who should behave better. You are
doing it yourself, seemingly oblivious to the valuable role that
the publication plays in our community. Google for filesystem
benchmarks. Where do you find them? Right. Not to mention the
Xorg coverage, community issues, etc etc. The last thing we
need is a monoculture in Linux news, and we are dangerously
close to that now.


It's on my 'sites to check daily' list, but they have also had some pretty nasty 
errors in their benchmarks, some of which have been pointed out repeatedly over 
the years (doing fsync dependent workloads in situations where one FS actually 
honors the fsyncs and another doesn't is a classic)



So, how is "EXT4 is not as stable or as well tested as most
people think" not a cheap shot? By my first hand experience,
that claim is absurd. Add to that the first hand experience
of roughly two billion other people. Seems to be a bit self
serving too, or was that just an accident.


I happen to think that it's correct. It's not that Ext4 isn't tested, but that 
people's expectations of how much it's been tested, and at what scale don't 
match the reality.



You need to get out of the mindset that Ted and Dave are Enemies that you need 
to overcome, they are friendly competitors, not Enemies.


You are wrong about Dave. These are not the words of any friend:

  "I don't think I'm alone in my suspicion that there was something
  stinky about your numbers." -- Dave Chinner


you are looking for offense. That just means that something is wrong with them, 
not that they were deliberately falsified.



Basically allegations of cheating. And wrong. Maybe Dave just
lives in his own dreamworld where everybody is out to get him, so
he has to attack people he views as competitors first.


you are the one doing the attacking. Please stop. Take a break if needed, and 
then get back to producing software rather than complaining about how everyone 
is out to get you.


David Lang


Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

2015-05-12 Thread David Lang

On Tue, 12 May 2015, Daniel Phillips wrote:


On 05/12/2015 11:39 AM, David Lang wrote:

On Mon, 11 May 2015, Daniel Phillips wrote:

...it's the mm and core kernel developers that need to
review and accept that code *before* we can consider merging tux3.


Please do not say "we" when you know that I am just as much a "we"
as you are. Merging Tux3 is not your decision. The people whose
decision it actually is are perfectly capable of recognizing your
agenda for what it is.

  http://www.phoronix.com/scan.php?page=news_item=MTA0NzM
  "XFS Developer Takes Shots At Btrfs, EXT4"


umm, Phoronix has no input on what gets merged into the kernel. They also have a 
reputation for trying to turn anything into click-bait by making it sound like a 
fight when it isn't.


Perhaps you misunderstood. Linus decides what gets merged. Andrew
decides. Greg decides. Dave Chinner does not decide, he just does
his level best to create the impression that our project is unfit
to merge. Any chance there might be an agenda?

Phoronix published a headline that identifies Dave Chinner as
someone who takes shots at other projects. Seems pretty much on
the money to me, and it ought to be obvious why he does it.


Phoronix turns any correction or criticism into an attack.

You need to get out of the mindset that Ted and Dave are Enemies that you need 
to overcome, they are friendly competitors, not Enemies. They assume that you 
are working in good faith (but are inexperienced compared to them), and you need 
to assume that they are working in good faith. If they ever do resort to 
underhanded means to sabotage you, Linus and the other kernel developers will 
take action. But pointing out limits in your current implementation, problems in 
your benchmarks based on how they are run, and concepts that are going to be 
difficult to merge is not underhanded, it's exactly the type of assistance that 
you should be grateful for in friendly competition.


You were the one who started crowing about how badly XFS performed. Dave gave a 
long and detailed explanation about the reasons for the differences, and showed 
benchmarks on other hardware where XFS works very well. That's 
not an attack on EXT4 (or Tux3), it's an explanation.



The real question is, has the Linux development process become
so political and toxic that worthwhile projects fail to benefit
from supposed grassroots community support. You are the poster
child for that.


The linux development process is making code available, responding to concerns 
from the experts in the community, and letting the code talk for itself.


Nice idea, but it isn't working. Did you let the code talk to you?
Right, you let the code talk to Dave Chinner, then you listen to
what Dave Chinner has to say about it. Any chance that there might
be some creative licence acting somewhere in that chain?


I have my own concerns about how things are going to work (I've voiced some of 
them), but no, I haven't tried running Tux3 because you say it's not ready yet.



There have been many people pushing code for inclusion that has not gotten into 
the kernel, or has not been used by any distros after it's made it into the 
kernel, in spite of benchmarks being posted that seem to show how wonderful the 
new code is. ReiserFS was one of the first, and part of what tarnished its 
reputation with many people was how much they were pushing the benchmarks that 
were shown to be faulty (the one I remember most vividly was that the entire 
benchmark completed in <30 seconds, and they had the FS tuned to not start 
flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram 
without ever touching the disk)


You know what to do about checking for faulty benchmarks.


That requires that the code be readily available, which last I heard, Tux3 
wasn't. Has this been fixed?



So when Ted and Dave point out problems with the benchmark (the difference in 
behavior between a single spinning disk, different partitions on the same disk, 
SSDs, and ramdisks), you would be better off acknowledging them, and if you 
can't adjust and re-run the benchmarks, don't start attacking them as a result.


Ted and Dave failed to point out any actual problem with any
benchmark. They invented issues with benchmarks and promoted those
as FUD.


They pointed out problems with using ramdisk to simulate a SSD and huge 
differences between spinning rust and an SSD (or disk array). Those aren't FUD.



As Dave says above, it's not the other filesystem people you have to convince, 
it's the core VFS and Memory Management folks you have to convince. You may need 
a little benchmarking to show that there is a real advantage to be gained, but 
the real discussion is going to be on the impact that page forking is going to 
have on everything else (both in complexity and in performance impact to other 
things)


Yet he clearly wrote "we" as if he believes he is part of it.

Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

2015-05-12 Thread David Lang

On Mon, 11 May 2015, Daniel Phillips wrote:


On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote:

I think Ted and I are on the same page here. "Competitive
benchmarks" only matter to the people who are trying to sell
something. You're trying to sell Tux3, but


By "same page", do you mean "transparently obvious about
obstructing other projects"?


The "except page forking design" statement is your biggest hurdle
for getting tux3 merged, not performance.


No, the "except page forking design" is because the design is
already good and effective. The small adjustments needed in core
are well worth merging because the benefits are proved by benchmarks.
So benchmarks are key and will not stop just because you don't like
the attention they bring to XFS issues.


Without page forking, tux3
cannot be merged at all. But it's not filesystem developers you need
to convince about the merits of the page forking design and
implementation - it's the mm and core kernel developers that need to
review and accept that code *before* we can consider merging tux3.


Please do not say "we" when you know that I am just as much a "we"
as you are. Merging Tux3 is not your decision. The people whose
decision it actually is are perfectly capable of recognizing your
agenda for what it is.

  http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
  "XFS Developer Takes Shots At Btrfs, EXT4"


umm, Phoronix has no input on what gets merged into the kernel. they also have a 
reputation for trying to turn anything into click-bait by making it sound like a 
fight when it isn't.



The real question is, has the Linux development process become
so political and toxic that worthwhile projects fail to benefit
from supposed grassroots community support. You are the poster
child for that.


The linux development process is making code available, responding to concerns 
from the experts in the community, and letting the code talk for itself.


There have been many people pushing code for inclusion that has not gotten into 
the kernel, or has not been used by any distros after it's made it into the 
kernel, in spite of benchmarks being posted that seem to show how wonderful the 
new code is. ReiserFS was one of the first, and part of what tarnished its 
reputation with many people was how much they were pushing the benchmarks that 
were shown to be faulty (the one I remember most vividly was that the entire 
benchmark completed in <30 seconds, and they had the FS tuned to not start 
flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram 
without ever touching the disk)
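
For reference, ReiserFS's own tuning from back then isn't shown here, but the 
same effect can be had today through the generic writeback knobs (shown with 
their usual defaults; note that 3000 centiseconds is exactly that 30-second 
window):

```
# /etc/sysctl.d/ fragment (illustrative): dirty pages may sit in RAM for
# up to ~30s before writeback starts, so a sub-30-second "benchmark" can
# complete without the disk ever being touched.
vm.dirty_expire_centisecs = 3000      # age before dirty data must be flushed
vm.dirty_writeback_centisecs = 500    # how often the flusher threads wake up
```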


So when Ted and Dave point out problems with the benchmark (the difference in 
behavior between a single spinning disk, different partitions on the same disk, 
SSDs, and ramdisks), you would be better off acknowledging them and if you can't 
adjust and re-run the benchmarks, don't start attacking them as a result.


As Dave says above, it's not the other filesystem people you have to convince, 
it's the core VFS and Memory Management folks you have to convince. You may need 
a little benchmarking to show that there is a real advantage to be gained, but 
the real discussion is going to be on the impact that page forking is going to 
have on everything else (both in complexity and in performance impact to other 
things)



IOWs, you need to focus on the important things needed to achieve
your stated goal of getting tux3 merged. New filesystems should be
faster than those based on 20-25 year old designs, so you don't need
to waste time trying to convince people that tux3, when complete,
will be fast.


You know that Tux3 is already fast. Not just that of course. It
has a higher standard of data integrity than your metadata-only
journalling filesystem and a small enough code base that it can
be reasonably expected to reach the quality expected of an
enterprise class filesystem, quite possibly before XFS gets
there.


We wouldn't expect anyone developing a new filesystem to believe any 
differently. If they didn't believe this, why would they be working on the 
filesystem instead of just using an existing filesystem?


The ugly reality is that everyone's early versions of their new filesystem look 
really good. The problem is when they extend it to cover the corner cases and 
when it gets stressed by real-world (as opposed to benchmark) workloads. This 
isn't saying that you are wrong in your belief, just that you may not be right, 
and nobody will know until you are to a usable state and other people can start 
beating on it.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

2015-05-12 Thread David Lang

On Tue, 12 May 2015, Daniel Phillips wrote:


On 05/12/2015 02:30 PM, David Lang wrote:

On Tue, 12 May 2015, Daniel Phillips wrote:

Phoronix published a headline that identifies Dave Chinner as
someone who takes shots at other projects. Seems pretty much on
the money to me, and it ought to be obvious why he does it.


Phoronix turns any correction or criticism into an attack.


Phoronix gets attacked in an unseemly way by a number of people
in the developer community who should behave better. You are
doing it yourself, seemingly oblivious to the valuable role that
the publication plays in our community. Google for filesystem
benchmarks. Where do you find them? Right. Not to mention the
Xorg coverage, community issues, etc etc. The last thing we
need is a monoculture in Linux news, and we are dangerously
close to that now.


It's on my 'sites to check daily' list, but they have also had some pretty nasty 
errors in their benchmarks, some of which have been pointed out repeatedly over 
the years (doing fsync dependent workloads in situations where one FS actually 
honors the fsyncs and another doesn't is a classic)



So, how is "EXT4 is not as stable or as well tested as most
people think" not a cheap shot? By my first-hand experience,
that claim is absurd. Add to that the first hand experience
of roughly two billion other people. Seems to be a bit self
serving too, or was that just an accident.


I happen to think that it's correct. It's not that Ext4 isn't tested, but that 
people's expectations of how much it's been tested, and at what scale don't 
match the reality.



You need to get out of the mindset that Ted and Dave are Enemies that you need 
to overcome, they are
friendly competitors, not Enemies.


You are wrong about Dave. These are not the words of any friend:

  I don't think I'm alone in my suspicion that there was something
  stinky about your numbers. -- Dave Chinner


you are looking for offense. That just means that something is wrong with them, 
not that they were deliberately falsified.



Basically allegations of cheating. And wrong. Maybe Dave just
lives in his own dreamworld where everybody is out to get him, so
he has to attack people he views as competitors first.


you are the one doing the attacking. Please stop. Take a break if needed, and 
then get back to producing software rather than complaining about how everyone 
is out to get you.


David Lang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

2015-05-11 Thread David Lang

On Mon, 11 May 2015, Daniel Phillips wrote:


On 05/11/2015 03:12 PM, Pavel Machek wrote:

It is a fact of life that when you change one aspect of an intimately 
interconnected system,
something else will change as well. You have naive/nonexistent free space 
management now; when you
design something workable there it is going to impact everything else you've 
already done. It's an
easy bet that the impact will be negative, the only question is to what degree.


You might lose that bet. For example, suppose we do strictly linear allocation
each delta, and just leave nice big gaps between the deltas for future
expansion. Clearly, we run at similar or identical speed to the current naive
strategy until we must start filling in the gaps, and at that point our layout
is not any worse than XFS, which started bad and stayed that way.


Umm, are you sure. If "some areas of disk are faster than others" is
still true on todays harddrives, the gaps will decrease the
performance (as you'll "use up" the fast areas more quickly).


That's why I hedged my claim with "similar or identical". The
difference in media speed seems to be a relatively small effect
compared to extra seeks. It seems that XFS puts big spaces between
new directories, and suffers a lot of extra seeks because of it.
I propose to batch new directories together initially, then change
the allocation goal to a new, relatively empty area if a big batch
of files lands on a directory in a crowded region. The "big" gaps
would be on the order of delta size, so not really very big.


This is an interesting idea, but what happens if the files don't arrive as a big 
batch, but rather trickle in over time (think a logserver that is putting files 
into a bunch of directories at a fairly modest rate per directory)


And when you then decide that you have to move the directory/file info, doesn't 
that create a potentially large amount of unexpected IO that could end up 
interfering with what the user is trying to do?


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: how to have the kernel do udev's job and autoload the right modules ?

2015-05-07 Thread David Lang

On Thu, 7 May 2015, Austin S Hemmelgarn wrote:


On 2015-05-06 16:49, David Lang wrote:

On Wed, 6 May 2015, linuxcbon linuxcbon wrote:


On Wed, May 6, 2015 at 7:53 PM, David Lang  wrote:

It's perfectly legitimate to not want to use udev, but that doesn't mean
that the kernel will (or should) do it for you.
David Lang


When I boot the kernel without modules, I don't have anything working
except "minimal video".
I think the kernel should give a minimal support for network, sound and
video, even if 0 modules are loaded. I am just dreaming,


You can do that, you just need to build in all the network and sound
drivers (and pick which driver in the case of conflicts)

There isn't such a thing as a 'generic' network or sound card. For video
there is 'VGA video' which is used by default on x86 systems, but even
that's a driver that could be disabled.

To explain further, video has a standardized hardware level API (VGA and VBE) 
because it is considered critical system functionality (which is BS in my 
opinion, you can get by just fine with a serial console, but that's 
irrelevant to this discussion).  Sound is traditionally not considered 
critical, and therefore doesn't have a standardized hardware API.  Networking 
is (traditionally) only considered critical if the system is booting off the 
network, and therefore only has a standardized API (part of the PXE spec, 
known as UNDI) on some systems, and even then only when they are configured 
to netboot (and IIRC, also only when the processor is in real mode, just like 
for all other BIOS calls).


I don't think that it has anything to do with critical system functionality, but 
rather just the legacy history of the PC clones. At one point VGA was the 
standard, and at that point the different video card manufacturers got into the 
game, but since they all had to boot the system, and the BIOS only knew how to 
talk to a VGA card, all the enhanced cards had to implement VGA so that DOS and 
the BIOS could function. That legacy has continued on the PC clone systems to 
today. Non PC clones didn't have such a standard, and they don't implement VGA 
on their video cards (unless it's a card ported from a PC)


Network cards were never standardized, and were optional add-ons. They also 
weren't needed for the system to boot, so there was never any standard for 
newcomers to implement.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: how to have the kernel do udev's job and autoload the right modules ?

2015-05-06 Thread David Lang

On Wed, 6 May 2015, linuxcbon linuxcbon wrote:


On Wed, May 6, 2015 at 7:53 PM, David Lang  wrote:

It's perfectly legitimate to not want to use udev, but that doesn't mean
that the kernel will (or should) do it for you.
David Lang


When I boot the kernel without modules, I don't have anything working
except "minimal video".
I think the kernel should give a minimal support for network, sound and
video, even if 0 modules are loaded. I am just dreaming,


You can do that, you just need to build in all the network and sound drivers 
(and pick which driver in the case of conflicts)
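
As a rough illustration ("building in" just means answering =y instead of =m 
at configuration time; the driver symbols below are examples for common Intel 
hardware, not a recommendation for yours):

```
# .config fragment: drivers compiled into the kernel image (=y), so no
# module loading is needed for these devices at all.
CONFIG_E1000E=y          # Intel gigabit NIC driver, built in
CONFIG_SND_HDA_INTEL=y   # Intel HD Audio driver, built in
# CONFIG_DRM_I915=m would instead build the i915 GPU driver as a module.
```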


There isn't such a thing as a 'generic' network or sound card. For video there 
is 'VGA video' which is used by default on x86 systems, but even that's a driver 
that could be disabled.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: how to have the kernel do udev's job and autoload the right modules ?

2015-05-06 Thread David Lang

On Wed, 6 May 2015, linuxcbon linuxcbon wrote:


On Wed, May 6, 2015 at 5:55 PM, Ken Moffat  wrote:

I suggest that you take the time to look at eudev and mdev, and
think about how you can use the facilities they offer.



I was wishing the kernel would offer some minimal support for
network, sound and full screen video for my hw :(.
But it seems I need to load modules to achieve this. And to load modules,
it needs some kind of "hotplug" called udev or mdev.


I've been building my own kernels for production systems for a long time. It is 
absolutely possible to have a kernel provide support for your hardware without 
modules.


The problem is the question of how much hardware you want to support. Modules 
were created because compiling everything into the kernel at once has multiple 
problems


1. sometimes different drivers can handle the same hardware, and you can only 
use one driver for the hardware


2. sometimes different hardware conflicts in that drivers for one piece of 
hardware will think that they've found their hardware, and prevent the proper 
drivers from working (sometimes doing 'strange' things to the hardware in the 
process)


3. the resulting kernel is VERY large. Back in the day, the problem was that the 
kernel would no longer fit on a floppy. We don't have that limit, but we still 
don't want to waste time reading a huge amount of data into RAM (at which point 
it prevents the RAM from being used for other things)


4. boot time would be horrible as all the drivers try to detect their hardware 
and time out.



so if you want to cover your hardware, you have two choices.

1. If you have a relatively small variation of hardware, just compile in all the 
drivers you need. This even works for most hotplugged items.


2. use modules

If you use modules, then you need to have some way of loading them. It's a very 
bad idea to have this happen by magic, without any control over the policies 
(sometimes you don't want drivers to load just because hardware exists). So you 
need to have a place to set the policy. Since the kernel provides mechanisms, 
not policy, the result is that the kernel tells userspace what it thinks it's 
found and it's up to userspace to then 'do the right thing'


So if you don't want to use udev, then you need to have something that replaces 
it to load the right module with the right options.
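
A minimal sketch of such a replacement (assuming a kernel built with 
CONFIG_UEVENT_HELPER; the script name here is made up): the kernel runs the 
program named in /proc/sys/kernel/hotplug for every device event, passing the 
details (ACTION, MODALIAS, ...) in the environment, and loading whatever 
MODALIAS names is the core of what udev/mdev automate:

```shell
# Write a tiny uevent helper; the kernel invokes it with ACTION, MODALIAS,
# and other event details in the environment.
cat > ./my-hotplug <<'EOF'
#!/bin/sh
# Policy lives here: only react to "add" events that carry a modalias.
[ "$ACTION" = "add" ] || exit 0
[ -n "$MODALIAS" ] || exit 0
exec modprobe "$MODALIAS"
EOF
chmod +x ./my-hotplug
```

Registering it (as root) is `echo $PWD/my-hotplug > /proc/sys/kernel/hotplug`; 
for devices discovered before the helper existed, a coldplug pass such as 
`find /sys/devices -name modalias -exec cat {} + | sort -u | xargs -rn1 modprobe` 
picks up the rest.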


It's perfectly legitimate to not want to use udev, but that doesn't mean that 
the kernel will (or should) do it for you.


David Lang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: how to have the kernel do udev's job and autoload the right modules ?

2015-05-06 Thread David Lang

On Wed, 6 May 2015, linuxcbon linuxcbon wrote:


On Wed, May 6, 2015 at 7:53 PM, David Lang da...@lang.hm wrote:

It's perfectly legitimate to not want to use udev, but that doesn't mean
that the kernel will (or should) do it for you.
David Lang


When I boot the kernel without modules, I don't have anything working
except minimal video.
I think the kernel should give a minimal support for network, sound and
video, even if 0 modules are loaded. I am just dreaming,


You can do that, you just need to build in all the network and sound drivers 
(and pick which driver in the case of conflicts)


There isn't such a thing as a 'generic' network or sound card. For video there 
is 'VGA video' which is used by default on x86 systems, but even that's a driver 
that could be disabled.


David Lang




Re: A desktop environment[1] kernel wishlist

2015-05-05 Thread David Lang

On Wed, 6 May 2015, Rafael J. Wysocki wrote:


You are, of course, correct.  Ultimately the only requirement we have
is that there exists a way for userspace to determine if the system
woke up because of a user-triggered event.  The actual mechanism by
which this determination is made isn't something I feel strongly
about.  The reason I had been focusing on exposing the actual wakeup
event to userspace is because classifying wakeup events as
user-triggered or not feels to me like a policy decision that should
be left to userspace.  If the kernel maintainers are ok with doing
this work in the kernel instead and only exposing a binary yes/no bit
to userspace for user-triggered wakeups, that's perfectly fine because
it still meets our requirements.


Well, please see the message I've just sent.

All wakeup devices have a wakeup source object associated with them.  In
principle, we can expose a "priority" attribute from that for user space to
set as it wants to.  There may be two values of it, like "normal" and "high"
for example.

Then, what only remains is to introduce separate wakeup counts for the "high"
priority and "normal" priority wakeup sources and teach the power manager to
use them.

That leaves no policy in the kernel, but it actually has a chance to work.


How about, instead of setting two states and defining that one must be a subset 
of the other, you keep the existing feed of events and then allow software that 
cares to define additional feeds that take the current feed and filter it? We 
allow BPF filters in the kernel, so use those to filter which events the 
additional feed will receive.


remember that the interesting numbers in CS are 0, 1, and many, not 2 :-)

don't limit things to two feeds with one always being a subset of the other, 
create a mechanism to allow an arbitrary number of feeds that can be filtered in 
different ways


David Lang




Re: Tux3 Report: How fast can we fsync?

2015-05-01 Thread David Lang
r directly or 
donated)


David Lang




Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-29 Thread David Lang

On Thu, 30 Apr 2015, Dave Airlie wrote:


On 30 April 2015 at 10:05, David Lang  wrote:

On Wed, 29 Apr 2015, Theodore Ts'o wrote:


On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:


If your customers want this feature, you're more than welcome to fork
the kernel and support it yourself.  Oh wait... Redhat does that
already.  So what's the problem?   Just put it into RHEL (which I use
I admit, along with Debian/Mint) and be done with it.



Harald,

If you make the RHEL initramfs harder to debug in the field, I will
await the time when some Red Hat field engineers will need to do the
same sort of thing I have had to do in the field, and be amused when
they want to shake you very warmly by the throat.  :-)

Seriously, keep things as simple as possible in the initramfs; don't
use complicated bus protocols; that way lies madness.  Enterprise
systems aren't constantly booting (or they shouldn't be, if your
kernels are sufficiently reliable :-), so trying to optimize for an
extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it.



I've had Enterprise systems where I could hit power on two boxes, and finish
the OS install on one before the other has even finished POST and look for
the boot media. I did this 5 years ago, before the "let's speed up boot"
push started.

Admittedly, this wasn't a stock distro boot/install, it was my own optimized
one, but it also wasn't as optimized and automated as it could have been
(several points where the installer needed to pick items from a menu and
enter values)



You guys might have missed this new industry trend, I think they call
it virtualisation,

I hear it's going to be big, you might want to look into it.


So what do you run your virtual machines on? you still have to put an OS on the 
hardware to support your VMs. Virtualization doesn't eliminate servers (as much 
as some cloud advocates like to claim it does)


And virtualization has overhead, sometimes very significant overhead, so it's 
not always the right answer.


David Lang


Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-29 Thread David Lang

On Wed, 29 Apr 2015, Theodore Ts'o wrote:


On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:

If your customers want this feature, you're more than welcome to fork
the kernel and support it yourself.  Oh wait... Redhat does that
already.  So what's the problem?   Just put it into RHEL (which I use
I admit, along with Debian/Mint) and be done with it.


Harald,

If you make the RHEL initramfs harder to debug in the field, I will
await the time when some Red Hat field engineers will need to do the
same sort of thing I have had to do in the field, and be amused when
they want to shake you very warmly by the throat.  :-)

Seriously, keep things as simple as possible in the initramfs; don't
use complicated bus protocols; that way lies madness.  Enterprise
systems aren't constantly booting (or they shouldn't be, if your
kernels are sufficiently reliable :-), so trying to optimize for an
extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it.


I've had Enterprise systems where I could hit power on two boxes and finish the 
OS install on one before the other had even finished POST and started looking for 
the boot media. I did this 5 years ago, before the "let's speed up boot" push started.


Admittedly, this wasn't a stock distro boot/install, it was my own optimized 
one, but it also wasn't as optimized and automated as it could have been 
(several points where the installer needed to pick items from a menu and enter 
values)


David Lang


Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-29 Thread David Lang

On Wed, 29 Apr 2015, Andy Lutomirski wrote:


On Wed, Apr 29, 2015 at 1:15 PM, David Lang  wrote:

On Wed, 29 Apr 2015, Andy Lutomirski wrote:


On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
 wrote:


On 2015-04-29 14:54, Andy Lutomirski wrote:



On Apr 29, 2015 5:48 AM, "Harald Hoyer"  wrote:




  * Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions.  For example, with kdbus, there is
a
way a client can disconnect from a bus, but do so only if no
further
messages present in its queue, which is crucial for implementing
race-free "exit-on-idle" services




This can be implemented in userspace.

Client to dbus daemon: may I exit now?
Dbus daemon to client: yes (and no more messages) or no


Depending on how this is implemented, there would be a potential issue if
a
message arrived for the client after the daemon told it it could exit,
but
before it finished shutdown, in which case the message might get lost.



Then implement it the right way?  The client sends some kind of
sequence number with its request.



so any app in the system can prevent any other app from exiting/restarting
by just sending it the equivalent of a ping over dbus?

preventing an app from exiting because there are unhandled messages doesn't
mean that those messages are going to be handled, just that they will get
read and dropped on the floor by an app trying to exit. Sometimes you will
just end up with a hung app that can't process messages and needs to be
restarted, but can't be restarted because there are pending messages.


I think this consideration is more or less the same whether it's
handled in the kernel or in userspace, though.


If the justification for why this needs to be in the kernel is that you can't 
reliably prevent apps from exiting if there are pending messages, then the 
answer of "preventing apps from exiting if there are pending messages isn't a 
sane thing to try and do" is a direct counter to that justification for 
including it in the kernel.


David Lang


Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-29 Thread David Lang

On Wed, 29 Apr 2015, Andy Lutomirski wrote:


On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
 wrote:

On 2015-04-29 14:54, Andy Lutomirski wrote:


On Apr 29, 2015 5:48 AM, "Harald Hoyer"  wrote:



  * Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions.  For example, with kdbus, there is a
way a client can disconnect from a bus, but do so only if no further
messages present in its queue, which is crucial for implementing
race-free "exit-on-idle" services



This can be implemented in userspace.

Client to dbus daemon: may I exit now?
Dbus daemon to client: yes (and no more messages) or no


Depending on how this is implemented, there would be a potential issue if a
message arrived for the client after the daemon told it it could exit, but
before it finished shutdown, in which case the message might get lost.



Then implement it the right way?  The client sends some kind of
sequence number with its request.


so any app in the system can prevent any other app from exiting/restarting by 
just sending it the equivalent of a ping over dbus?


preventing an app from exiting because there are unhandled messages doesn't mean 
that those messages are going to be handled, just that they will get read and 
dropped on the floor by an app trying to exit. Sometimes you will just end up 
with a hung app that can't process messages and needs to be restarted, but can't 
be restarted because there are pending messages.


The problem with "guaranteed delivery" messages is that things _will_ go wrong 
that will cause the messages to not be received and processed. At that point you 
have the choice of losing some messages or freezing your entire system (you can 
buffer them for some time, but eventually you will run out of buffer space)


We see this all the time in the logging world, people configure their systems 
for reliable delivery of log messages to a remote machine, then when that remote 
machine goes down and can't receive messages (or a network issue blocks the 
traffic), the sending machine blocks and causes an outage.


Being too strict about guaranteeing delivery just doesn't work. You must have a 
mechanism to abort and throw away unprocessed messages. If this means 
disconnecting the receiver so that there are no missing messages to the 
receiver, that's a valid choice. But preventing a receiver from exiting because 
it hasn't processed a message is not a valid choice.


David Lang


Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-29 Thread David Lang

On Wed, 29 Apr 2015, Martin Steigerwald wrote:


Am Mittwoch, 29. April 2015, 14:47:53 schrieb Harald Hoyer:

We really don't want the IPC mechanism to be in a flux state. All tools
have to fallback to a non-standard mechanism in that case.

If I have to pull in a dbus daemon in the initramfs, we still have the
chicken and egg problem for PID 1 talking to the logging daemon and
starting dbus.
systemd cannot talk to journald via dbus unless dbus-daemon is started,
dbus cannot log anything on startup, if journald is not running, etc...


Do I get this right that it is basically a userspace *design* decision
that you use as a reason to have kdbus inside the kernel?

Is it really necessary to use DBUS for talking to journald? And does it
really matter that much if any message before starting up dbus do not
appear in the log? /proc/kmsg is a ring buffer, it can still be copied over
later.


I've been getting the early boot messages in my logs for decades (assuming the 
system doesn't fail before the syslog daemon is started). It sometimes has 
required setting a larger than default ringbuffer in the kernel, but that's easy 
enough to do.


David Lang












Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-28 Thread David Lang

On Tue, 28 Apr 2015, Havoc Pennington wrote:


On Tue, Apr 28, 2015 at 1:19 PM, David Lang  wrote:

If the examples that are being used to show the performance advantage of
kdbus vs normal dbus are doing the wrong thing, then we need to get some
other examples available to people who don't live and breath dbus that 'so
things right' so that the kernel developers can see what you think is the
real problem and how kdbus addresses it.

So far, this 'wrong' example is the only thing that's been posted to show
the performance advantage of kdbus.


I'm hopeful someone will do that.

fwiw, I would be suspicious of a broken benchmark if it didn't show:

* the bus daemon means an extra read/parse and marshal/write per
message, so 4 vs. 2
* the existence of the bus daemon therefore makes a message
send/receive take roughly twice as long

https://lwn.net/Articles/580194/ has a bit more elaboration about
number of copies, validations, and context switches in each case.

From what I can tell, the core performance claim for kdbus is that for
a userspace daemon to be a routing intermediary, it has to receive and
re-send messages. If the baseline performance of IPC is the cost to
send once and receive once, adding the daemon means there's twice as
much to do (1 more receive, 1 more send). However fast you make
send/receive, the daemon always means there are twice as many
send/receives as there would be with no daemon.


There are twice as many context switches, nobody disputes that; the question is 
whether it matters.


It doesn't matter if the message router is in kernel space or user space, it 
still needs to read/parse, marshal/write the data, so you aren't saving that 
time due to it being in the kernel.



If that isn't what a benchmark shows, then there's a mystery to
explain... (one disruption to the ratio of course could be if the
clients use a much faster or slower dbus lib than the daemon)

As noted many times, of course this 2x penalty for the daemon was a
conscious tradeoff - kdbus is trying to escape the tradeoff in order
to extend usage of dbus to more use cases. Given the tradeoff,
_existing_ uses of dbus seem to prefer the performance hit to the loss
of useful semantics, but potential new users would like to or need to
have both.


If there is a 2x performance improvement for being in the kernel, but a 100x 
performance improvement from fixing the userspace code, the effort should be 
spent on the userspace code, not on moving things to kernel space.


Remember the Tux in-kernel webserver? It showed performance improvements from 
putting the HTTP daemon in the kernel, and a lot of the arguments about it sound 
very similar (reduced context switches, etc.).
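The extra-hop argument above can be illustrated with a toy ping-pong benchmark. This is a sketch only: plain Unix datagram sockets, not dbus, and the relay does no parsing, validation, or routing, so real numbers will differ; it only shows why inserting a routing daemon roughly doubles the send/receive count per message.

```python
# Toy illustration of the extra-hop cost: a direct socketpair
# round-trip vs. the same traffic relayed through a middle process.
import os
import socket
import time

N = 500
MSG = b"x" * 64

def pair():
    # datagram socketpair preserves message boundaries
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

def echo(conn, count):
    for _ in range(count):              # 1 recv + 1 send per message
        conn.send(conn.recv(len(MSG)))

def relay(left, right, count):
    for _ in range(count):              # the "daemon": one extra recv+send each way
        right.send(left.recv(len(MSG)))
        left.send(right.recv(len(MSG)))

def bench(client):
    t0 = time.perf_counter()
    for _ in range(N):
        client.send(MSG)
        client.recv(len(MSG))
    return time.perf_counter() - t0

# direct: client <-> echo server
a, b = pair()
if os.fork() == 0:
    a.close(); echo(b, N); os._exit(0)
b.close()
t_direct = bench(a)
a.close(); os.wait()

# relayed: client <-> relay <-> echo server
a, b = pair()
c, d = pair()
if os.fork() == 0:
    a.close(); d.close(); relay(b, c, N); os._exit(0)
if os.fork() == 0:
    a.close(); b.close(); c.close(); echo(d, N); os._exit(0)
b.close(); c.close(); d.close()
t_relayed = bench(a)
a.close(); os.wait(); os.wait()

print(f"direct:  {t_direct:.4f}s")
print(f"relayed: {t_relayed:.4f}s (~{t_relayed / t_direct:.1f}x)")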


David Lang


Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-28 Thread David Lang

On Tue, 28 Apr 2015, Havoc Pennington wrote:


btw if I can make a suggestion, it's quite confusing to talk about
"dbus" unqualified when we are talking about implementation issues,
since it muddles bus daemon vs. clients, and also since there are lots
of implementations of the client bindings:

 http://www.freedesktop.org/wiki/Software/DBusBindings/

For the bus daemon, the only two implementations I know of are the
original one (which uses libdbus as its binding) and kdbus, though.

I would expect there's no question the bus daemon can be faster, maybe
say 1.5x raw sockets instead of 2.5x, or whatever - something on that
order. Should probably simply stipulate this for discussion purposes:
"someone could optimize the crap out of the bus daemon". The kdbus
question is about whether to eliminate this daemon entirely.


As I'm seeing things, we aren't talking about 1.5x vs 2.5x; we're talking about 
1000x.


If the examples that are being used to show the performance advantage of kdbus 
vs normal dbus are doing the wrong thing, then we need to get some other 
examples available to people who don't live and breathe dbus that 'do things 
right' so that the kernel developers can see what you think is the real problem 
and how kdbus addresses it.


So far, this 'wrong' example is the only thing that's been posted to show the 
performance advantage of kdbus.


David Lang


Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-27 Thread David Lang

On Mon, 27 Apr 2015, Lukasz Skalski wrote:


Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:

On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:

On 04/24/2015 04:19 PM, Havoc Pennington wrote:

On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski  wrote:

- client: http://fpaste.org/215156/



Cool - it might also be interesting to try this without blocking round
trips, i.e. send requests as quickly as you can, and collect replies
asynchronously. That's how people ideally use dbus. It should
certainly reduce the total benchmark time, but just wondering if this
usage increases or decreases the delta between userspace daemon and
kdbus.


No problem - I'll prepare also asynchronous version.


That would be great to see as well.  Many thanks for doing this work.


As it was proposed by Havoc and Greg I've created simple benchmark for
asynchronous calls:

- server: http://fpaste.org/215157/ (the same as in the previous test)
- client: http://fpaste.org/215724/ (asynchronous version)

For the asynchronous version of the client I had to decrease the number of calls
to 128 (for the synchronous version it was x2 calls), otherwise we could
exceed the maximum number of pending replies per connection.


Aren't we being told that part of the reason for needing kdbus is that 
thousands, or tens of thousands, of messages are being spewed out? How does 
limiting it to 128 messages represent real life if this is the case?
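For what it's worth, a per-connection cap on pending replies bounds concurrency, not total traffic: a benchmark can still push an arbitrary number of messages by keeping a bounded window in flight. A generic sketch of that pattern (not GLib or kdbus code; all names here are made up for illustration):

```python
# Pipelining with a bounded in-flight window: the 128-pending-replies
# cap limits how many calls are outstanding at once, not how many
# messages a benchmark can push in total.
from collections import deque

MAX_PENDING = 128
TOTAL = 10_000

def fake_call(i):        # stand-in for issuing an async method call
    return i

def fake_reply(req):     # stand-in for receiving the matching reply
    return req * 2

pending = deque()
replies = []
sent = 0
while len(replies) < TOTAL:
    # fill the window up to the per-connection cap
    while sent < TOTAL and len(pending) < MAX_PENDING:
        pending.append(fake_call(sent))
        sent += 1
    # drain one reply, freeing a pending slot
    replies.append(fake_reply(pending.popleft()))

print(len(replies), replies[-1])
```

The window never exceeds 128 outstanding calls, yet all 10,000 messages go through.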


David Lang


The test results are following:

+--+++
|  |Elapsed time|Elapsed time|
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]|KDBUS SUPPORT*  ||
+--+++
|  |1) 0.018639 s   |1) 0.029947 s   |
| 1000 |2) 0.017045 s   |2) 0.032812 s   |
|  |3) 0.017490 s   |3) 0.029971 s   |
|  |4) 0.018001 s   |4) 0.026485 s   |
+--+++
|  |3) 0.019898 s   |3) 0.040914 s   |
|1 |3) 0.022187 s   |3) 0.033604 s   |
|  |3) 0.020854 s   |3) 0.037616 s   |
|  |3) 0.020020 s   |3) 0.033772 s   |
+--+++
*All tests were performed without using the memfd mechanism.

And as I wrote in my previous mail, kdbus transport for GLib is not
finished yet and there are still some places for improvement, so please
do not treat these test results as final.



greg k-h



Cheers,




Re: Trusted kernel patchset

2015-03-16 Thread David Lang

On Mon, 16 Mar 2015, Matthew Garrett wrote:


On Mon, 2015-03-16 at 14:45 +, One Thousand Gnomes wrote:

On Fri, 13 Mar 2015 11:38:16 -1000
Matthew Garrett  wrote:


4) Used the word "measured"

Nothing is being measured.


Nothing is being trusted either. It's simply ensuring you probably have
the same holes as before.

Also the boot loader should be measuring the kernel before it runs it,
thats how it knows the signature is correct.


That's one implementation. Another is the kernel being stored on
non-volatile media.


Anything that encourages deploying systems that can't be upgraded to fix bugs 
that are discovered is a problem.


This is an issue that the Internet of Things folks are just starting to notice, 
and it's only going to get worse before it gets better.


How do you patch bugs on your non-volatile media? What keeps that mechanism from 
being abused?
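For reference, the "measuring" discussed up-thread amounts to hashing the image before it runs and comparing against a trusted value. A minimal sketch of the idea (illustrative Python, not bootloader code; real firmware verifies a signature over the digest rather than a raw hash, and the image bytes here are obviously fake):

```python
# Illustration of "measuring" an image: hash it before use and
# compare against a trusted reference recorded at signing time.
import hashlib

def measure(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()

kernel = b"\x7fELF...pretend-kernel-image..."
expected = measure(kernel)            # recorded when the image was signed

def boot(image: bytes) -> str:
    if measure(image) != expected:
        raise RuntimeError("measurement mismatch: refusing to boot")
    return "booted"

print(boot(kernel))
try:
    boot(kernel + b"\x00")            # a single added byte changes the digest
except RuntimeError as err:
    print(err)
```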


David Lang


Re: cgroup: status-quo and userland efforts

2015-03-04 Thread David Lang

On Wed, 4 Mar 2015, Luke Kenneth Casson Leighton wrote:




and why he concludes that having a single hierarchy for all resource types.


correcting to add "is not always a good idea"



i think having a single hierarchy is fine *if* and only if it is
possible to overlay something similar to SE/Linux policy files -
enforced by the kernel *not* by userspace (sorry serge!) - such that
through those policy files any type of hierarchy be it single or multi
layer, recursive or in fact absolutely anything, may be emulated and
properly enforced.


The fundamental problem is that sometimes you have types of controls that are 
orthogonal to each other, and you either manage the two types of things in 
separate hierarchies, or you end up with one hierarchy that is a permutation of 
all the combinations of what would have been separate hierarchies.
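The combinatorial blow-up is easy to see in a few lines (the class names are made up for illustration; they are not real cgroup controllers):

```python
# Two orthogonal classifications kept as separate hierarchies need
# A + B groups; folded into one hierarchy they need every combination,
# A * B groups.
from itertools import product

cpu_classes = ["interactive", "batch", "background", "idle"]
mem_classes = ["small", "medium", "large"]

separate = len(cpu_classes) + len(mem_classes)             # 4 + 3 = 7 groups
combined = ["/".join(p) for p in product(cpu_classes, mem_classes)]

print("separate hierarchies:", separate, "groups")
print("single hierarchy:", len(combined), "groups")
print(combined[:3])
```

With more orthogonal resource types the single-hierarchy count multiplies, while the separate-hierarchy count only adds.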


David Lang


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread David Lang

On Tue, 3 Mar 2015, Luke Leighton wrote:


I wrote about that many times, but here are two of the problems.

* There's no way to designate a cgroup to a resource, because cgroup
  is only defined by the combination of who's looking at it for which
  controller.  That's how you end up with tagging the same resource
  multiple times for different controllers and even then it's broken
  as when you move resources from one cgroup to another, you can't
  tell what to do with other tags.

  While allowing obscene level of flexibility, multiple hierarchies
  destroy a very fundamental concept that it *should* provide - that
  of a resource container.  It can't because a "cgroup" is undefined
  under multiple hierarchies.


ok, there is an alternative to hierarchies, which has precedent
(and, importantly, a set of userspace management tools as well as
 existing code in the linux kernel), and it's the FLASK model which
 you know as SE/Linux.

whilst the majority of people view management to be "hierarchical"
(so there is a top dog or God process and everything trickles down
 from that), this is viewed as such an anathema in the security
industry that someone came up with a formal specification for the
real-world way in which permissions are managed, and it's called the
FLASK model.


On this topic it's also worth reading Neil Brown's series of articles on this 
over at http://lwn.net/Articles/604609/ and why he concludes that having a 
single hierarchy for all resource types.


David Lang

