Re: [PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.
On Fri, January 18, 2008 20:16, Mingming Cao wrote: > On Sat, 2008-01-12 at 21:35 +0100, Indan Zupancic wrote: >> i_file_acl and i_dir_acl aren't always needed. >> >> With certain configs this makes 10 ext3_inode_cache objects fit in >> one slab instead of the current 9, as the size shrinks from 416 to >> 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs. >> >> Signed-off-by: Indan Zupancic <[EMAIL PROTECTED]> >> --- >> fs/ext3/ialloc.c |2 ++ >> fs/ext3/inode.c | 29 +++-- >> include/linux/ext3_fs_i.h |2 ++ >> 3 files changed, 23 insertions(+), 10 deletions(-) >> >> diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c >> index 1bc8cd8..01745bc 100644 >> --- a/fs/ext3/ialloc.c >> +++ b/fs/ext3/ialloc.c >> @@ -574,8 +574,10 @@ got: >> ei->i_frag_no = 0; >> ei->i_frag_size = 0; >> #endif >> +#ifdef CONFIG_EXT3_FS_POSIX_ACL >> ei->i_file_acl = 0; >> ei->i_dir_acl = 0; >> +#endif > > For regular file, i_dir_acl is being reused as i_size_high to support > large file. Only the i_dir_acl of struct ext3_inode, not the one from ext3_inode_info. Thanks, Indan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.
On Fri, January 18, 2008 20:16, Mingming Cao wrote: On Sat, 2008-01-12 at 21:35 +0100, Indan Zupancic wrote: i_file_acl and i_dir_acl aren't always needed. With certain configs this makes 10 ext3_inode_cache objects fit in one slab instead of the current 9, as the size shrinks from 416 to 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs. Signed-off-by: Indan Zupancic [EMAIL PROTECTED] --- fs/ext3/ialloc.c |2 ++ fs/ext3/inode.c | 29 +++-- include/linux/ext3_fs_i.h |2 ++ 3 files changed, 23 insertions(+), 10 deletions(-) diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c index 1bc8cd8..01745bc 100644 --- a/fs/ext3/ialloc.c +++ b/fs/ext3/ialloc.c @@ -574,8 +574,10 @@ got: ei-i_frag_no = 0; ei-i_frag_size = 0; #endif +#ifdef CONFIG_EXT3_FS_POSIX_ACL ei-i_file_acl = 0; ei-i_dir_acl = 0; +#endif For regular file, i_dir_acl is being reused as i_size_high to support large file. Only the i_dir_acl of struct ext3_inode, not the one from ext3_inode_info. Thanks, Indan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.
i_file_acl and i_dir_acl aren't always needed. With certain configs this makes 10 ext3_inode_cache objects fit in one slab instead of the current 9, as the size shrinks from 416 to 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs. Signed-off-by: Indan Zupancic <[EMAIL PROTECTED]> --- fs/ext3/ialloc.c |2 ++ fs/ext3/inode.c | 29 +++-- include/linux/ext3_fs_i.h |2 ++ 3 files changed, 23 insertions(+), 10 deletions(-) diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c index 1bc8cd8..01745bc 100644 --- a/fs/ext3/ialloc.c +++ b/fs/ext3/ialloc.c @@ -574,8 +574,10 @@ got: ei->i_frag_no = 0; ei->i_frag_size = 0; #endif +#ifdef CONFIG_EXT3_FS_POSIX_ACL ei->i_file_acl = 0; ei->i_dir_acl = 0; +#endif ei->i_dtime = 0; ei->i_block_alloc_info = NULL; ei->i_block_group = group; diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 9b162cd..20a8aeb 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -46,9 +46,12 @@ static int ext3_writepage_trans_blocks(struct inode *inode); */ static int ext3_inode_is_fast_symlink(struct inode *inode) { - int ea_blocks = EXT3_I(inode)->i_file_acl ? - (inode->i_sb->s_blocksize >> 9) : 0; + int ea_blocks = 0; +#ifdef CONFIG_EXT3_FS_POSIX_ACL + if (EXT3_I(inode)->i_file_acl) + ea_blocks = inode->i_sb->s_blocksize >> 9; +#endif return (S_ISLNK(inode->i_mode) && inode->i_blocks - ea_blocks == 0); } @@ -2717,13 +2720,16 @@ void ext3_read_inode(struct inode * inode) ei->i_frag_no = raw_inode->i_frag; ei->i_frag_size = raw_inode->i_fsize; #endif - ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl); - if (!S_ISREG(inode->i_mode)) { - ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl); - } else { + if (S_ISREG(inode->i_mode)) { inode->i_size |= ((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32; } +#ifdef CONFIG_EXT3_FS_POSIX_ACL + else { + ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl); + } + ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl); +#endif ei->i_disksize = inode->i_size; inode->i_generation = le32_to_cpu(raw_inode->i_generation); ei->i_block_group = iloc.block_group; @@ -2854,10 +2860,7 @@ static int ext3_do_update_inode(handle_t *handle, raw_inode->i_frag = ei->i_frag_no; raw_inode->i_fsize = ei->i_frag_size; #endif - raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl); - if (!S_ISREG(inode->i_mode)) { - raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl); - } else { + if (S_ISREG(inode->i_mode)) { raw_inode->i_size_high = cpu_to_le32(ei->i_disksize >> 32); if (ei->i_disksize > 0x7fffULL) { @@ -2883,6 +2886,12 @@ static int ext3_do_update_inode(handle_t *handle, } } } +#ifdef CONFIG_EXT3_FS_POSIX_ACL + else { + raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl); + } + raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl); +#endif raw_inode->i_generation = cpu_to_le32(inode->i_generation); if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) { if (old_valid_dev(inode->i_rdev)) { diff --git a/include/linux/ext3_fs_i.h b/include/linux/ext3_fs_i.h index 7894dd0..9e7f1b6 100644 --- a/include/linux/ext3_fs_i.h +++ b/include/linux/ext3_fs_i.h @@ -75,8 +75,10 @@ struct ext3_inode_info { __u8i_frag_no; __u8i_frag_size; #endif +#ifdef CONFIG_EXT3_FS_POSIX_ACL ext3_fsblk_ti_file_acl; __u32 i_dir_acl; +#endif __u32 i_dtime; /* -- 1.5.3.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.
i_file_acl and i_dir_acl aren't always needed. With certain configs this makes 10 ext3_inode_cache objects fit in one slab instead of the current 9, as the size shrinks from 416 to 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs. Signed-off-by: Indan Zupancic [EMAIL PROTECTED] --- fs/ext3/ialloc.c |2 ++ fs/ext3/inode.c | 29 +++-- include/linux/ext3_fs_i.h |2 ++ 3 files changed, 23 insertions(+), 10 deletions(-) diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c index 1bc8cd8..01745bc 100644 --- a/fs/ext3/ialloc.c +++ b/fs/ext3/ialloc.c @@ -574,8 +574,10 @@ got: ei-i_frag_no = 0; ei-i_frag_size = 0; #endif +#ifdef CONFIG_EXT3_FS_POSIX_ACL ei-i_file_acl = 0; ei-i_dir_acl = 0; +#endif ei-i_dtime = 0; ei-i_block_alloc_info = NULL; ei-i_block_group = group; diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 9b162cd..20a8aeb 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -46,9 +46,12 @@ static int ext3_writepage_trans_blocks(struct inode *inode); */ static int ext3_inode_is_fast_symlink(struct inode *inode) { - int ea_blocks = EXT3_I(inode)-i_file_acl ? - (inode-i_sb-s_blocksize 9) : 0; + int ea_blocks = 0; +#ifdef CONFIG_EXT3_FS_POSIX_ACL + if (EXT3_I(inode)-i_file_acl) + ea_blocks = inode-i_sb-s_blocksize 9; +#endif return (S_ISLNK(inode-i_mode) inode-i_blocks - ea_blocks == 0); } @@ -2717,13 +2720,16 @@ void ext3_read_inode(struct inode * inode) ei-i_frag_no = raw_inode-i_frag; ei-i_frag_size = raw_inode-i_fsize; #endif - ei-i_file_acl = le32_to_cpu(raw_inode-i_file_acl); - if (!S_ISREG(inode-i_mode)) { - ei-i_dir_acl = le32_to_cpu(raw_inode-i_dir_acl); - } else { + if (S_ISREG(inode-i_mode)) { inode-i_size |= ((__u64)le32_to_cpu(raw_inode-i_size_high)) 32; } +#ifdef CONFIG_EXT3_FS_POSIX_ACL + else { + ei-i_dir_acl = le32_to_cpu(raw_inode-i_dir_acl); + } + ei-i_file_acl = le32_to_cpu(raw_inode-i_file_acl); +#endif ei-i_disksize = inode-i_size; inode-i_generation = le32_to_cpu(raw_inode-i_generation); ei-i_block_group = iloc.block_group; @@ -2854,10 +2860,7 @@ static int ext3_do_update_inode(handle_t *handle, raw_inode-i_frag = ei-i_frag_no; raw_inode-i_fsize = ei-i_frag_size; #endif - raw_inode-i_file_acl = cpu_to_le32(ei-i_file_acl); - if (!S_ISREG(inode-i_mode)) { - raw_inode-i_dir_acl = cpu_to_le32(ei-i_dir_acl); - } else { + if (S_ISREG(inode-i_mode)) { raw_inode-i_size_high = cpu_to_le32(ei-i_disksize 32); if (ei-i_disksize 0x7fffULL) { @@ -2883,6 +2886,12 @@ static int ext3_do_update_inode(handle_t *handle, } } } +#ifdef CONFIG_EXT3_FS_POSIX_ACL + else { + raw_inode-i_dir_acl = cpu_to_le32(ei-i_dir_acl); + } + raw_inode-i_file_acl = cpu_to_le32(ei-i_file_acl); +#endif raw_inode-i_generation = cpu_to_le32(inode-i_generation); if (S_ISCHR(inode-i_mode) || S_ISBLK(inode-i_mode)) { if (old_valid_dev(inode-i_rdev)) { diff --git a/include/linux/ext3_fs_i.h b/include/linux/ext3_fs_i.h index 7894dd0..9e7f1b6 100644 --- a/include/linux/ext3_fs_i.h +++ b/include/linux/ext3_fs_i.h @@ -75,8 +75,10 @@ struct ext3_inode_info { __u8i_frag_no; __u8i_frag_size; #endif +#ifdef CONFIG_EXT3_FS_POSIX_ACL ext3_fsblk_ti_file_acl; __u32 i_dir_acl; +#endif __u32 i_dtime; /* -- 1.5.3.7 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hi, On Fri, January 11, 2008 09:46, Tetsuo Handa wrote: > It depends. > Some users have to continue using brain dead legacy applications > without modification because ... > >the application's source code is not available. Source isn't needed, as long as the vendor has it. >the distributor no longer supports the application. Then why should anyone else support it? >the application is too difficult/complicated to reconstruct. Then you can't trust it and it shouldn't have permission to do potentially dangerous things in /dev/ either. Even if you can contain the device node creation, it most likely does other potentially dangerous things too. As a whole it can't be trusted. > I assume "a static /dev/" means a /dev/ directory in 2.4 kernels. > This filesystem's advantage: I'm not talking about devfs, I'm talking about a real static /dev. I'm using it now and it works fine (I let udev manage /udev/ to see what's it's doing). > (1) Can guarantee filename/attribute pairs. Wrong. All nodes are created and thus there's never a need to create new nodes. So /dev/ can't be modified by anyone. This works because all nodes that anyone might want to create already exist. > (2) Can keep nodes that needn't to be deleted/modified for read-only. This would also be true for all nodes in a static /dev I think. > (3) Can hide unwanted device nodes. In a static /dev you only create the nodes you want. It's true that it can't hide nodes for hardware that doesn't exist (other than deleting the nodes manually), but that was the norm for years before the whole dynamic /dev thing catched up. > I don't know. I'm not using rare software. It doesn't have to be rare, anything is fine. You don't know anything else than udev? (And shell commands like mknod etc.) Then why all the talk about mysterious apps that might need to do all kind of crazy things in /dev? > No. The kernel must be involved. > Who can guarantee that the daemon can access all namespaces? > The process who requests "mknod" and the process who performs "mknod" > are not always using the same namespace. This is true on a theoretical level. But practically I think you can either run multiple daemons, one for each namespace where you want to control /dev/, or if you really want one daemon you can pass the directory fd to it where the node should be created and use mknodat(). I believe that crosses namespaces correctly. If the daemon can't be contacted or doesn't want to do a mknod for you, the preloaded lib can fallback to doing the mknod itself, though normally that would be disallowed by MAC. But I think that the chance that any process needs to create device nodes in a chroot is at the level of fairy existance. > If "foo" or "bar" is a statically linked or suid-root application > (where LD_PRELOAD is ignored), they would attempt to create device nodes > directly (i.e. call sys_mknod() instead of communicating with the daemon) > and abort due to failure. > Not only applications who wants to create device nodes in /dev/ , > but also all applications who wants to modify entries in /dev/ . If the preloaded library is setuid, it will also work for setuid programs. It's true that it won't work for statically linked apps, but so what? Device node creating apps are rare enough, let alone the ones that are also statically linked. Nice theoretical problem, but I don't think anyone will care in practice. > From the beginning, the kernel is deeply involved because in-kernel MAC > is essential to realize "only the tiny daemon can modify /dev/". > Why not do this "filename/attribute" checking in the kernel too? That "only the tiny daemon can modify /dev/" is done with MAC rules, the ones that should be the default for all applications except udev by default already. For teh kernel nothing changes. >> The ammount of code will be the current parsing code + a few hundred >> lines of code, including the preloaded library. > > You will be bothered with "what is the realpath of /dev/null?" and > "how can I reach the realpath?" because you have to manage > namespace information. Or ignore the problem and see if it's a real problem or a nice theoretical case. And when it turns out to be a real problem, there are probably ways to fix it (See above). But you know what exactly is needed only after problems do turn up. > OK. I'll consider adding this feature. > But I'd like to use approach (B) to keep the advantage (3). > > (A) White-listing + Black-listing approach. > > "Permit any operations if the filename didn't appear > in the configuration file". > > (B) White-listing + Wild-card approach. > > "Support wildcard and permit only operations if > the filename-with-wildcard/attributes-with-wildcard appeared > in the configuration file". With this the filesystem at least adds some unique abilities. If anyone really needs it and where/how it should be implemented is another matter. Without it it's a glorified and complicated
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hi, On Fri, January 11, 2008 09:46, Tetsuo Handa wrote: It depends. Some users have to continue using brain dead legacy applications without modification because ... the application's source code is not available. Source isn't needed, as long as the vendor has it. the distributor no longer supports the application. Then why should anyone else support it? the application is too difficult/complicated to reconstruct. Then you can't trust it and it shouldn't have permission to do potentially dangerous things in /dev/ either. Even if you can contain the device node creation, it most likely does other potentially dangerous things too. As a whole it can't be trusted. I assume a static /dev/ means a /dev/ directory in 2.4 kernels. This filesystem's advantage: I'm not talking about devfs, I'm talking about a real static /dev. I'm using it now and it works fine (I let udev manage /udev/ to see what's it's doing). (1) Can guarantee filename/attribute pairs. Wrong. All nodes are created and thus there's never a need to create new nodes. So /dev/ can't be modified by anyone. This works because all nodes that anyone might want to create already exist. (2) Can keep nodes that needn't to be deleted/modified for read-only. This would also be true for all nodes in a static /dev I think. (3) Can hide unwanted device nodes. In a static /dev you only create the nodes you want. It's true that it can't hide nodes for hardware that doesn't exist (other than deleting the nodes manually), but that was the norm for years before the whole dynamic /dev thing catched up. I don't know. I'm not using rare software. It doesn't have to be rare, anything is fine. You don't know anything else than udev? (And shell commands like mknod etc.) Then why all the talk about mysterious apps that might need to do all kind of crazy things in /dev? No. The kernel must be involved. Who can guarantee that the daemon can access all namespaces? The process who requests mknod and the process who performs mknod are not always using the same namespace. This is true on a theoretical level. But practically I think you can either run multiple daemons, one for each namespace where you want to control /dev/, or if you really want one daemon you can pass the directory fd to it where the node should be created and use mknodat(). I believe that crosses namespaces correctly. If the daemon can't be contacted or doesn't want to do a mknod for you, the preloaded lib can fallback to doing the mknod itself, though normally that would be disallowed by MAC. But I think that the chance that any process needs to create device nodes in a chroot is at the level of fairy existance. If foo or bar is a statically linked or suid-root application (where LD_PRELOAD is ignored), they would attempt to create device nodes directly (i.e. call sys_mknod() instead of communicating with the daemon) and abort due to failure. Not only applications who wants to create device nodes in /dev/ , but also all applications who wants to modify entries in /dev/ . If the preloaded library is setuid, it will also work for setuid programs. It's true that it won't work for statically linked apps, but so what? Device node creating apps are rare enough, let alone the ones that are also statically linked. Nice theoretical problem, but I don't think anyone will care in practice. From the beginning, the kernel is deeply involved because in-kernel MAC is essential to realize only the tiny daemon can modify /dev/. Why not do this filename/attribute checking in the kernel too? That only the tiny daemon can modify /dev/ is done with MAC rules, the ones that should be the default for all applications except udev by default already. For teh kernel nothing changes. The ammount of code will be the current parsing code + a few hundred lines of code, including the preloaded library. You will be bothered with what is the realpath of /dev/null? and how can I reach the realpath? because you have to manage namespace information. Or ignore the problem and see if it's a real problem or a nice theoretical case. And when it turns out to be a real problem, there are probably ways to fix it (See above). But you know what exactly is needed only after problems do turn up. OK. I'll consider adding this feature. But I'd like to use approach (B) to keep the advantage (3). (A) White-listing + Black-listing approach. Permit any operations if the filename didn't appear in the configuration file. (B) White-listing + Wild-card approach. Support wildcard and permit only operations if the filename-with-wildcard/attributes-with-wildcard appeared in the configuration file. With this the filesystem at least adds some unique abilities. If anyone really needs it and where/how it should be implemented is another matter. Without it it's a glorified and complicated drop-in replacement for a static /dev/. Regards, Indan -- To
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Thu, January 10, 2008 05:57, Tetsuo Handa wrote: > It seems to me that the alternatives you are proposing include > modification of userland applications. But my assumption is > that "Don't require modification of userland applications". If you want a secure system it isn't that unreasonable to expect applications to not do brain dead things, so not requiring any modifications or config changes seems a bit optimistic to me. > In other words, I want to implement without asking applications > to use /dev/dynamic/ or something. > This filesystem is intended to provide support for legacy applications. > (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and > later.) Legacy applications should cope with a static /dev/. What is the advantage of your filesystem compared to a static /dev/? > Use of a tiny daemon that communicates with udev is not sufficient. > The udev is not the only application that modifies /dev files. Oh, it isn't? Which other applications do modify /dev files? I'd like to hear about a few, no matter how obscure or proprietary. And please tell how many of those will stop working with a static /dev with all nodes they might create already existing. > At least, the tiny daemon should communicate with the kernel > so that all requests are checked by the tiny daemon. No, why should the kernel be involved? The tiny daemon would be the only one allowed to modify /dev/, so all mknod commands will be done by it. Of course it means that you might need to modify the two or three apps wanting to create device nodes, or you can make an LD_PRELOAD lib that intercepts mknod commands and sends them to the daemon. The ammount of code will be the current parsing code + a few hundred lines of code, including the preloaded library. > But use of the tiny daemon (which is a process running in userland) > causes a lot of troubles. No, it doesn't, and most of those problems are true for all programs that access /dev! If those are straced or whatever they can be forced to open the wrong file, practically breaking the filename/attribute pairs. So all security you think you need to have for the daemon process is the same security you already need for all processes anyway to protect them against each other. >> If an administrator wants something else than >> 3 or 5, you're breaking something. > That's the fate of white-list based access control. > > Does this filesystem sound too strict to support dynamic device? > May be this filesystem should be able to permit creation of device > nodes that are not listed in the policy file. Actually, I assumed that was the case, because if it's strictly white-list based it's almost the same as a static /dev with some nodes hidden. Without it has even less value, because it just complicates matters compared to a normal static dev. I thought it checked that if a device name was in the list, it has the correct attributes, and was free to create nodes without restricted names. > From your next posting: >> But I think doing more is getting ridiculous, because if a process can >> create a device node, it can also access it and do whatever harm could >> be done by the confusion caused by unexpected name/attribute pairs. > > FYI. Being able to create a device node is different from being able to access > it and do whatever harm. You will need read and/or write permission to open > that device. Yes, but as the process creates the device it can also choose the file mode and probably also ownership. And as it creates a new file there likely aren't strict MAC rules in place restricting the process from reading or writing to it. So yes, you're right, but in practise it isn't as easy to close that hole, especially not if the applications isn't very clean and single purpose. If it creates the node it probably wans to use it too, and that means read/write access. Even if it can live without it, it could give access to the node to another process and let the other process do the dirty work. Very tricky. Greetings, Indan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Thu, January 10, 2008 00:08, Serge E. Hallyn wrote: > These emails again are getting really long, but I think the gist of > Indan's suggestion can be concisely summarized: No worry, I wasn't planning on extending it, I've said what I've to say. Except... > > "To confine process P3 to /dev/hda2 being 'b 3 2', create > /dev/p3, launch P3 in a new mounts namespace, mount --bind > /dev/p3 /dev, exec what you want p3 running, and have > MAC prevent umount /dev/p3." > > This is a neat idea, but Tetsuo's rebutall is > > "P3 may be legacy code needing to create or delete > /dev/floppy, where -EPERM confuses P3 and prevents > it working correctly." > > Indan's idea is interesting and I like it, but is there an answer to > Tetsuo's problem with it? ...that I didn't mean that, but a more simple /dev/ directory protected from any modifications by MAC, /dev/* all the nodes that need to have guaranteed name/attribute pairs, like /dev/null, /dev/zero, /dev/random, etc. and: /dev/dynamic/ being a dir where apps who really need to create/modify device nodes can do whatever they want to do. It can be multiple dirs too, like /dev/snd/, /dev/input/ etc. I guess this covers about 96% of the usecases of this tamper-proof dev fs. You can think of unlikely cases that aren't solved by this, but those can be solved in another way if really wanted (like a checking daemon, modified udev, shadow /dev/, to name a few). But I think doing more is getting ridiculous, because if a process can create a device node, it can also access it and do whatever harm could be done by the confusion caused by unexpected name/attribute pairs. As for information snooping, that's mostly about /dev/null or other things that are known beforehand. > PS - Indan, you also said in essence "if P3 can be trusted to create > /dev/floppy why can't it be trusted to create /dev/hda1". I trust that, > phrased that way, the question answers itself? Not exactly. If there's a process that dynamically created certain device nodes, and it wants to create one that doesn't fit the rules, you can't know if it's wrong or if your rules are wrong. The process has a certain policy of naming/creating the devices, but you also have a policy at the kernel side with this fs. If it mismatches you don't know which one is right. If you trust a process to create /dev/hd*, you can also trust it to create the proper /dev/hdXn, no need to verify if /dev/hda1 is really 3 1. The whole thing about filename/attribute pairs is that it's about what applications expect. There aren't many expectations about dynamically created device nodes which might not always be there, because their name isn't stable. The use case for this fs is a malicious app that can create device nodes, and we're worried about mismatching name/attribute pairs. Not about our data, or anything else. Call me an optimist, but I think you don't need to worry about name/attribute pairs. Greetings, Indan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello, On Wed, January 9, 2008 05:39, Tetsuo Handa wrote: > Hello. > > Indan Zupancic wrote: >> I think you focus too much on your way of enforcing filename/attributes >> pairs. > So? So that you miss alternatives and don't see the bigger picture. > >> The same can be achieved by creating the device nodes with >> expected attributes, and preventing processes from changing those files. > The device nodes have to be deletable if some process (including udev) needs > to delete. > Thus, you cannot unconditionally prevent processes from changing those files. > >> This because expected combinations are known beforehand. > Yes. > >> And once those files are present, the MAC system used doesn't have to have >> special >> device nodes attributes support. Protecting those files is enough to >> guarantee filename/attributes pairs. > If MAC system needn't to support this filesystem's functionality, > who creates those files with warrantee of expected attributes? The udev does? > If udev is exploited, who can guarantee? The person that would write the config file for your fs, the one who wants that guarantee. > >> No, this is because rename permission was given for files that it shouldn't >> had. > Do you think all MAC implementation have the same granularity and > functionalities? > I don't think so. Not all MAC implementation can control with such > granularity. > This filesystem is designed to be combined with any MAC, > although the MAC used with this filesystem should be able to restrict > namespace manipulation requests so that this filesystem can remain /dev > and visible to userland applications. Good point, but I assume they all have at least a directory granularity, and then /dev/ can be static and udev and other can have free reign in e.g. /dev/dynamic/. Just use subdirs for the dynamic stuff and this granularity problem is, with slight inconvenience, solved. > >> Either you want a process to manage device names and attributes, and then >> you >> give it permission to do that, or you want to enforce certain >> filename/attribute >> pairs and then you just do it yourself. > If I modify udev to enforce certain filename/attribute pairs and the modified > udev > was exploited, who can guarantee? > "Don't trust userland application" is the basis of restricting access in > kernel space. > If you can trust userland application, you don't need in-kernel access > control. Funny, I thought that it was in the kernel because that's the way to protect processes against eachother, the fs against processes, and for performance reasons. Exploits are in code, and where that code is doesn't matter that much, either kernel or userspace, though if it's exploitable you'll rather not have it in the kernel. So I think it's more secure if the checking would be done by udev than in a special filesystem, even if that means that you're screwed if udev is exploited. Of course you fully trust your own code, naturally. A tiny daemon that communicates with udev and does the checking you have now, and if ok it creates the node is really not much more code than your fs, so as hard to exploit too. Then if udev is hacked you have the same guarantee as you have now. I can think of more alternatives that are as secure or more secure than the current solution. > > >> Will your filesystem prevent the trivial case of >> >> rm /dev/hda1 >> ln -s /dev/hda2 /dev/hda1 >> > Of course. To permit the above operation, the following permissions are > needed. > > hda1660 0 6 2 b 3 1 > hda1777 0 0 33 l . Yes, I should've read the code before asking that, instead of the other way round. > >> Rename permission can be given for /dev in general, but prohibited for >> certain files in /dev, the ones you want to have specific attributes. >> It isn't all or nothing. > Do you think all MAC implementation can prohibit renaming for certain files in > /dev ? > >> It's "forbid modifying certain nodes that process needn't to modify" >> versus "forbid breaking filename/attribute pairs of certain nodes". >> >> Both have the same effect, except that the first one is generic and >> can be done by existing MAC systems, while the second one needs >> a special filesystem and a handful of MAC rules to make it effective. > Do you think all MAC implementation can do? > I think the first one is implementation specific and the second one is > generic. Protecting certain files from being modified seems to me more generic than enforcing filename/attributes pairs on device nodes. And if they can't do it surely they can do it per directory, and the using su
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello, On Wed, January 9, 2008 05:39, Tetsuo Handa wrote: Hello. Indan Zupancic wrote: I think you focus too much on your way of enforcing filename/attributes pairs. So? So that you miss alternatives and don't see the bigger picture. The same can be achieved by creating the device nodes with expected attributes, and preventing processes from changing those files. The device nodes have to be deletable if some process (including udev) needs to delete. Thus, you cannot unconditionally prevent processes from changing those files. This because expected combinations are known beforehand. Yes. And once those files are present, the MAC system used doesn't have to have special device nodes attributes support. Protecting those files is enough to guarantee filename/attributes pairs. If MAC system needn't to support this filesystem's functionality, who creates those files with warrantee of expected attributes? The udev does? If udev is exploited, who can guarantee? The person that would write the config file for your fs, the one who wants that guarantee. No, this is because rename permission was given for files that it shouldn't had. Do you think all MAC implementation have the same granularity and functionalities? I don't think so. Not all MAC implementation can control with such granularity. This filesystem is designed to be combined with any MAC, although the MAC used with this filesystem should be able to restrict namespace manipulation requests so that this filesystem can remain /dev and visible to userland applications. Good point, but I assume they all have at least a directory granularity, and then /dev/ can be static and udev and other can have free reign in e.g. /dev/dynamic/. Just use subdirs for the dynamic stuff and this granularity problem is, with slight inconvenience, solved. Either you want a process to manage device names and attributes, and then you give it permission to do that, or you want to enforce certain filename/attribute pairs and then you just do it yourself. If I modify udev to enforce certain filename/attribute pairs and the modified udev was exploited, who can guarantee? Don't trust userland application is the basis of restricting access in kernel space. If you can trust userland application, you don't need in-kernel access control. Funny, I thought that it was in the kernel because that's the way to protect processes against eachother, the fs against processes, and for performance reasons. Exploits are in code, and where that code is doesn't matter that much, either kernel or userspace, though if it's exploitable you'll rather not have it in the kernel. So I think it's more secure if the checking would be done by udev than in a special filesystem, even if that means that you're screwed if udev is exploited. Of course you fully trust your own code, naturally. A tiny daemon that communicates with udev and does the checking you have now, and if ok it creates the node is really not much more code than your fs, so as hard to exploit too. Then if udev is hacked you have the same guarantee as you have now. I can think of more alternatives that are as secure or more secure than the current solution. Will your filesystem prevent the trivial case of rm /dev/hda1 ln -s /dev/hda2 /dev/hda1 Of course. To permit the above operation, the following permissions are needed. hda1660 0 6 2 b 3 1 hda1777 0 0 33 l . Yes, I should've read the code before asking that, instead of the other way round. Rename permission can be given for /dev in general, but prohibited for certain files in /dev, the ones you want to have specific attributes. It isn't all or nothing. Do you think all MAC implementation can prohibit renaming for certain files in /dev ? It's forbid modifying certain nodes that process needn't to modify versus forbid breaking filename/attribute pairs of certain nodes. Both have the same effect, except that the first one is generic and can be done by existing MAC systems, while the second one needs a special filesystem and a handful of MAC rules to make it effective. Do you think all MAC implementation can do? I think the first one is implementation specific and the second one is generic. Protecting certain files from being modified seems to me more generic than enforcing filename/attributes pairs on device nodes. And if they can't do it surely they can do it per directory, and the using subdirs solves it. It doesn't matter where they are, it's that a different fs than yours could be mounted over it. You say a MAC can prevent that from happening, but a MAC can also prevent all processes except for udev from modifying /dev. But MAC cannot prevent udev from modifying /dev . And what if exploited? Not all MAC can enforce access control over all processes with the granularity you are talking. And what if a process that cannot
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Thu, January 10, 2008 00:08, Serge E. Hallyn wrote: These emails again are getting really long, but I think the gist of Indan's suggestion can be concisely summarized: No worry, I wasn't planning on extending it, I've said what I've to say. Except... To confine process P3 to /dev/hda2 being 'b 3 2', create /dev/p3, launch P3 in a new mounts namespace, mount --bind /dev/p3 /dev, exec what you want p3 running, and have MAC prevent umount /dev/p3. This is a neat idea, but Tetsuo's rebutall is P3 may be legacy code needing to create or delete /dev/floppy, where -EPERM confuses P3 and prevents it working correctly. Indan's idea is interesting and I like it, but is there an answer to Tetsuo's problem with it? ...that I didn't mean that, but a more simple /dev/ directory protected from any modifications by MAC, /dev/* all the nodes that need to have guaranteed name/attribute pairs, like /dev/null, /dev/zero, /dev/random, etc. and: /dev/dynamic/ being a dir where apps who really need to create/modify device nodes can do whatever they want to do. It can be multiple dirs too, like /dev/snd/, /dev/input/ etc. I guess this covers about 96% of the usecases of this tamper-proof dev fs. You can think of unlikely cases that aren't solved by this, but those can be solved in another way if really wanted (like a checking daemon, modified udev, shadow /dev/, to name a few). But I think doing more is getting ridiculous, because if a process can create a device node, it can also access it and do whatever harm could be done by the confusion caused by unexpected name/attribute pairs. As for information snooping, that's mostly about /dev/null or other things that are known beforehand. PS - Indan, you also said in essence if P3 can be trusted to create /dev/floppy why can't it be trusted to create /dev/hda1. I trust that, phrased that way, the question answers itself? Not exactly. If there's a process that dynamically created certain device nodes, and it wants to create one that doesn't fit the rules, you can't know if it's wrong or if your rules are wrong. The process has a certain policy of naming/creating the devices, but you also have a policy at the kernel side with this fs. If it mismatches you don't know which one is right. If you trust a process to create /dev/hd*, you can also trust it to create the proper /dev/hdXn, no need to verify if /dev/hda1 is really 3 1. The whole thing about filename/attribute pairs is that it's about what applications expect. There aren't many expectations about dynamically created device nodes which might not always be there, because their name isn't stable. The use case for this fs is a malicious app that can create device nodes, and we're worried about mismatching name/attribute pairs. Not about our data, or anything else. Call me an optimist, but I think you don't need to worry about name/attribute pairs. Greetings, Indan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hi Tetsuo, I think you focus too much on your way of enforcing filename/attributes pairs. The same can be achieved by creating the device nodes with expected attributes, and preventing processes from changing those files. This because expected combinations are known beforehand. And once those files are present, the MAC system used doesn't have to have special device nodes attributes support. Protecting those files is enough to guarantee filename/attributes pairs. On Tue, January 8, 2008 14:50, Tetsuo Handa wrote: > Hello. > > > Indan Zupancic wrote: >> > I want to use this filesystem in case where a process with root privilege >> was >> > hijacked but the behavior of the hijacked process is still restricted by >> MAC. >> >> 1) If the behaviour can be controlled, why can't the process be >>disallowed to change anything badly in /dev? Like disallowing anything >>from modifying existing nodes that weren't created by that process. >>That would have practically the same effect as your filesystem, >>won't it? > MAC system can prevent hijacked processes from changing anything badly in /dev > . > But MAC system can't prevent hijacked processes from doing > "mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp > /dev/hda2" > if permissions to rename device nodes in /dev are given to hijacked processes. > This is because MAC implementation doesn't check filename/attribute pairs. No, this is because rename permission was given for files that it shouldn't had. Either you want a process to manage device names and attributes, and then you give it permission to do that, or you want to enforce certain filename/attribute pairs and then you just do it yourself. Will your filesystem prevent the trivial case of rm /dev/hda1 ln -s /dev/hda2 /dev/hda1 > > But this filesystem can prevent hijacked processes from doing > "mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp > /dev/hda2" > even if permissions to rename device nodes in /dev are given to hijacked > processes. Rename permission can be given for /dev in general, but prohibited for certain files in /dev, the ones you want to have specific attributes. It isn't all or nothing. > > This filesystem is not designed to > "forbid modifying nodes if that process needn't to modify nodes". > This filesystem is designed to > "forbid breaking filename/attribute pairs of nodes > even if that process need to (or permitted to) modify nodes". It's "forbid modifying certain nodes that process needn't to modify" versus "forbid breaking filename/attribute pairs of certain nodes". Both have the same effect, except that the first one is generic and can be done by existing MAC systems, while the second one needs a special filesystem and a handful of MAC rules to make it effective. >> 2) The MAC system may not be able to guarantee certain combinations >>of device names and properties, but isn't that policy that shouldn't >>be in the kernel anyway? But if it is, shouldn't all device nodes be >>checked? That is, shouldn't it be a global check instead of a filesystem >>specific one? > I think the reason why MAC system doesn't handle filename/attributes pairs is > that: > > Filename and its attributes pairs are conventionally considered as > constant and reliable. > > It makes the MAC's policy syntax complicated to describe this attribute > enforcement information in MAC's policy. > > Thus, this should be a global check. But usually device nodes are only in /dev > . It doesn't matter where they are, it's that a different fs than yours could be mounted over it. You say a MAC can prevent that from happening, but a MAC can also prevent all processes except for udev from modifying /dev. Done globally instead of as a filesystem it can actually guarantee name/attr pairs, now it can't even do that on its own. > >> 3) Code efficiency. Thousand lines of code just to close one very specific >>attack, which can be done in lots of different other ways that all need >>to be prevented by the MAC system. (mounting over it, intercepting open >>calls, duping the fd, etc.) Is it worth it? > This filesystem is doing what MAC system is not doing. > So, please don't complain about inability of this filesystem to close all > attacks. I don't. What I complain about is that it's too specific and does it one chosen job badly. It lacks abstraction. As far as I can see any decent MAC can achieve the same end result as your filesystem, without directly enforcing name/attr pairs. > You can use MAC system to prevent attackers from mounting other filesystem > over this filesystem. > > The filename/attribute pairs are so
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hi Tetsuo, I think you focus too much on your way of enforcing filename/attributes pairs. The same can be achieved by creating the device nodes with expected attributes, and preventing processes from changing those files. This because expected combinations are known beforehand. And once those files are present, the MAC system used doesn't have to have special device nodes attributes support. Protecting those files is enough to guarantee filename/attributes pairs. On Tue, January 8, 2008 14:50, Tetsuo Handa wrote: Hello. Indan Zupancic wrote: I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. 1) If the behaviour can be controlled, why can't the process be disallowed to change anything badly in /dev? Like disallowing anything from modifying existing nodes that weren't created by that process. That would have practically the same effect as your filesystem, won't it? MAC system can prevent hijacked processes from changing anything badly in /dev . But MAC system can't prevent hijacked processes from doing mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2 if permissions to rename device nodes in /dev are given to hijacked processes. This is because MAC implementation doesn't check filename/attribute pairs. No, this is because rename permission was given for files that it shouldn't had. Either you want a process to manage device names and attributes, and then you give it permission to do that, or you want to enforce certain filename/attribute pairs and then you just do it yourself. Will your filesystem prevent the trivial case of rm /dev/hda1 ln -s /dev/hda2 /dev/hda1 But this filesystem can prevent hijacked processes from doing mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2 even if permissions to rename device nodes in /dev are given to hijacked processes. Rename permission can be given for /dev in general, but prohibited for certain files in /dev, the ones you want to have specific attributes. It isn't all or nothing. This filesystem is not designed to forbid modifying nodes if that process needn't to modify nodes. This filesystem is designed to forbid breaking filename/attribute pairs of nodes even if that process need to (or permitted to) modify nodes. It's forbid modifying certain nodes that process needn't to modify versus forbid breaking filename/attribute pairs of certain nodes. Both have the same effect, except that the first one is generic and can be done by existing MAC systems, while the second one needs a special filesystem and a handful of MAC rules to make it effective. 2) The MAC system may not be able to guarantee certain combinations of device names and properties, but isn't that policy that shouldn't be in the kernel anyway? But if it is, shouldn't all device nodes be checked? That is, shouldn't it be a global check instead of a filesystem specific one? I think the reason why MAC system doesn't handle filename/attributes pairs is that: Filename and its attributes pairs are conventionally considered as constant and reliable. It makes the MAC's policy syntax complicated to describe this attribute enforcement information in MAC's policy. Thus, this should be a global check. But usually device nodes are only in /dev . It doesn't matter where they are, it's that a different fs than yours could be mounted over it. You say a MAC can prevent that from happening, but a MAC can also prevent all processes except for udev from modifying /dev. Done globally instead of as a filesystem it can actually guarantee name/attr pairs, now it can't even do that on its own. 3) Code efficiency. Thousand lines of code just to close one very specific attack, which can be done in lots of different other ways that all need to be prevented by the MAC system. (mounting over it, intercepting open calls, duping the fd, etc.) Is it worth it? This filesystem is doing what MAC system is not doing. So, please don't complain about inability of this filesystem to close all attacks. I don't. What I complain about is that it's too specific and does it one chosen job badly. It lacks abstraction. As far as I can see any decent MAC can achieve the same end result as your filesystem, without directly enforcing name/attr pairs. You can use MAC system to prevent attackers from mounting other filesystem over this filesystem. The filename/attribute pairs are something like system call entry tables. The application will go wrong if __NR_read is mapped to sys_write() and __NR_write is mapped to sys_read(). Userland applications access special functionalities (e.g. /dev/zero and /dev/random) by name (i.e. syscall numbers). Therefore, keeping the filename/attribute pairs tamper-proof is important. You recognize that there is a threat that device
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello, Some questions: On Sun, January 6, 2008 16:20, Tetsuo Handa wrote: > I want to use this filesystem in case where a process with root privilege was > hijacked but the behavior of the hijacked process is still restricted by MAC. 1) If the behaviour can be controlled, why can't the process be disallowed to change anything badly in /dev? Like disallowing anything from modifying existing nodes that weren't created by that process. That would have practically the same effect as your filesystem, won't it? Or phrased differently, if the MAC system used can't protect /dev, it won't be able to protect other directories either, and if it can't protect e.g. my homedir, doesn't it make the whole MAC system ineffective? And if the MAC system used is ineffective, your filesystem is useless and you've bigger problems to fix. 2) The MAC system may not be able to guarantee certain combinations of device names and properties, but isn't that policy that shouldn't be in the kernel anyway? But if it is, shouldn't all device nodes be checked? That is, shouldn't it be a global check instead of a filesystem specific one? 3) Code efficiency. Thousand lines of code just to close one very specific attack, which can be done in lots of different other ways that all need to be prevented by the MAC system. (mounting over it, intercepting open calls, duping the fd, etc.) Is it worth it? I really don't care how you try to protect your system, but I don't think this is an effective way to do it. Good luck, Indan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello, Some questions: On Sun, January 6, 2008 16:20, Tetsuo Handa wrote: I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. 1) If the behaviour can be controlled, why can't the process be disallowed to change anything badly in /dev? Like disallowing anything from modifying existing nodes that weren't created by that process. That would have practically the same effect as your filesystem, won't it? Or phrased differently, if the MAC system used can't protect /dev, it won't be able to protect other directories either, and if it can't protect e.g. my homedir, doesn't it make the whole MAC system ineffective? And if the MAC system used is ineffective, your filesystem is useless and you've bigger problems to fix. 2) The MAC system may not be able to guarantee certain combinations of device names and properties, but isn't that policy that shouldn't be in the kernel anyway? But if it is, shouldn't all device nodes be checked? That is, shouldn't it be a global check instead of a filesystem specific one? 3) Code efficiency. Thousand lines of code just to close one very specific attack, which can be done in lots of different other ways that all need to be prevented by the MAC system. (mounting over it, intercepting open calls, duping the fd, etc.) Is it worth it? I really don't care how you try to protect your system, but I don't think this is an effective way to do it. Good luck, Indan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.
Hi, On Mon, December 17, 2007 01:40, Tetsuo Handa wrote: > Hello. > > Indan Zupancic wrote: >> What prevents them from mounting tmpfs on top of /dev, bypassing your fs? > Mandatory access control (MAC) prevents them from mounting tmpfs on top of > /dev . > MAC mediates namespace manipulation requests such as mount()/umount(). > >> Also, if they have root there are plenty of ways to prevent an administrator >> from logging in, e.g. using iptables or changing the password. > MAC mediates execution of /sbin/iptables or /usr/bin/passwd . > > So, use of this filesystem alone is meaningless because > attackers with root privileges can do what you are saying. > But use of this filesystem with MAC is still valid because > MAC can prevent attackers with root privileges from doing what you are saying. If MAC can avoid all that, then why can't it also avoid tampering with /dev? What security does your filesystem add at all, if it's useless without a MAC doing all the hard work? I think you can better spend your time on read-only bind mounts. Greetings, Indan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.
Hi, On Mon, December 17, 2007 01:40, Tetsuo Handa wrote: Hello. Indan Zupancic wrote: What prevents them from mounting tmpfs on top of /dev, bypassing your fs? Mandatory access control (MAC) prevents them from mounting tmpfs on top of /dev . MAC mediates namespace manipulation requests such as mount()/umount(). Also, if they have root there are plenty of ways to prevent an administrator from logging in, e.g. using iptables or changing the password. MAC mediates execution of /sbin/iptables or /usr/bin/passwd . So, use of this filesystem alone is meaningless because attackers with root privileges can do what you are saying. But use of this filesystem with MAC is still valid because MAC can prevent attackers with root privileges from doing what you are saying. If MAC can avoid all that, then why can't it also avoid tampering with /dev? What security does your filesystem add at all, if it's useless without a MAC doing all the hard work? I think you can better spend your time on read-only bind mounts. Greetings, Indan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.
Hi, On Sun, December 16, 2007 13:03, Tetsuo Handa wrote: > Hello. > > David Newall wrote: >> > You won't be able to login to the system because /sbin/mingetty >> > fails to "chown/chmod" /dev/tty* if /dev is mounted for read-only mode. >> >> Good point. So, if only root can modify files in /dev, what's the >> problem you're fixing? (I'm sure you tried to explain this in your >> original post, but your reasons weren't clear to me.) > > In 2003, I was trying to make / partition read-only to avoid tampering system > files. > Use of policy based mandatory access control (such as SELinux) is > one of ways to avoid tampering, but management of policy was a daunting task. > So, I tried to store / partition in a read-only medium so that > the system is free from tampering system files. > > When I attended at Security Stadium 2003 as a defense side, > I was using devfs for /dev directory. The files in /dev directory > were deleted by attckers and the administrator was unable to login. > So I developed this filesystem so that attackers who got root privilege > can't tamper files in /dev directory. What prevents them from mounting tmpfs on top of /dev, bypassing your fs? Also, if they have root there are plenty of ways to prevent an administrator from logging in, e.g. using iptables or changing the password. Greetings, Indan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.
Hi, On Sun, December 16, 2007 13:03, Tetsuo Handa wrote: Hello. David Newall wrote: You won't be able to login to the system because /sbin/mingetty fails to chown/chmod /dev/tty* if /dev is mounted for read-only mode. Good point. So, if only root can modify files in /dev, what's the problem you're fixing? (I'm sure you tried to explain this in your original post, but your reasons weren't clear to me.) In 2003, I was trying to make / partition read-only to avoid tampering system files. Use of policy based mandatory access control (such as SELinux) is one of ways to avoid tampering, but management of policy was a daunting task. So, I tried to store / partition in a read-only medium so that the system is free from tampering system files. When I attended at Security Stadium 2003 as a defense side, I was using devfs for /dev directory. The files in /dev directory were deleted by attckers and the administrator was unable to login. So I developed this filesystem so that attackers who got root privilege can't tamper files in /dev directory. What prevents them from mounting tmpfs on top of /dev, bypassing your fs? Also, if they have root there are plenty of ways to prevent an administrator from logging in, e.g. using iptables or changing the password. Greetings, Indan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Linux-tiny project revival
On Thu, September 27, 2007 09:00, Arnd Bergmann wrote: > Assuming that we want to go down that road, I think you can do better with > more evil macro magic, by using something along the lines of > > #define KERN_NOTICE "<5>", > > #define PRINTK_CONTINUED "", > > #define printk(level, str, ...) \ >do { \ > if (sizeof(level) == 1) /* continued printk */\ > actual_printk(str, __VA_ARGS__); \ > else if ((level[1] - '0') < CONFIG_PRINTK_DOICARE) \ >actual_printk(level str, __VA_ARGS__); \ >} while(0); > > Then you don't have to change every single printk in the kernel, but > only those that don't currently come with a log level. More importantly, > you can do the conversion without a flag day, by spreading (an empty) > PRINTK_CONTINUED in places that do need a printk without a log level. The problem is, how do you know whether to print a continued printk or not? It depends on the loglevel of the first printk. So besides compile-time parsing of the source code, replacing printk with loglevel specific alternatives (one way or the other) seems the only option. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Linux-tiny project revival
On Thu, September 27, 2007 09:00, Arnd Bergmann wrote: Assuming that we want to go down that road, I think you can do better with more evil macro magic, by using something along the lines of #define KERN_NOTICE 5, #define PRINTK_CONTINUED , #define printk(level, str, ...) \ do { \ if (sizeof(level) == 1) /* continued printk */\ actual_printk(str, __VA_ARGS__); \ else if ((level[1] - '0') CONFIG_PRINTK_DOICARE) \ actual_printk(level str, __VA_ARGS__); \ } while(0); Then you don't have to change every single printk in the kernel, but only those that don't currently come with a log level. More importantly, you can do the conversion without a flag day, by spreading (an empty) PRINTK_CONTINUED in places that do need a printk without a log level. The problem is, how do you know whether to print a continued printk or not? It depends on the loglevel of the first printk. So besides compile-time parsing of the source code, replacing printk with loglevel specific alternatives (one way or the other) seems the only option. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Linux-tiny project revival
On Fri, September 21, 2007 01:18, Rob Landley wrote: > On Thursday 20 September 2007 4:26:13 pm Indan Zupancic wrote: >> A quick scroll through a vmlinux binary shows that there are quite a >> lot areas consisting only of some repeated pattern. Mostly 0x00, but >> also 0x90 and ".GCC: (GNU) 4.2.1.". Getting rid of those would save >> something between 50 and 100KB. > > Worse, if you feed an absolute path to O= when you build the kernel out of > tree, then it uses absolute paths for all the __FILE__ strings and that makes > kernel BIG. (Did that by accident a while back.) Too bad there's no way > to keep the __FILE__ strings compressed at runtime and gunzip them as needed > like busybox does with help messages... :) I suspect that can be fixed by changing the built system. How can using O= change the source file path anyway? That seems unnecessary. It seems to be worse, full pathnames are also used when giving a relative path. (I'm using O=../obj/). On the other hand, it doesn't seem to cause that much bloat here: $ strings vmlinux | grep /home/ |wc 119 1816400 CC'ing Sam Ravnborg, perhaps he has some ideas. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Linux-tiny project revival
On Thu, September 20, 2007 22:38, Rob Landley wrote: > I've been playing with an idea for a while to improve the printk() situation, > but it's a more intrusive change than I've had time to bang on. > > Right now, the first argument to printk() is a loglevel, but it's handled via > string concatenation. I'd like to change that to be an integer, and make it > an actual comma-separated first argument. (Mandatory, not optional.) > > So instead of: > printk(KERN_NOTICE "Fruit=%d\n", banana); > It would now be: > printk(KERN_NOTICE, "Fruit=%d\n", banana); > > Change the header from: > #define KERN_NOTICE "<5>" > to: > #define KERN_NOTICE 5 You have to jump through less hoops if you do: #define KERN_NOTICE 5, But the problem remains that there are printk's which don't have a KERN_* as the first argument. Those are also impossible to get rid off in this way, as the loglevel is unknown (and you don't want partially printed messages). So adding the comma is really needed and in addition all printk's without a loglevel should get one. Which clutters the code and may increase codesize. A quick scroll through a vmlinux binary shows that there are quite a lot areas consisting only of some repeated pattern. Mostly 0x00, but also 0x90 and ".GCC: (GNU) 4.2.1.". Getting rid of those would save something between 50 and 100KB. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Linux-tiny project revival
On Thu, September 20, 2007 22:38, Rob Landley wrote: I've been playing with an idea for a while to improve the printk() situation, but it's a more intrusive change than I've had time to bang on. Right now, the first argument to printk() is a loglevel, but it's handled via string concatenation. I'd like to change that to be an integer, and make it an actual comma-separated first argument. (Mandatory, not optional.) So instead of: printk(KERN_NOTICE Fruit=%d\n, banana); It would now be: printk(KERN_NOTICE, Fruit=%d\n, banana); Change the header from: #define KERN_NOTICE 5 to: #define KERN_NOTICE 5 You have to jump through less hoops if you do: #define KERN_NOTICE 5, But the problem remains that there are printk's which don't have a KERN_* as the first argument. Those are also impossible to get rid off in this way, as the loglevel is unknown (and you don't want partially printed messages). So adding the comma is really needed and in addition all printk's without a loglevel should get one. Which clutters the code and may increase codesize. A quick scroll through a vmlinux binary shows that there are quite a lot areas consisting only of some repeated pattern. Mostly 0x00, but also 0x90 and .GCC: (GNU) 4.2.1.. Getting rid of those would save something between 50 and 100KB. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Linux-tiny project revival
On Fri, September 21, 2007 01:18, Rob Landley wrote: On Thursday 20 September 2007 4:26:13 pm Indan Zupancic wrote: A quick scroll through a vmlinux binary shows that there are quite a lot areas consisting only of some repeated pattern. Mostly 0x00, but also 0x90 and .GCC: (GNU) 4.2.1.. Getting rid of those would save something between 50 and 100KB. Worse, if you feed an absolute path to O= when you build the kernel out of tree, then it uses absolute paths for all the __FILE__ strings and that makes kernel BIG. (Did that by accident a while back.) Too bad there's no way to keep the __FILE__ strings compressed at runtime and gunzip them as needed like busybox does with help messages... :) I suspect that can be fixed by changing the built system. How can using O= change the source file path anyway? That seems unnecessary. It seems to be worse, full pathnames are also used when giving a relative path. (I'm using O=../obj/). On the other hand, it doesn't seem to cause that much bloat here: $ strings vmlinux | grep /home/ |wc 119 1816400 CC'ing Sam Ravnborg, perhaps he has some ideas. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.23-rc2: WARNING at kernel/irq/resend.c:70 check_irq_resend()
Hi, Just added an old network card, RTL-8029(AS), ne2k-pci driver, and tried to expand the network (failed because I didn't use a cross-over cable). The code snippet that spat the thing: /* * IRQ resend * * Is called with interrupts disabled and desc->lock held. */ void check_irq_resend(struct irq_desc *desc, unsigned int irq) { unsigned int status = desc->status; /* * Make sure the interrupt is enabled, before resending it: */ desc->chip->enable(irq); /* * Temporary hack to figure out more about the problem, which * is causing the ancient network cards to die. */ if (desc->handle_irq != handle_edge_irq) { WARN_ON_ONCE(1); return; } and relevant dmesg snippet: [ 446.836399] eth1: RealTek RTL-8029 found at 0x9000, IRQ 16, 00:00:B4:BB:BF:E5. [ 855.458468] r8169: eth0: link up [ 857.500667] r8169: eth0: link up [ 902.294283] r8169: eth0: link up [ 4126.348427] r8169: eth0: link up [ 5598.446233] r8169: eth0: link down [ 5618.904846] r8169: eth0: link up [ 5623.133090] r8169: eth0: link down [ 5624.708071] r8169: eth0: link up [ 6328.267872] WARNING: at /home/indan/src/git/linux-2.6/kernel/irq/resend.c:70 check_irq_resend() [ 6328.267896] [] check_irq_resend+0x4f/0x8c [ 6328.267912] [] enable_irq+0x84/0xaa [ 6328.267918] [] ne2k_pci_block_output+0x0/0x138 [ne2k_pci] [ 6328.267931] [] ne2k_pci_block_output+0x0/0x138 [ne2k_pci] [ 6328.267939] [] ne2k_pci_block_output+0x0/0x138 [ne2k_pci] [ 6328.267947] [] ei_start_xmit+0x29f/0x2b9 [8390] [ 6328.267966] [] dev_hard_start_xmit+0x19e/0x1fd [ 6328.267976] [] __qdisc_run+0x6c/0x151 [ 6328.267986] [] dev_queue_xmit+0x12a/0x271 [ 6328.267991] [] arp_send+0x4c/0x64 [ 6328.268002] [] arp_xmit+0x4d/0x51 [ 6328.268009] [] arp_process+0x29b/0x50a [ 6328.268018] [] ip_forward+0x22c/0x23a [ 6328.268026] [] ip_rcv_finish+0x0/0x282 [ 6328.268032] [] ip_rcv+0x479/0x4a8 [ 6328.268037] [] ip_rcv_finish+0x0/0x282 [ 6328.268044] [] arp_rcv+0xf0/0x104 [ 6328.268049] [] hrtimer_run_queues+0x12/0x1a1 [ 6328.268055] [] arp_rcv+0x0/0x104 [ 6328.268061] [] netif_receive_skb+0x2d9/0x317 [ 6328.268068] [] process_backlog+0x6d/0xd2 [ 6328.268075] [] net_rx_action+0x81/0x14d [ 6328.268082] [] __do_softirq+0x35/0x75 [ 6328.268088] [] do_softirq+0x3e/0x8d [ 6328.268098] [] handle_fasteoi_irq+0x0/0xbe [ 6328.268103] [] irq_exit+0x25/0x30 [ 6328.268107] [] do_IRQ+0x94/0xad [ 6328.268114] [] common_interrupt+0x23/0x28 [ 6328.268123] [] default_idle+0x27/0x39 [ 6328.268128] [] cpu_idle+0x41/0x55 [ 6328.268133] [] start_kernel+0x20e/0x213 [ 6328.268140] [] unknown_bootoption+0x0/0x196 [ 6328.268146] === # cat /proc/interrupts CPU0 0:9804926 IO-APIC-edge timer 1: 8270 IO-APIC-edge i8042 9: 0 IO-APIC-fasteoi acpi 12: 181281 IO-APIC-edge i8042 16: 40410 IO-APIC-fasteoi sata_sil, eth1 17: 0 IO-APIC-fasteoi ohci_hcd:usb1, NVidia nForce2 18: 499428 IO-APIC-fasteoi ohci_hcd:usb2 19: 2 IO-APIC-fasteoi ehci_hcd:usb3 20: 689588 IO-APIC-fasteoi eth0 NMI: 0 LOC:9805098 ERR: 0 MIS: 0 So the card is sharing an IRQ with the disk controller. No idea if the network card died after this warning, for all practical purposes it didn't work before, so couldn't check. I can provide more information and run some tests if anyone wants that, just keep in mind that the card isn't connected to anything. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.23-rc2: WARNING at kernel/irq/resend.c:70 check_irq_resend()
Hi, Just added an old network card, RTL-8029(AS), ne2k-pci driver, and tried to expand the network (failed because I didn't use a cross-over cable). The code snippet that spat the thing: /* * IRQ resend * * Is called with interrupts disabled and desc-lock held. */ void check_irq_resend(struct irq_desc *desc, unsigned int irq) { unsigned int status = desc-status; /* * Make sure the interrupt is enabled, before resending it: */ desc-chip-enable(irq); /* * Temporary hack to figure out more about the problem, which * is causing the ancient network cards to die. */ if (desc-handle_irq != handle_edge_irq) { WARN_ON_ONCE(1); return; } and relevant dmesg snippet: [ 446.836399] eth1: RealTek RTL-8029 found at 0x9000, IRQ 16, 00:00:B4:BB:BF:E5. [ 855.458468] r8169: eth0: link up [ 857.500667] r8169: eth0: link up [ 902.294283] r8169: eth0: link up [ 4126.348427] r8169: eth0: link up [ 5598.446233] r8169: eth0: link down [ 5618.904846] r8169: eth0: link up [ 5623.133090] r8169: eth0: link down [ 5624.708071] r8169: eth0: link up [ 6328.267872] WARNING: at /home/indan/src/git/linux-2.6/kernel/irq/resend.c:70 check_irq_resend() [ 6328.267896] [b0135d8f] check_irq_resend+0x4f/0x8c [ 6328.267912] [b0135888] enable_irq+0x84/0xaa [ 6328.267918] [c08e37cc] ne2k_pci_block_output+0x0/0x138 [ne2k_pci] [ 6328.267931] [c08e37cc] ne2k_pci_block_output+0x0/0x138 [ne2k_pci] [ 6328.267939] [c08e37cc] ne2k_pci_block_output+0x0/0x138 [ne2k_pci] [ 6328.267947] [c08e0f8a] ei_start_xmit+0x29f/0x2b9 [8390] [ 6328.267966] [b022a53e] dev_hard_start_xmit+0x19e/0x1fd [ 6328.267976] [b023620a] __qdisc_run+0x6c/0x151 [ 6328.267986] [b022c459] dev_queue_xmit+0x12a/0x271 [ 6328.267991] [b0260ec9] arp_send+0x4c/0x64 [ 6328.268002] [b0260524] arp_xmit+0x4d/0x51 [ 6328.268009] [b026117c] arp_process+0x29b/0x50a [ 6328.268018] [b0244f0a] ip_forward+0x22c/0x23a [ 6328.268026] [b02434a8] ip_rcv_finish+0x0/0x282 [ 6328.268032] [b0243db9] ip_rcv+0x479/0x4a8 [ 6328.268037] [b02434a8] ip_rcv_finish+0x0/0x282 [ 6328.268044] [b02614db] arp_rcv+0xf0/0x104 [ 6328.268049] [b012ae17] hrtimer_run_queues+0x12/0x1a1 [ 6328.268055] [b02613eb] arp_rcv+0x0/0x104 [ 6328.268061] [b022a07d] netif_receive_skb+0x2d9/0x317 [ 6328.268068] [b022baa0] process_backlog+0x6d/0xd2 [ 6328.268075] [b022c1c2] net_rx_action+0x81/0x14d [ 6328.268082] [b011dd84] __do_softirq+0x35/0x75 [ 6328.268088] [b0106030] do_softirq+0x3e/0x8d [ 6328.268098] [b0136530] handle_fasteoi_irq+0x0/0xbe [ 6328.268103] [b011dd44] irq_exit+0x25/0x30 [ 6328.268107] [b010630f] do_IRQ+0x94/0xad [ 6328.268114] [b0104763] common_interrupt+0x23/0x28 [ 6328.268123] [b0102a0c] default_idle+0x27/0x39 [ 6328.268128] [b0102347] cpu_idle+0x41/0x55 [ 6328.268133] [b032f99b] start_kernel+0x20e/0x213 [ 6328.268140] [b032f317] unknown_bootoption+0x0/0x196 [ 6328.268146] === # cat /proc/interrupts CPU0 0:9804926 IO-APIC-edge timer 1: 8270 IO-APIC-edge i8042 9: 0 IO-APIC-fasteoi acpi 12: 181281 IO-APIC-edge i8042 16: 40410 IO-APIC-fasteoi sata_sil, eth1 17: 0 IO-APIC-fasteoi ohci_hcd:usb1, NVidia nForce2 18: 499428 IO-APIC-fasteoi ohci_hcd:usb2 19: 2 IO-APIC-fasteoi ehci_hcd:usb3 20: 689588 IO-APIC-fasteoi eth0 NMI: 0 LOC:9805098 ERR: 0 MIS: 0 So the card is sharing an IRQ with the disk controller. No idea if the network card died after this warning, for all practical purposes it didn't work before, so couldn't check. I can provide more information and run some tests if anyone wants that, just keep in mind that the card isn't connected to anything. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Tue, August 7, 2007 08:05, Arjan van de Ven wrote: > On Tue, 2007-08-07 at 00:40 -0400, Kyle Moffett wrote: >> On Aug 01, 2007, at 11:06:00, Indan Zupancic wrote: >> > Different programs, different keys "stuck", so hard to tell. The >> > amount the pressed key is repeated differs too, from double to >> > hundreds. >> > >> > As an apparently separate problem my mouse warps once in a while >> > too, and that's not totally fixed yet, neither with that locking >> > patch. Throw in that two of my USB ports stopped working after a >> > rare power outage, I didn't totally trust my hardware any more, >> > hence my initial reluctance to report these problems. >> > >> > We could have independent problems with more or less the same >> > symptoms, at least Teresa's problem seems much worse than ours, and >> > if you and Ingo only experience a stuck delete key, it might be >> > something else. > > > the last time I saw something like this it was a BIOS that kept doing > USB->PS/2 emulation even when the kernel was running.. that caused great > havoc somehow... > > those who are seeing this... is it all on a USB keyboard? If so, is the > bios set to do PS/2 emulation (sometimes called USB legacy emulation) ?? No, using PS/2 keyboard and mouse here, and made sure to disable USB legacy emulation in the BIOS ages ago. But for me the problems are gone now, since 2.6.23-rc1 (I don't think it was fixed by CFS though, just in the same timeframe). Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Tue, August 7, 2007 08:05, Arjan van de Ven wrote: On Tue, 2007-08-07 at 00:40 -0400, Kyle Moffett wrote: On Aug 01, 2007, at 11:06:00, Indan Zupancic wrote: Different programs, different keys stuck, so hard to tell. The amount the pressed key is repeated differs too, from double to hundreds. As an apparently separate problem my mouse warps once in a while too, and that's not totally fixed yet, neither with that locking patch. Throw in that two of my USB ports stopped working after a rare power outage, I didn't totally trust my hardware any more, hence my initial reluctance to report these problems. We could have independent problems with more or less the same symptoms, at least Teresa's problem seems much worse than ours, and if you and Ingo only experience a stuck delete key, it might be something else. the last time I saw something like this it was a BIOS that kept doing USB-PS/2 emulation even when the kernel was running.. that caused great havoc somehow... those who are seeing this... is it all on a USB keyboard? If so, is the bios set to do PS/2 emulation (sometimes called USB legacy emulation) ?? No, using PS/2 keyboard and mouse here, and made sure to disable USB legacy emulation in the BIOS ages ago. But for me the problems are gone now, since 2.6.23-rc1 (I don't think it was fixed by CFS though, just in the same timeframe). Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
OT: enabling Xcomposite [was: Linux Kernel cfs scheduler and xorg kbd]
On Mon, August 6, 2007 19:46, Rene Herman wrote: > On 08/06/2007 05:19 PM, Indan Zupancic wrote: > > [ Yes, OT, I'll shelve it after this ] Me too, and I wonder if we should've dropped a bunch of CCs... > >> Anyway, if you want to experience a shock, try enabling xcomposite and >> run xcompmgr -a or something. It's so wonderfully smooth (even though >> some rendering seems slower than before, it's really worth it for me). > > Don't believe I can -- using a Matrox Millenium G550 (driver mga) and > nothing I do seems to have an effect. Well, as far as I know adding the following to your xorg.conf should be enough: Section "Extensions" Option "Composite" "enable" EndSection and then to enable it at runtime start a composite manager. Reading the mga manpage it seems you need to add Option "AccelMethod" "EXA" to the driver section. (Though it says EXA is possibly unstable.) Run "xdpyinfo | grep Composite" to see if it works or not. But as you said, getting off-topic, so if the above doesn't work sent a private mail, or go to the xorg mailinglist. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.6.23-rc2
On Mon, August 6, 2007 19:16, H. Peter Anvin wrote: > Indan Zupancic wrote: >> >> Does executing unicode_start, unicode_stop fixes it? >> > > Does that seem likely, since his display isn't even in text mode? No idea, just thought it would be worth a try. My console is garbled too, doing that unicode stuff fixed it for me. (Though it appears to work better nowaday: new text is fine and other than the first consoles are fine too, which wasn't the case a while ago I think.) I thought he was using a VGA console, wouldn't that be in text mode? Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.6.23-rc2
On Mon, August 6, 2007 18:06, Jeff Chua wrote: > On 8/6/07, Jan Engelhardt <[EMAIL PROTECTED]> wrote: > >> >��� >> That looks pretty much like all-one-bits (� = 255). >> What about vcsa1 (use hexdump -C)? > > # hexdump -C /dev/vcsa1 > 19 50 00 18 ff ff ff ff ff ff ff ff ff ff ff ff |.P..| > 0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff || > * > 0fa0 ff ff ff ff || > 0fa4 > > Yep, it's all one's. > > And once "s2ram" (without parameter) is executed, there's no way to > restore back the console in text-mode even using "s2ram --force > --acpi_sleep 1 --vbe_mode" until a "suspend-to-disk" or a reboot. Does executing unicode_start, unicode_stop fixes it? Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Mon, August 6, 2007 15:10, Rene Herman wrote: > On 08/06/2007 09:12 AM, Ingo Molnar wrote: > >> * Indan Zupancic <[EMAIL PROTECTED]> wrote: >> >>> All right, how would you debug it? Give us some insight in how to >>> solve hard to trigger, happens at most only a few times a day, >>> annoying input bug? I thought the mouse warping was fixed after 23-rc1 >>> and that input locking patch, but alas, the third day it happened >>> again. >> >> i've got no idea how to debug such input bugs best, but, as a starting >> point, i've Cc:-ed the current maintainer of the input subsystem :-) > > FWIW, I haven't experienced my "stuck delete" key anymore -- "since using > CFS v19.1", but that might very well just be coincedence. If anyone > wants/needs me to, I'll try to debug it, but for now I seem to be fine again. No stuck keys and no warping mouse seen yet since last time (currently running 2.6.23-rc2), so both problems are, for all practical purposes, fixed. > Now all I need to know is whether or not moving thunderbird's windows around > is expected to leave such an enormous non-repainting visual trail on the > screen... > > I believe I'm concluding that I'm not all together fond of the "new" modular > X.org. I don't think it's fair to blame Xorg for that, except if changing back to an older version fixes it. If Thunderbird is anything comparable to Firefox, then it's most likely the generic slowdown of that program that might have made it worse. If it's Xorg then it's probably something driver/settings related I guess, best to ask on their mailinglist for that. Anyway, if you want to experience a shock, try enabling xcomposite and run xcompmgr -a or something. It's so wonderfully smooth (even though some rendering seems slower than before, it's really worth it for me). Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Mon, August 6, 2007 15:10, Rene Herman wrote: On 08/06/2007 09:12 AM, Ingo Molnar wrote: * Indan Zupancic [EMAIL PROTECTED] wrote: All right, how would you debug it? Give us some insight in how to solve hard to trigger, happens at most only a few times a day, annoying input bug? I thought the mouse warping was fixed after 23-rc1 and that input locking patch, but alas, the third day it happened again. i've got no idea how to debug such input bugs best, but, as a starting point, i've Cc:-ed the current maintainer of the input subsystem :-) FWIW, I haven't experienced my stuck delete key anymore -- since using CFS v19.1, but that might very well just be coincedence. If anyone wants/needs me to, I'll try to debug it, but for now I seem to be fine again. No stuck keys and no warping mouse seen yet since last time (currently running 2.6.23-rc2), so both problems are, for all practical purposes, fixed. Now all I need to know is whether or not moving thunderbird's windows around is expected to leave such an enormous non-repainting visual trail on the screen... I believe I'm concluding that I'm not all together fond of the new modular X.org. I don't think it's fair to blame Xorg for that, except if changing back to an older version fixes it. If Thunderbird is anything comparable to Firefox, then it's most likely the generic slowdown of that program that might have made it worse. If it's Xorg then it's probably something driver/settings related I guess, best to ask on their mailinglist for that. Anyway, if you want to experience a shock, try enabling xcomposite and run xcompmgr -a or something. It's so wonderfully smooth (even though some rendering seems slower than before, it's really worth it for me). Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.6.23-rc2
On Mon, August 6, 2007 18:06, Jeff Chua wrote: On 8/6/07, Jan Engelhardt [EMAIL PROTECTED] wrote: ��� That looks pretty much like all-one-bits (� = 255). What about vcsa1 (use hexdump -C)? # hexdump -C /dev/vcsa1 19 50 00 18 ff ff ff ff ff ff ff ff ff ff ff ff |.P..| 0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff || * 0fa0 ff ff ff ff || 0fa4 Yep, it's all one's. And once s2ram (without parameter) is executed, there's no way to restore back the console in text-mode even using s2ram --force --acpi_sleep 1 --vbe_mode until a suspend-to-disk or a reboot. Does executing unicode_start, unicode_stop fixes it? Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.6.23-rc2
On Mon, August 6, 2007 19:16, H. Peter Anvin wrote: Indan Zupancic wrote: Does executing unicode_start, unicode_stop fixes it? Does that seem likely, since his display isn't even in text mode? No idea, just thought it would be worth a try. My console is garbled too, doing that unicode stuff fixed it for me. (Though it appears to work better nowaday: new text is fine and other than the first consoles are fine too, which wasn't the case a while ago I think.) I thought he was using a VGA console, wouldn't that be in text mode? Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
OT: enabling Xcomposite [was: Linux Kernel cfs scheduler and xorg kbd]
On Mon, August 6, 2007 19:46, Rene Herman wrote: On 08/06/2007 05:19 PM, Indan Zupancic wrote: [ Yes, OT, I'll shelve it after this ] Me too, and I wonder if we should've dropped a bunch of CCs... Anyway, if you want to experience a shock, try enabling xcomposite and run xcompmgr -a or something. It's so wonderfully smooth (even though some rendering seems slower than before, it's really worth it for me). Don't believe I can -- using a Matrox Millenium G550 (driver mga) and nothing I do seems to have an effect. Well, as far as I know adding the following to your xorg.conf should be enough: Section Extensions Option Composite enable EndSection and then to enable it at runtime start a composite manager. Reading the mga manpage it seems you need to add Option AccelMethod EXA to the driver section. (Though it says EXA is possibly unstable.) Run xdpyinfo | grep Composite to see if it works or not. But as you said, getting off-topic, so if the above doesn't work sent a private mail, or go to the xorg mailinglist. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: Linux Kernel cfs scheduler and xorg kbd
On Thu, August 2, 2007 14:22, Rene Herman wrote: > On 08/02/2007 09:11 AM, S�bastien Dugu� wrote: > >> This reminds me I had something similar happening about a year or so ago >> running with Debian's unstable xorg and gnome. >> >> The keys would stick for a short time (no, no coffee spilled on the >> keyboard, I can assure you ;-), the most annoying one being the delete >> key. Nothing deterministic, it could happen anytime (or not at all for >> several days). >> >> As a final resort, I upgraded the xorg and gnome packages - problem >> solved. > > Thank you. That "about a year" sounds possible; slackware has been one of > the last to switch to modular xorg and until recently I was still running > the old 6.9 series, so it could be xorg... Well, I'm running xorg since 2004, and hadn't the sticking key problem until a few months ago, half a year max. I didn't install a new xorg version just before the problem went away, and I'm using Fluxbox. Oh well, the problem's gone now, no matter how or why. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: Linux Kernel cfs scheduler and xorg kbd
On Thu, August 2, 2007 14:22, Rene Herman wrote: On 08/02/2007 09:11 AM, S�bastien Dugu� wrote: This reminds me I had something similar happening about a year or so ago running with Debian's unstable xorg and gnome. The keys would stick for a short time (no, no coffee spilled on the keyboard, I can assure you ;-), the most annoying one being the delete key. Nothing deterministic, it could happen anytime (or not at all for several days). As a final resort, I upgraded the xorg and gnome packages - problem solved. Thank you. That about a year sounds possible; slackware has been one of the last to switch to modular xorg and until recently I was still running the old 6.9 series, so it could be xorg... Well, I'm running xorg since 2004, and hadn't the sticking key problem until a few months ago, half a year max. I didn't install a new xorg version just before the problem went away, and I'm using Fluxbox. Oh well, the problem's gone now, no matter how or why. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 17:09, Ingo Molnar wrote: > > * Indan Zupancic <[EMAIL PROTECTED]> wrote: > >> We could have independent problems with more or less the same >> symptoms, at least Teresa's problem seems much worse than ours, and if >> you and Ingo only experience a stuck delete key, it might be something >> else. > > i experienced nothing in this area so far - if i experienced anything of > the likes it would be long fixed! =B-) Ah, indeed, you appended Rene's email without too many new lines, so I thought your wrote it (http://lkml.org/lkml/2007/8/1/4). All right, how would you debug it? Give us some insight in how to solve hard to trigger, happens at most only a few times a day, annoying input bug? I thought the mouse warping was fixed after 23-rc1 and that input locking patch, but alas, the third day it happened again. I would be happy just to know if it's bad hardware or a software bug, I don't ask for too much, do I? Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 15:58, Rene Herman wrote: > On 08/01/2007 03:33 PM, <:::.. TeresaII ..:::> wrote: > >> Problem is, that i can't reproduce it after last reboot. >> Can this be different on each reboot ? Any idea ? >> >> About another kernel: i only had -ck kernels since last year or more, i >> can remember i had same behavier atleast 6 month ago, or maybe more. But >> it wasn't that hard and it dissapiered self after some time. > > Hmm. > >> Now on first run of 2.6.22.1-cfs-v19.1 i got it again. And realy bad. It >> was even very hard to write report mail. > > I notice by the way that you are also using Thunderbird 2.0 -- that was my > own suspect; I had just switched from Thunderbird 1.5. > >> I did reboot, day after that it was back, but not that hard. And now >> rebooted yesterday and cant reproduce it at all. > > I believe that at least for now we can safely remove this from any list of > possible CFS regresssions. I'll sit on it and see what I can find out > when/if my stuck delete happens again. > > Indan -- under what circumstances/programs did you experience what? Different programs, different keys "stuck", so hard to tell. The amount the pressed key is repeated differs too, from double to hundreds. As an apparently separate problem my mouse warps once in a while too, and that's not totally fixed yet, neither with that locking patch. Throw in that two of my USB ports stopped working after a rare power outage, I didn't totally trust my hardware any more, hence my initial reluctance to report these problems. We could have independent problems with more or less the same symptoms, at least Teresa's problem seems much worse than ours, and if you and Ingo only experience a stuck delete key, it might be something else. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 14:50, Rene Herman wrote: > On 08/01/2007 02:34 PM, Indan Zupancic wrote: > >> On Wed, August 1, 2007 13:01, Rene Herman wrote: > >>> Teresa was already using 2.6.22.1, with CFS (v19.1) patched in, so reverting >>> that would be a matter of patching it out again. She said she wasn't seeing >>> trouble on other kernels though. >>> >>> I was the one who also saw keyboard trouble on 2.6.22.1 without CFS by the >>> way -- and haven't (yet) seen any anymore since patching _in_ CFS v19.1... >> >> Could the people who had or have keyboard problems try out Dmitry's >> input locking patches, and see if those help: >> http://lkml.org/lkml/2007/7/24/17 >> And if it does, report it to him, either there or here after CCing him. > > I'm having trouble reproducing this at will -- it sounded as though Teresa > had less trouble at least originally. Teresa? Same here, so testing is a pain. I was more thinking about running a patched version for a day or longer and see if it happened or not. For me it happened irregularly, but definitely a few times a day. (It might help if people report what they're using. So, for what it's worth, I'm using a PS/2 keyboard with xorg 1.2.0 with the "keyboard" driver.) Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 13:01, Rene Herman wrote: > On 08/01/2007 01:31 PM, Andi Kleen wrote: > >> Ingo Molnar <[EMAIL PROTECTED]> writes: > >>> another reaction below in this thread reported kbd problems in vanilla >>> 2.6.22.1 as well. What is the X versions, etc.? Does the problem go away >>> if the X kbd driver selection is tweaked to a simpler model, say: >> >> Or perhaps just test with CFS reverted? It really sounds more like something >> independent from the scheduler. Perhaps someone could put a >> CFS-revert-for-testing patch >> on ftp somewhere. It's only a few git changesets so shouldn't be too >> difficult to >> do. > > Teresa was already using 2.6.22.1, with CFS (v19.1) patched in, so reverting > that would be a matter of patching it out again. She said she wasn't seeing > trouble on other kernels though. > > I was the one who also saw keyboard trouble on 2.6.22.1 without CFS by the > way -- and haven't (yet) seen any anymore since patching _in_ CFS v19.1... Could the people who had or have keyboard problems try out Dmitry's input locking patches, and see if those help: http://lkml.org/lkml/2007/7/24/17 And if it does, report it to him, either there or here after CCing him. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 13:01, Rene Herman wrote: On 08/01/2007 01:31 PM, Andi Kleen wrote: Ingo Molnar [EMAIL PROTECTED] writes: another reaction below in this thread reported kbd problems in vanilla 2.6.22.1 as well. What is the X versions, etc.? Does the problem go away if the X kbd driver selection is tweaked to a simpler model, say: Or perhaps just test with CFS reverted? It really sounds more like something independent from the scheduler. Perhaps someone could put a CFS-revert-for-testing patch on ftp somewhere. It's only a few git changesets so shouldn't be too difficult to do. Teresa was already using 2.6.22.1, with CFS (v19.1) patched in, so reverting that would be a matter of patching it out again. She said she wasn't seeing trouble on other kernels though. I was the one who also saw keyboard trouble on 2.6.22.1 without CFS by the way -- and haven't (yet) seen any anymore since patching _in_ CFS v19.1... Could the people who had or have keyboard problems try out Dmitry's input locking patches, and see if those help: http://lkml.org/lkml/2007/7/24/17 And if it does, report it to him, either there or here after CCing him. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 14:50, Rene Herman wrote: On 08/01/2007 02:34 PM, Indan Zupancic wrote: On Wed, August 1, 2007 13:01, Rene Herman wrote: Teresa was already using 2.6.22.1, with CFS (v19.1) patched in, so reverting that would be a matter of patching it out again. She said she wasn't seeing trouble on other kernels though. I was the one who also saw keyboard trouble on 2.6.22.1 without CFS by the way -- and haven't (yet) seen any anymore since patching _in_ CFS v19.1... Could the people who had or have keyboard problems try out Dmitry's input locking patches, and see if those help: http://lkml.org/lkml/2007/7/24/17 And if it does, report it to him, either there or here after CCing him. I'm having trouble reproducing this at will -- it sounded as though Teresa had less trouble at least originally. Teresa? Same here, so testing is a pain. I was more thinking about running a patched version for a day or longer and see if it happened or not. For me it happened irregularly, but definitely a few times a day. (It might help if people report what they're using. So, for what it's worth, I'm using a PS/2 keyboard with xorg 1.2.0 with the keyboard driver.) Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 17:09, Ingo Molnar wrote: * Indan Zupancic [EMAIL PROTECTED] wrote: We could have independent problems with more or less the same symptoms, at least Teresa's problem seems much worse than ours, and if you and Ingo only experience a stuck delete key, it might be something else. i experienced nothing in this area so far - if i experienced anything of the likes it would be long fixed! =B-) Ah, indeed, you appended Rene's email without too many new lines, so I thought your wrote it (http://lkml.org/lkml/2007/8/1/4). All right, how would you debug it? Give us some insight in how to solve hard to trigger, happens at most only a few times a day, annoying input bug? I thought the mouse warping was fixed after 23-rc1 and that input locking patch, but alas, the third day it happened again. I would be happy just to know if it's bad hardware or a software bug, I don't ask for too much, do I? Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel cfs scheduler and xorg kbd
On Wed, August 1, 2007 15:58, Rene Herman wrote: On 08/01/2007 03:33 PM, :::.. TeresaII ..::: wrote: Problem is, that i can't reproduce it after last reboot. Can this be different on each reboot ? Any idea ? About another kernel: i only had -ck kernels since last year or more, i can remember i had same behavier atleast 6 month ago, or maybe more. But it wasn't that hard and it dissapiered self after some time. Hmm. Now on first run of 2.6.22.1-cfs-v19.1 i got it again. And realy bad. It was even very hard to write report mail. I notice by the way that you are also using Thunderbird 2.0 -- that was my own suspect; I had just switched from Thunderbird 1.5. I did reboot, day after that it was back, but not that hard. And now rebooted yesterday and cant reproduce it at all. I believe that at least for now we can safely remove this from any list of possible CFS regresssions. I'll sit on it and see what I can find out when/if my stuck delete happens again. Indan -- under what circumstances/programs did you experience what? Different programs, different keys stuck, so hard to tell. The amount the pressed key is repeated differs too, from double to hundreds. As an apparently separate problem my mouse warps once in a while too, and that's not totally fixed yet, neither with that locking patch. Throw in that two of my USB ports stopped working after a rare power outage, I didn't totally trust my hardware any more, hence my initial reluctance to report these problems. We could have independent problems with more or less the same symptoms, at least Teresa's problem seems much worse than ours, and if you and Ingo only experience a stuck delete key, it might be something else. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC/RFT 1/5] Input: implement proper locking in input core
On Sun, July 29, 2007 05:50, Dmitry Torokhov wrote: > Hi Indan, > > On Friday 27 July 2007 19:28, Indan Zupancic wrote: >> Hi, >> >> Not real feedback, just some nitpicks. >> >> On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote: >> > +static int input_defuzz_abs_event(int value, int old_val, int fuzz) >> > +{ >> > + if (fuzz) { >> > + if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2) >> > + return value; >> > >> > - add_input_randomness(type, code, value); >> > + if (value > old_val - fuzz && value < old_val + fuzz) >> > + return (old_val * 3 + value) / 4; >> > >> > - switch (type) { >> > + if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2) >> > + return (old_val + value) / 2; >> > + } >> >> Shouldn't the return values of the second and third case be reversed? >> In the 2nd check the new values is weighted for 1/4, while in the 3rd >> case it counts for 1/2, which breaks the "account new value more when >> it is closer to the old one" logic that I thought I saw here. So to sum up, >> should the second return be "return (old_val + value * 3) / 4"? > > Thank you for bringing this up. Actually the 1st return valus should be > "old_val", not value. The logic is to "gravitate towards old" when > difference is small. > >> >> >> > +/* >> > + * Generate software autorepeat event. Note that we take >> > + * dev->event_lock here to avoid racing with input_event >> > + * which may cause keys get "stuck". >> > + */ >> >> Hurray. :-) >> >> > - if (code > SW_MAX || !test_bit(code, dev->swbit) || >> > !!test_bit(code, dev->sw) == value) >> > - return; >> > + if (dev->rep[REP_PERIOD]) >> > + mod_timer(>timer, jiffies + >> > + msecs_to_jiffies(dev->rep[REP_PERIOD])); >> > + } >> >> Perhaps use a local var for the "msecs_to_jiffies(dev->rep[REP_PERIOD])" >> part. >> > > What would be the benefit of doing so? I was hoping it would make things more readable, and less staircase like. > >> >> > +static void input_start_autorepeat(struct input_dev *dev, int code) >> > +{ >> > + if (test_bit(EV_REP, dev->evbit) && >> > + dev->rep[REP_PERIOD] && dev->rep[REP_DELAY] && >> > + dev->timer.data) { >> > + dev->repeat_key = code; >> > + mod_timer(>timer, >> > +jiffies + msecs_to_jiffies(dev->rep[REP_DELAY])); >> > + } >> > +} >> >> Same here. >> >> >> > + case EV_KEY: >> > + if (is_event_supported(code, dev->keybit, KEY_MAX) && >> > + !!test_bit(code, dev->key) != value) { >> >> A bit confusing, test_bit(0 only returns 0 or 1 anyway, doesn't it? >> So "test_bit(code, dev->key) != value" should be all right. >> I noticed that the old code did it too, but still. > > Is it guaranteed? I only expect it to return 0/non-0 values, not necessarily > 0 and 1. Not sure, but it seems so. All the test_bit implementations I checked returned either 0 or 1. No idea where to get that guarantee though. > >> >> > - case EV_MSC: >> > + case EV_SW: >> > + if (is_event_supported(code, dev->swbit, SW_MAX) && >> > + !!test_bit(code, dev->sw) != value) { >> >> Same. >> >> > - break; >> > + case EV_LED: >> > + if (is_event_supported(code, dev->ledbit, LED_MAX) && >> > + !!test_bit(code, dev->led) != value) { >> >> And here. >> >> >> > +void input_inject_event(struct input_handle *handle, >> > + unsigned int type, unsigned int code, int value) >> > { >> > - struct input_dev *dev = (void *) data; >> > + struct input_dev *dev = handle->dev; >> > + struct input_handle *grab; >> > >> > - if (!test_bit(dev->repeat_key, dev->key)) >> > - return; >> > + if (is_event_supported(type, dev->evbit, EV_MAX)) { >> > + spin_lock_irq(>event_lock); >> > >>
Re: [RFC/RFT 0/5] Input locking patches
On Sun, July 29, 2007 05:38, Dmitry Torokhov wrote: > Hi Indan, > > On Friday 27 July 2007 18:25, Indan Zupancic wrote: >> Sorry for the babbling, just wanted to say that I've tested these >> patches and that they seem to fix real problems. >> > > Thank you for testing the patches. Unfortunately I spoke too soon and some mouse warps occurred yesterday. I suspect faulty hardware, so I'll try with my old non-optical mouse. If it still happens then it's either the mobo or the kernel. In the meantime I'll plug this one in another pc and see if the warps happen there. After that it's running old kernels to see if the warps happen there too. Any other ideas to check whether it's a software or hardware problem? The more annoying sticking key problem is gone for good though. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC/RFT 0/5] Input locking patches
On Sun, July 29, 2007 05:38, Dmitry Torokhov wrote: Hi Indan, On Friday 27 July 2007 18:25, Indan Zupancic wrote: Sorry for the babbling, just wanted to say that I've tested these patches and that they seem to fix real problems. Thank you for testing the patches. Unfortunately I spoke too soon and some mouse warps occurred yesterday. I suspect faulty hardware, so I'll try with my old non-optical mouse. If it still happens then it's either the mobo or the kernel. In the meantime I'll plug this one in another pc and see if the warps happen there. After that it's running old kernels to see if the warps happen there too. Any other ideas to check whether it's a software or hardware problem? The more annoying sticking key problem is gone for good though. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC/RFT 1/5] Input: implement proper locking in input core
On Sun, July 29, 2007 05:50, Dmitry Torokhov wrote: Hi Indan, On Friday 27 July 2007 19:28, Indan Zupancic wrote: Hi, Not real feedback, just some nitpicks. On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote: +static int input_defuzz_abs_event(int value, int old_val, int fuzz) +{ + if (fuzz) { + if (value old_val - fuzz / 2 value old_val + fuzz / 2) + return value; - add_input_randomness(type, code, value); + if (value old_val - fuzz value old_val + fuzz) + return (old_val * 3 + value) / 4; - switch (type) { + if (value old_val - fuzz * 2 value old_val + fuzz * 2) + return (old_val + value) / 2; + } Shouldn't the return values of the second and third case be reversed? In the 2nd check the new values is weighted for 1/4, while in the 3rd case it counts for 1/2, which breaks the account new value more when it is closer to the old one logic that I thought I saw here. So to sum up, should the second return be return (old_val + value * 3) / 4? Thank you for bringing this up. Actually the 1st return valus should be old_val, not value. The logic is to gravitate towards old when difference is small. +/* + * Generate software autorepeat event. Note that we take + * dev-event_lock here to avoid racing with input_event + * which may cause keys get stuck. + */ Hurray. :-) - if (code SW_MAX || !test_bit(code, dev-swbit) || !!test_bit(code, dev-sw) == value) - return; + if (dev-rep[REP_PERIOD]) + mod_timer(dev-timer, jiffies + + msecs_to_jiffies(dev-rep[REP_PERIOD])); + } Perhaps use a local var for the msecs_to_jiffies(dev-rep[REP_PERIOD]) part. What would be the benefit of doing so? I was hoping it would make things more readable, and less staircase like. +static void input_start_autorepeat(struct input_dev *dev, int code) +{ + if (test_bit(EV_REP, dev-evbit) + dev-rep[REP_PERIOD] dev-rep[REP_DELAY] + dev-timer.data) { + dev-repeat_key = code; + mod_timer(dev-timer, +jiffies + msecs_to_jiffies(dev-rep[REP_DELAY])); + } +} Same here. + case EV_KEY: + if (is_event_supported(code, dev-keybit, KEY_MAX) + !!test_bit(code, dev-key) != value) { A bit confusing, test_bit(0 only returns 0 or 1 anyway, doesn't it? So test_bit(code, dev-key) != value should be all right. I noticed that the old code did it too, but still. Is it guaranteed? I only expect it to return 0/non-0 values, not necessarily 0 and 1. Not sure, but it seems so. All the test_bit implementations I checked returned either 0 or 1. No idea where to get that guarantee though. - case EV_MSC: + case EV_SW: + if (is_event_supported(code, dev-swbit, SW_MAX) + !!test_bit(code, dev-sw) != value) { Same. - break; + case EV_LED: + if (is_event_supported(code, dev-ledbit, LED_MAX) + !!test_bit(code, dev-led) != value) { And here. +void input_inject_event(struct input_handle *handle, + unsigned int type, unsigned int code, int value) { - struct input_dev *dev = (void *) data; + struct input_dev *dev = handle-dev; + struct input_handle *grab; - if (!test_bit(dev-repeat_key, dev-key)) - return; + if (is_event_supported(type, dev-evbit, EV_MAX)) { + spin_lock_irq(dev-event_lock); - input_event(dev, EV_KEY, dev-repeat_key, 2); - input_sync(dev); + grab = rcu_dereference(dev-grab); + if (!grab || grab == handle) + input_handle_event(dev, type, code, value); 'handle' can't be NULL, so can drop the !grab check, as checking grab == handle should be sufficient. It is or, not and. The idea is to pass the event if device is not grabbed by anyone _or_ if source of event is handle that grabbed the device. Ah, oops. +/** + * input_open_device - open input device + * @handle: handle through which device is being accessed + * + * This function should be called by input handlers when they + * want to start receive events from given input device. + */ int input_open_device(struct input_handle *handle) { struct input_dev *dev = handle-dev; - int err; + int retval; - err = mutex_lock_interruptible(dev-mutex); - if (err) - return err; + retval = mutex_lock_interruptible(dev-mutex); + if (retval) + return retval; + + if (dev-going_away) { + retval = -ENODEV; + goto out; + } handle-open++; if (!dev-users++ dev-open) Ugh, not your code, and perhaps it's me, but that looks weird. The ++ hidden inthe if check is ugly, and would mean that users can be negative, which
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, July 28, 2007 01:34, grundig wrote: > El Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]> > escribi�: > >> how do you know there will be other activity? You start the IO and that >> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets >> submitted in that time you add latency. You cannot predict that IO >> happening or not happening. > > If there hasn't be much IO for some time, it looks quite reasonable to expect > that there won't be more in the near future. Good argument. > As most of heuristics can fail, but > then this is a feature mostly for desktops, not servers. Bad argument. It doesn't matter for who the feature is intended, it matter what it does and if it does it well or not. In this case, prefetching swap without disturbing anything else. > There's an old saying that says something like "an open source project starts > dying when new people can't participate in the project no matter how hard > they try". It's hard to understand why there's so many people opposing to > this when other more controversial features are merged much faster, (like, fe. > the UIO driver framework). Could people please stop this emotional crap non-argumentation? At best it reduces the chance of swap-prefetch to be merged. Perhaps one of the reasons is that this is core kernel code. And that it isn't a new feature, but a performance improvement with doubtful trade-offs. The problem statement isn't clear either. It seems like a natural enhancement, but is that enough reason to merge it? Maybe, maybe not. But if slow swap-in is the problem, shouldn't that be fixed instead of bypassed? Yes, there are people that say that it works for them, but of those a lot claim updatedb damage is fixed by it too, while that can't be true. And how many of those people did test swap prefetch stand-alone? The ck kernel has other mm patches too, perhaps those are the real goodies... And there don't seem to be many people opposing swap prefetch either. A bunch seem in favour of it, and others seem unconvinced. Me, I don't know if it should be merged or not, it solves one very specific workload, and nothing else (swap is used, and memory becomes free which won't be used in the near future). All in all it seems good, but doubtful, and when in doubt, don't merge. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC/RFT 1/5] Input: implement proper locking in input core
Hi, Not real feedback, just some nitpicks. On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote: > +static int input_defuzz_abs_event(int value, int old_val, int fuzz) > +{ > + if (fuzz) { > + if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2) > + return value; > > - add_input_randomness(type, code, value); > + if (value > old_val - fuzz && value < old_val + fuzz) > + return (old_val * 3 + value) / 4; > > - switch (type) { > + if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2) > + return (old_val + value) / 2; > + } Shouldn't the return values of the second and third case be reversed? In the 2nd check the new values is weighted for 1/4, while in the 3rd case it counts for 1/2, which breaks the "account new value more when it is closer to the old one" logic that I thought I saw here. So to sum up, should the second return be "return (old_val + value * 3) / 4"? > +/* > + * Generate software autorepeat event. Note that we take > + * dev->event_lock here to avoid racing with input_event > + * which may cause keys get "stuck". > + */ Hurray. :-) > - if (code > SW_MAX || !test_bit(code, dev->swbit) || > !!test_bit(code, dev->sw) == value) > - return; > + if (dev->rep[REP_PERIOD]) > + mod_timer(>timer, jiffies + > + msecs_to_jiffies(dev->rep[REP_PERIOD])); > + } Perhaps use a local var for the "msecs_to_jiffies(dev->rep[REP_PERIOD])" part. > +static void input_start_autorepeat(struct input_dev *dev, int code) > +{ > + if (test_bit(EV_REP, dev->evbit) && > + dev->rep[REP_PERIOD] && dev->rep[REP_DELAY] && > + dev->timer.data) { > + dev->repeat_key = code; > + mod_timer(>timer, > + jiffies + msecs_to_jiffies(dev->rep[REP_DELAY])); > + } > +} Same here. > + case EV_KEY: > + if (is_event_supported(code, dev->keybit, KEY_MAX) && > + !!test_bit(code, dev->key) != value) { A bit confusing, test_bit(0 only returns 0 or 1 anyway, doesn't it? So "test_bit(code, dev->key) != value" should be all right. I noticed that the old code did it too, but still. > - case EV_MSC: > + case EV_SW: > + if (is_event_supported(code, dev->swbit, SW_MAX) && > + !!test_bit(code, dev->sw) != value) { Same. > - break; > + case EV_LED: > + if (is_event_supported(code, dev->ledbit, LED_MAX) && > + !!test_bit(code, dev->led) != value) { And here. > +void input_inject_event(struct input_handle *handle, > + unsigned int type, unsigned int code, int value) > { > - struct input_dev *dev = (void *) data; > + struct input_dev *dev = handle->dev; > + struct input_handle *grab; > > - if (!test_bit(dev->repeat_key, dev->key)) > - return; > + if (is_event_supported(type, dev->evbit, EV_MAX)) { > + spin_lock_irq(>event_lock); > > - input_event(dev, EV_KEY, dev->repeat_key, 2); > - input_sync(dev); > + grab = rcu_dereference(dev->grab); > + if (!grab || grab == handle) > + input_handle_event(dev, type, code, value); 'handle' can't be NULL, so can drop the "!grab" check, as checking "grab == handle" should be sufficient. > +/** > + * input_open_device - open input device > + * @handle: handle through which device is being accessed > + * > + * This function should be called by input handlers when they > + * want to start receive events from given input device. > + */ > int input_open_device(struct input_handle *handle) > { > struct input_dev *dev = handle->dev; > - int err; > + int retval; > > - err = mutex_lock_interruptible(>mutex); > - if (err) > - return err; > + retval = mutex_lock_interruptible(>mutex); > + if (retval) > + return retval; > + > + if (dev->going_away) { > + retval = -ENODEV; > + goto out; > + } > > handle->open++; > > if (!dev->users++ && dev->open) Ugh, not your code, and perhaps it's me, but that looks weird. The ++ hidden inthe if check is ugly, and would mean that "users" can be negative, which is strange. > - err = dev->open(dev); > + retval = dev->open(dev); > > - if (err) > - handle->open--; > + if (retval && !--handle->open) { Eek! That -- is hidden well there. Would it hurt to call synchronize_sched() unconditionally? Something like: if (retval) { handle->open--; It's a rare case anyway. > + /* > + * Make sure we are not delivering any more events > + * through this handle > + */ > + synchronize_sched(); > + } > >
Re: [RFC/RFT 0/5] Input locking patches
On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote: > Hi everyone, > > I finally managed to put together some patches implementing > locking in input core and main input handles. Please look > over them and give them a spin. Since kernel 2.6.21 or so I was annoyed by a warping mouse, and one kernel version later also by "stuck" keys, causing repeated input at the most inconvenient moments (e.g. when opening a program by pressing F1). As it happened irregularly and unpredictable it was hard to debug, and I suspected faulty hardware. My cpu was quite hot, but after removing all the dust it seems all right again. Unfortunately that was about the same time I upgraded to 2.6.23-rc1, so all I can say is that the stuck key problem seems to be gone, though not sure thanks to what, but that neither the cleaning nor the upgrade fixed the warping mouse problem. I'm running with these locking patches for two days now and the mouse doesn't warp any more (it can also have fixed the stuck key problem, not sure). Normally it would warp several times a day, and that didn't happen yet, so I'm tempted to praise your patches. Sorry for the babbling, just wanted to say that I've tested these patches and that they seem to fix real problems. Thanks, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, July 28, 2007 00:06, Arjan van de Ven wrote: > On Fri, 2007-07-27 at 23:51 +0200, Indan Zupanci >> > also, they take up seek time (5 to 10 msec), so if you were to read >> > something else at the time you get additional latency. >> >> If there's other disk activity swap prefetch shouldn't do much, so this isn't >> really true. > > how do you know there will be other activity? You start the IO and that > basically blacks out the disk for 5 to 10 ms. If the "real" IO gets > submitted in that time you add latency. You cannot predict that IO > happening or not happening. Ah, in that way. Yes, you right about that (though NCQ might help then?), but that's true for all disk activity. Though I think swap prefetch didn't want to run when there was CPU activity, so that would reduce the chance that new IO is submitted right at that moment. I think in practice this isn't worth worrying about, the real issue is the extra disk activity in the first place. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Fri, July 27, 2007 22:34, Arjan van de Ven wrote: > On Fri, July 27, 2007 21:43, Al Boldi wrote: >> IMHO, what everybody agrees on, is that swap-prefetch has a positive effect >> in some cases, and nobody can prove an adverse effect (excluding power >> consumption). The reason for this positive effect is also crystal clear: >> It prefetches from swap on idle into free memory, ie: it doesn't force > > the fact that there is free memory is ... strange. IN principle, Linux > keeps almost no memory free (except some emergency buffers) so that > things you swap in prematurely will BY DEFINITION go at the expense of > other things that could be there It's not strange, the use case here is if something memory hungry process is shut down it leaves behind a lot of free memory. Having things swapped out while there's free memory is strange, so swap prefetch fills it up again. > also, they take up seek time (5 to 10 msec), so if you were to read > something else at the time you get additional latency. If there's other disk activity swap prefetch shouldn't do much, so this isn't really true. >> Conclusion: Either prove swap-prefetch is broken, or get this merged quick. There are a whole lot of other requirements too than that it isn't broken (of which most are fulfilled, but anyway). One reason could be that there's a better solution out there for the problem swap prefetch tries to solve. That said, as swap prefetch is here now for a while and that other solution not it's not such a great argument. Personally I think that a more generic solution would be better, one that prefetches the lastly evicted pages back in, not favouring either of swap or file data, like swap prefetch does now. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Fri, July 27, 2007 22:34, Arjan van de Ven wrote: On Fri, July 27, 2007 21:43, Al Boldi wrote: IMHO, what everybody agrees on, is that swap-prefetch has a positive effect in some cases, and nobody can prove an adverse effect (excluding power consumption). The reason for this positive effect is also crystal clear: It prefetches from swap on idle into free memory, ie: it doesn't force the fact that there is free memory is ... strange. IN principle, Linux keeps almost no memory free (except some emergency buffers) so that things you swap in prematurely will BY DEFINITION go at the expense of other things that could be there It's not strange, the use case here is if something memory hungry process is shut down it leaves behind a lot of free memory. Having things swapped out while there's free memory is strange, so swap prefetch fills it up again. also, they take up seek time (5 to 10 msec), so if you were to read something else at the time you get additional latency. If there's other disk activity swap prefetch shouldn't do much, so this isn't really true. Conclusion: Either prove swap-prefetch is broken, or get this merged quick. There are a whole lot of other requirements too than that it isn't broken (of which most are fulfilled, but anyway). One reason could be that there's a better solution out there for the problem swap prefetch tries to solve. That said, as swap prefetch is here now for a while and that other solution not it's not such a great argument. Personally I think that a more generic solution would be better, one that prefetches the lastly evicted pages back in, not favouring either of swap or file data, like swap prefetch does now. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, July 28, 2007 00:06, Arjan van de Ven wrote: On Fri, 2007-07-27 at 23:51 +0200, Indan Zupanci also, they take up seek time (5 to 10 msec), so if you were to read something else at the time you get additional latency. If there's other disk activity swap prefetch shouldn't do much, so this isn't really true. how do you know there will be other activity? You start the IO and that basically blacks out the disk for 5 to 10 ms. If the real IO gets submitted in that time you add latency. You cannot predict that IO happening or not happening. Ah, in that way. Yes, you right about that (though NCQ might help then?), but that's true for all disk activity. Though I think swap prefetch didn't want to run when there was CPU activity, so that would reduce the chance that new IO is submitted right at that moment. I think in practice this isn't worth worrying about, the real issue is the extra disk activity in the first place. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC/RFT 0/5] Input locking patches
On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote: Hi everyone, I finally managed to put together some patches implementing locking in input core and main input handles. Please look over them and give them a spin. Since kernel 2.6.21 or so I was annoyed by a warping mouse, and one kernel version later also by stuck keys, causing repeated input at the most inconvenient moments (e.g. when opening a program by pressing F1). As it happened irregularly and unpredictable it was hard to debug, and I suspected faulty hardware. My cpu was quite hot, but after removing all the dust it seems all right again. Unfortunately that was about the same time I upgraded to 2.6.23-rc1, so all I can say is that the stuck key problem seems to be gone, though not sure thanks to what, but that neither the cleaning nor the upgrade fixed the warping mouse problem. I'm running with these locking patches for two days now and the mouse doesn't warp any more (it can also have fixed the stuck key problem, not sure). Normally it would warp several times a day, and that didn't happen yet, so I'm tempted to praise your patches. Sorry for the babbling, just wanted to say that I've tested these patches and that they seem to fix real problems. Thanks, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC/RFT 1/5] Input: implement proper locking in input core
Hi, Not real feedback, just some nitpicks. On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote: +static int input_defuzz_abs_event(int value, int old_val, int fuzz) +{ + if (fuzz) { + if (value old_val - fuzz / 2 value old_val + fuzz / 2) + return value; - add_input_randomness(type, code, value); + if (value old_val - fuzz value old_val + fuzz) + return (old_val * 3 + value) / 4; - switch (type) { + if (value old_val - fuzz * 2 value old_val + fuzz * 2) + return (old_val + value) / 2; + } Shouldn't the return values of the second and third case be reversed? In the 2nd check the new values is weighted for 1/4, while in the 3rd case it counts for 1/2, which breaks the account new value more when it is closer to the old one logic that I thought I saw here. So to sum up, should the second return be return (old_val + value * 3) / 4? +/* + * Generate software autorepeat event. Note that we take + * dev-event_lock here to avoid racing with input_event + * which may cause keys get stuck. + */ Hurray. :-) - if (code SW_MAX || !test_bit(code, dev-swbit) || !!test_bit(code, dev-sw) == value) - return; + if (dev-rep[REP_PERIOD]) + mod_timer(dev-timer, jiffies + + msecs_to_jiffies(dev-rep[REP_PERIOD])); + } Perhaps use a local var for the msecs_to_jiffies(dev-rep[REP_PERIOD]) part. +static void input_start_autorepeat(struct input_dev *dev, int code) +{ + if (test_bit(EV_REP, dev-evbit) + dev-rep[REP_PERIOD] dev-rep[REP_DELAY] + dev-timer.data) { + dev-repeat_key = code; + mod_timer(dev-timer, + jiffies + msecs_to_jiffies(dev-rep[REP_DELAY])); + } +} Same here. + case EV_KEY: + if (is_event_supported(code, dev-keybit, KEY_MAX) + !!test_bit(code, dev-key) != value) { A bit confusing, test_bit(0 only returns 0 or 1 anyway, doesn't it? So test_bit(code, dev-key) != value should be all right. I noticed that the old code did it too, but still. - case EV_MSC: + case EV_SW: + if (is_event_supported(code, dev-swbit, SW_MAX) + !!test_bit(code, dev-sw) != value) { Same. - break; + case EV_LED: + if (is_event_supported(code, dev-ledbit, LED_MAX) + !!test_bit(code, dev-led) != value) { And here. +void input_inject_event(struct input_handle *handle, + unsigned int type, unsigned int code, int value) { - struct input_dev *dev = (void *) data; + struct input_dev *dev = handle-dev; + struct input_handle *grab; - if (!test_bit(dev-repeat_key, dev-key)) - return; + if (is_event_supported(type, dev-evbit, EV_MAX)) { + spin_lock_irq(dev-event_lock); - input_event(dev, EV_KEY, dev-repeat_key, 2); - input_sync(dev); + grab = rcu_dereference(dev-grab); + if (!grab || grab == handle) + input_handle_event(dev, type, code, value); 'handle' can't be NULL, so can drop the !grab check, as checking grab == handle should be sufficient. +/** + * input_open_device - open input device + * @handle: handle through which device is being accessed + * + * This function should be called by input handlers when they + * want to start receive events from given input device. + */ int input_open_device(struct input_handle *handle) { struct input_dev *dev = handle-dev; - int err; + int retval; - err = mutex_lock_interruptible(dev-mutex); - if (err) - return err; + retval = mutex_lock_interruptible(dev-mutex); + if (retval) + return retval; + + if (dev-going_away) { + retval = -ENODEV; + goto out; + } handle-open++; if (!dev-users++ dev-open) Ugh, not your code, and perhaps it's me, but that looks weird. The ++ hidden inthe if check is ugly, and would mean that users can be negative, which is strange. - err = dev-open(dev); + retval = dev-open(dev); - if (err) - handle-open--; + if (retval !--handle-open) { Eek! That -- is hidden well there. Would it hurt to call synchronize_sched() unconditionally? Something like: if (retval) { handle-open--; It's a rare case anyway. + /* + * Make sure we are not delivering any more events + * through this handle + */ + synchronize_sched(); + } +/** + * input_close_device - close input device + * @handle: handle through which device is being accessed + * + * This function should be called by input handlers
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, July 28, 2007 01:34, grundig wrote: El Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven [EMAIL PROTECTED] escribi�: how do you know there will be other activity? You start the IO and that basically blacks out the disk for 5 to 10 ms. If the real IO gets submitted in that time you add latency. You cannot predict that IO happening or not happening. If there hasn't be much IO for some time, it looks quite reasonable to expect that there won't be more in the near future. Good argument. As most of heuristics can fail, but then this is a feature mostly for desktops, not servers. Bad argument. It doesn't matter for who the feature is intended, it matter what it does and if it does it well or not. In this case, prefetching swap without disturbing anything else. There's an old saying that says something like an open source project starts dying when new people can't participate in the project no matter how hard they try. It's hard to understand why there's so many people opposing to this when other more controversial features are merged much faster, (like, fe. the UIO driver framework). Could people please stop this emotional crap non-argumentation? At best it reduces the chance of swap-prefetch to be merged. Perhaps one of the reasons is that this is core kernel code. And that it isn't a new feature, but a performance improvement with doubtful trade-offs. The problem statement isn't clear either. It seems like a natural enhancement, but is that enough reason to merge it? Maybe, maybe not. But if slow swap-in is the problem, shouldn't that be fixed instead of bypassed? Yes, there are people that say that it works for them, but of those a lot claim updatedb damage is fixed by it too, while that can't be true. And how many of those people did test swap prefetch stand-alone? The ck kernel has other mm patches too, perhaps those are the real goodies... And there don't seem to be many people opposing swap prefetch either. A bunch seem in favour of it, and others seem unconvinced. Me, I don't know if it should be merged or not, it solves one very specific workload, and nothing else (swap is used, and memory becomes free which won't be used in the near future). All in all it seems good, but doubtful, and when in doubt, don't merge. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFH] Partition table recovery
On Sun, July 22, 2007 18:28, Theodore Tso wrote: > On Sun, Jul 22, 2007 at 07:10:31AM +0300, Al Boldi wrote: >> Sounds great, but it may be advisable to hook this into the partition >> modification routines instead of mkfs/fsck. Which would mean that the >> partition manager could ask the kernel to instruct its fs subsystem to >> update the backup partition table for each known fs-type that supports such >> a feature. > > Well, let's think about this a bit. What are the requirements? > > 1) The partition manager should be able explicitly request that a new > backup of the partition tables be stashed in each filesystem that has > room for such a backup. That way, when the user affirmatively makes a > partition table change, it can get backed up in all of the right > places automatically. > > 2) The fsck program should *only* stash a backup of the partition > table if there currently isn't one in the filesystem. It may be that > the partition table has been corrupted, and so merely doing an fsck > should not transfer a current copy of the partition table to the > filesystem-secpfic backup area. It could be that the partition table > was only partially recovered, and we don't want to overwrite the > previously existing backups except on an explicit request from the > system administrator. > > 3) The mkfs program should automatically create a backup of the > current partition table layout. That way we get a backup in the newly > created filesystem as soon as it is created. > > 4) The exact location of the backup may vary from filesystem to > filesystem. For ext2/3/4, bytes 512-1023 are always unused, and don't > interfere with the boot sector at bytes 0-511, so that's the obvious > location. Other filesystems may have that location in use, and some > other location might be a better place to store it. Ideally it will > be a well-known location, that isn't dependent on finding an inode > table, or some such, but that may not be possible for all filesystems. To be on the safe side, maybe also add a checksum, timestamp and something identifying the disk the filesystem was created on. Regards, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFH] Partition table recovery
On Sun, July 22, 2007 18:28, Theodore Tso wrote: On Sun, Jul 22, 2007 at 07:10:31AM +0300, Al Boldi wrote: Sounds great, but it may be advisable to hook this into the partition modification routines instead of mkfs/fsck. Which would mean that the partition manager could ask the kernel to instruct its fs subsystem to update the backup partition table for each known fs-type that supports such a feature. Well, let's think about this a bit. What are the requirements? 1) The partition manager should be able explicitly request that a new backup of the partition tables be stashed in each filesystem that has room for such a backup. That way, when the user affirmatively makes a partition table change, it can get backed up in all of the right places automatically. 2) The fsck program should *only* stash a backup of the partition table if there currently isn't one in the filesystem. It may be that the partition table has been corrupted, and so merely doing an fsck should not transfer a current copy of the partition table to the filesystem-secpfic backup area. It could be that the partition table was only partially recovered, and we don't want to overwrite the previously existing backups except on an explicit request from the system administrator. 3) The mkfs program should automatically create a backup of the current partition table layout. That way we get a backup in the newly created filesystem as soon as it is created. 4) The exact location of the backup may vary from filesystem to filesystem. For ext2/3/4, bytes 512-1023 are always unused, and don't interfere with the boot sector at bytes 0-511, so that's the obvious location. Other filesystems may have that location in use, and some other location might be a better place to store it. Ideally it will be a well-known location, that isn't dependent on finding an inode table, or some such, but that may not be possible for all filesystems. To be on the safe side, maybe also add a checksum, timestamp and something identifying the disk the filesystem was created on. Regards, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
On Mon, July 9, 2007 16:40, mikie wrote: > 2007/7/9, Kay Sievers <[EMAIL PROTECTED]>: >> On 7/9/07, mikie <[EMAIL PROTECTED]> wrote: >> > 2007/7/9, Indan Zupancic <[EMAIL PROTECTED]>: >> > > On Mon, July 9, 2007 10:49, mikie wrote: >> > > > 2007/7/6, Indan Zupancic <[EMAIL PROTECTED]>: >> > > >> On Fri, July 6, 2007 16:20, Duncan Sands wrote: >> > > >> > On Friday 6 July 2007 14:54:18 mikie wrote: >> > > >> >> Hi, >> > > >> >> >> > > >> >> I experience some problems with the speedtch.c module, especially >> > > >> >> in >> > > >> >> regards to its firmware loader. >> > > >> >> I am not quite sure if this module is going to load the firmware >> > > >> >> itself or does it use some external software to do that ? >> > > >> > >> > > >> > It loads it itself, using some external software! >> > > >> >> > > >> For information, it generates a hotplug event and the kernel calls the >> > > >> program written at /proc/sys/kernel/hotplug with certain information. >> > > >> That used to be /sbin/hotplug, became later udev, but today in general >> > > >> udev reads the uevents generated by the kernel. >> > > > >> > > > On my system the /proc/sys/kernel/hotplug points to /sbin/hotplug. >> > > > I copied your script to /sbin/hotplug and also added simple logging, >> > > > so I can see whenever the script is being started. It turns out that >> > > > the script is not started at all by the kernel... >> > > > >> > > > I am afraid that the kernel generates uevents only, could this be true? >> > > >> > > Not really, it would break backward compatibility, and if it were true, >> > > they're remove the /proc/sys/kernel/hotplug setting too. As far as I >> > > know, >> > > if it's set, that program will be executed, and if zero is written to >> > > it, only >> > > uevents are generated. >> > > >> > > Make sure that the script is executable (chmod +x) >> > >> > Yes I have set it to be executable: >> > [EMAIL PROTECTED]:/sbin# ls -lah hotplug >> > -rwxr-xr-x1 root root 934 Jul 9 09:13 hotplug >> > >> > > and has "#!/bin/sh" >> > > at the top, or else it won't work. If that's already the case, I've no >> > > idea. >> > >> > [EMAIL PROTECTED]:/sbin# head hotplug >> > #!/bin/sh >> > set -e >> > >> > >> > Everything looks OK, but still it does not fire up the script... >> >> It should run, if the kernel creates events. Can you run "udevmonitor >> --env" while loading the driver? That way we can see if events are >> generated by the kernel (if you don't have it, no need to install >> udev, just download the udev tarball, type "make" and run >> "./udevmonitor --env"). > > It seems that uevents are generated: What if you run a very simple script like #!/bin/sh echo "test" >> /tmp/test is it called at all? If it is then it's probably something with the other script. Try removing the set -e and/or run it manually with the right parameters set and see if anything goes wrong. Or use a minimal udev and let it do the hotplugging, having only rules for speedtouch and firmware loading (attached, but make sure the paths and pppd call is correct for your system). /dev can stay static, just change udev.conf to handle /udev or something. Regards, Indan speedtch.rules Description: Binary data
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
On Mon, July 9, 2007 10:49, mikie wrote: > 2007/7/6, Indan Zupancic <[EMAIL PROTECTED]>: >> On Fri, July 6, 2007 16:20, Duncan Sands wrote: >> > On Friday 6 July 2007 14:54:18 mikie wrote: >> >> Hi, >> >> >> >> I experience some problems with the speedtch.c module, especially in >> >> regards to its firmware loader. >> >> I am not quite sure if this module is going to load the firmware >> >> itself or does it use some external software to do that ? >> > >> > It loads it itself, using some external software! >> >> For information, it generates a hotplug event and the kernel calls the >> program written at /proc/sys/kernel/hotplug with certain information. >> That used to be /sbin/hotplug, became later udev, but today in general >> udev reads the uevents generated by the kernel. > > On my system the /proc/sys/kernel/hotplug points to /sbin/hotplug. > I copied your script to /sbin/hotplug and also added simple logging, > so I can see whenever the script is being started. It turns out that > the script is not started at all by the kernel... > > I am afraid that the kernel generates uevents only, could this be true? Not really, it would break backward compatibility, and if it were true, they're remove the /proc/sys/kernel/hotplug setting too. As far as I know, if it's set, that program will be executed, and if zero is written to it, only uevents are generated. Make sure that the script is executable (chmod +x) and has "#!/bin/sh" at the top, or else it won't work. If that's already the case, I've no idea. Good luck, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
On Mon, July 9, 2007 10:49, mikie wrote: 2007/7/6, Indan Zupancic [EMAIL PROTECTED]: On Fri, July 6, 2007 16:20, Duncan Sands wrote: On Friday 6 July 2007 14:54:18 mikie wrote: Hi, I experience some problems with the speedtch.c module, especially in regards to its firmware loader. I am not quite sure if this module is going to load the firmware itself or does it use some external software to do that ? It loads it itself, using some external software! For information, it generates a hotplug event and the kernel calls the program written at /proc/sys/kernel/hotplug with certain information. That used to be /sbin/hotplug, became later udev, but today in general udev reads the uevents generated by the kernel. On my system the /proc/sys/kernel/hotplug points to /sbin/hotplug. I copied your script to /sbin/hotplug and also added simple logging, so I can see whenever the script is being started. It turns out that the script is not started at all by the kernel... I am afraid that the kernel generates uevents only, could this be true? Not really, it would break backward compatibility, and if it were true, they're remove the /proc/sys/kernel/hotplug setting too. As far as I know, if it's set, that program will be executed, and if zero is written to it, only uevents are generated. Make sure that the script is executable (chmod +x) and has #!/bin/sh at the top, or else it won't work. If that's already the case, I've no idea. Good luck, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
On Mon, July 9, 2007 16:40, mikie wrote: 2007/7/9, Kay Sievers [EMAIL PROTECTED]: On 7/9/07, mikie [EMAIL PROTECTED] wrote: 2007/7/9, Indan Zupancic [EMAIL PROTECTED]: On Mon, July 9, 2007 10:49, mikie wrote: 2007/7/6, Indan Zupancic [EMAIL PROTECTED]: On Fri, July 6, 2007 16:20, Duncan Sands wrote: On Friday 6 July 2007 14:54:18 mikie wrote: Hi, I experience some problems with the speedtch.c module, especially in regards to its firmware loader. I am not quite sure if this module is going to load the firmware itself or does it use some external software to do that ? It loads it itself, using some external software! For information, it generates a hotplug event and the kernel calls the program written at /proc/sys/kernel/hotplug with certain information. That used to be /sbin/hotplug, became later udev, but today in general udev reads the uevents generated by the kernel. On my system the /proc/sys/kernel/hotplug points to /sbin/hotplug. I copied your script to /sbin/hotplug and also added simple logging, so I can see whenever the script is being started. It turns out that the script is not started at all by the kernel... I am afraid that the kernel generates uevents only, could this be true? Not really, it would break backward compatibility, and if it were true, they're remove the /proc/sys/kernel/hotplug setting too. As far as I know, if it's set, that program will be executed, and if zero is written to it, only uevents are generated. Make sure that the script is executable (chmod +x) Yes I have set it to be executable: [EMAIL PROTECTED]:/sbin# ls -lah hotplug -rwxr-xr-x1 root root 934 Jul 9 09:13 hotplug and has #!/bin/sh at the top, or else it won't work. If that's already the case, I've no idea. [EMAIL PROTECTED]:/sbin# head hotplug #!/bin/sh set -e Everything looks OK, but still it does not fire up the script... It should run, if the kernel creates events. Can you run udevmonitor --env while loading the driver? That way we can see if events are generated by the kernel (if you don't have it, no need to install udev, just download the udev tarball, type make and run ./udevmonitor --env). It seems that uevents are generated: What if you run a very simple script like #!/bin/sh echo test /tmp/test is it called at all? If it is then it's probably something with the other script. Try removing the set -e and/or run it manually with the right parameters set and see if anything goes wrong. Or use a minimal udev and let it do the hotplugging, having only rules for speedtouch and firmware loading (attached, but make sure the paths and pppd call is correct for your system). /dev can stay static, just change udev.conf to handle /udev or something. Regards, Indan speedtch.rules Description: Binary data
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
On Fri, July 6, 2007 16:20, Duncan Sands wrote: > On Friday 6 July 2007 14:54:18 mikie wrote: >> Hi, >> >> I experience some problems with the speedtch.c module, especially in >> regards to its firmware loader. >> I am not quite sure if this module is going to load the firmware >> itself or does it use some external software to do that ? > > It loads it itself, using some external software! For information, it generates a hotplug event and the kernel calls the program written at /proc/sys/kernel/hotplug with certain information. That used to be /sbin/hotplug, became later udev, but today in general udev reads the uevents generated by the kernel. >> I have read and checked the firmware.agent script from some hotplug >> package, and noticed that it makes something like simple file copy of >> the firmware to the "data" and echoing 1 or 0 to the "loading" file. >> But what is the location of these data/loading ? Also the firmware is >> split into 2 parts -- should I upload them one after another? Is that >> all that should be done to get the firmware loaded? > > You need to write the file that was asked for: the modem starts by asking > for the first file, uploads it, then asks for the second file and uploads it. > The file is written to the "data" file, but note that the data file is created > on the fly when the modem requests a firmware load so you can't write to it > at some random time. If you really don't want to use udevd, you could use the below script I wrote (not really tested, it's an bash version of the Udev rules I've have, more or less.) Echo the script pathname to /proc/sys/kernel/hotplug early at bootup and all should work, assuming your pppd is setup correctly and the script isn't buggy. #!/bin/sh set -e p() { # For debugging/logging, edit for your needs. echo "$*" echo "$*" > /dev/vc/1; echo "$*" >> /tmp/bla; } if [ "$SUBSYSTEM" = "firmware" ]; then p "Loading firmware..." cd /sys/$DEVPATH if [ -f "/lib/firmware/$FIRMWARE" ]; then echo 1 > loading cat "/lib/firmware/$FIRMWARE" > data echo 0 > loading p "Done." else echo -1 > loading p "$FIRMWARE not found." fi elif [ "$SUBSYSTEM" = "atm" ]; then if [[ "$KERNEL" != speedtch* ]]; then exit fi if [ "$ACTION" = "add" ]; then p "Starting pppd." pppd call adsl elif [ "$ACTION" = "remove" ]; then p "Stopping pppd." pkill pppd fi fi Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
On Fri, July 6, 2007 16:20, Duncan Sands wrote: On Friday 6 July 2007 14:54:18 mikie wrote: Hi, I experience some problems with the speedtch.c module, especially in regards to its firmware loader. I am not quite sure if this module is going to load the firmware itself or does it use some external software to do that ? It loads it itself, using some external software! For information, it generates a hotplug event and the kernel calls the program written at /proc/sys/kernel/hotplug with certain information. That used to be /sbin/hotplug, became later udev, but today in general udev reads the uevents generated by the kernel. I have read and checked the firmware.agent script from some hotplug package, and noticed that it makes something like simple file copy of the firmware to the data and echoing 1 or 0 to the loading file. But what is the location of these data/loading ? Also the firmware is split into 2 parts -- should I upload them one after another? Is that all that should be done to get the firmware loaded? You need to write the file that was asked for: the modem starts by asking for the first file, uploads it, then asks for the second file and uploads it. The file is written to the data file, but note that the data file is created on the fly when the modem requests a firmware load so you can't write to it at some random time. If you really don't want to use udevd, you could use the below script I wrote (not really tested, it's an bash version of the Udev rules I've have, more or less.) Echo the script pathname to /proc/sys/kernel/hotplug early at bootup and all should work, assuming your pppd is setup correctly and the script isn't buggy. #!/bin/sh set -e p() { # For debugging/logging, edit for your needs. echo $* echo $* /dev/vc/1; echo $* /tmp/bla; } if [ $SUBSYSTEM = firmware ]; then p Loading firmware... cd /sys/$DEVPATH if [ -f /lib/firmware/$FIRMWARE ]; then echo 1 loading cat /lib/firmware/$FIRMWARE data echo 0 loading p Done. else echo -1 loading p $FIRMWARE not found. fi elif [ $SUBSYSTEM = atm ]; then if [[ $KERNEL != speedtch* ]]; then exit fi if [ $ACTION = add ]; then p Starting pppd. pppd call adsl elif [ $ACTION = remove ]; then p Stopping pppd. pkill pppd fi fi Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: blink driver power saving
On Mon, July 2, 2007 13:51, Andi Kleen wrote: > On Monday 02 July 2007 13:43:23 Indan Zupancic wrote: >> On Mon, July 2, 2007 01:59, Andi Kleen wrote: >> > Well only those that could be already hung from user space >> > with setleds (that was also confirmed). Actually I thought >> > they didn't hang completely, but just stopped reacting to >> > the keyboard (which is actually pretty bad for every user >> > to be able to trigger) >> >> Pavel's lost key events, mine stopped reacting altogether. > > Did you try if the network was still alive? Perhaps it was > just a locked up keyboard. No, it wasn't locked up, because the leds were blinking. I think if I waited long enough it eventually reacted to other key presses too, sometimes. At least flooding it with Ctrl+C stopped the setleds loop long after I had any hope left. >> >> > Anyways, Stephen's patch just doesn't make sense: >> > he clearly didn't understand the code at all. Before you >> > apply it and cripple it better drop the driver completely. >> >> CC'ing Dmitry, as I think he doesn't like the blink driver much either. ;-) > > Perhaps one of you geniuses who all hate it can find a better way to > solve the "video output dead after kexec; but need visual feedback to the user > while crash dumping" problem. I'm waiting for your patches. While someone's at it, perhaps USB keyboards could be made working too. And make it generic infrastructure too so everyone can (ab)use it? What about adopting the approach here and share the code: http://lkml.org/lkml/2007/6/19/112 At least it works with all keyboards... Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: blink driver power saving
On Mon, July 2, 2007 01:59, Andi Kleen wrote: > Well only those that could be already hung from user space > with setleds (that was also confirmed). Actually I thought > they didn't hang completely, but just stopped reacting to > the keyboard (which is actually pretty bad for every user > to be able to trigger) Pavel's lost key events, mine stopped reacting altogether. > I guess the better way to handle those would be to find out the > minimum frequency of blinking that is still ok and rate limit it to that in > the keyboard driver. Dmitry already has a patch for that. He limited it to one event each 50 ms. > Anyways, Stephen's patch just doesn't make sense: > he clearly didn't understand the code at all. Before you > apply it and cripple it better drop the driver completely. CC'ing Dmitry, as I think he doesn't like the blink driver much either. ;-) Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: blink driver power saving
On Mon, July 2, 2007 01:59, Andi Kleen wrote: Well only those that could be already hung from user space with setleds (that was also confirmed). Actually I thought they didn't hang completely, but just stopped reacting to the keyboard (which is actually pretty bad for every user to be able to trigger) Pavel's lost key events, mine stopped reacting altogether. I guess the better way to handle those would be to find out the minimum frequency of blinking that is still ok and rate limit it to that in the keyboard driver. Dmitry already has a patch for that. He limited it to one event each 50 ms. Anyways, Stephen's patch just doesn't make sense: he clearly didn't understand the code at all. Before you apply it and cripple it better drop the driver completely. CC'ing Dmitry, as I think he doesn't like the blink driver much either. ;-) Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: blink driver power saving
On Mon, July 2, 2007 13:51, Andi Kleen wrote: On Monday 02 July 2007 13:43:23 Indan Zupancic wrote: On Mon, July 2, 2007 01:59, Andi Kleen wrote: Well only those that could be already hung from user space with setleds (that was also confirmed). Actually I thought they didn't hang completely, but just stopped reacting to the keyboard (which is actually pretty bad for every user to be able to trigger) Pavel's lost key events, mine stopped reacting altogether. Did you try if the network was still alive? Perhaps it was just a locked up keyboard. No, it wasn't locked up, because the leds were blinking. I think if I waited long enough it eventually reacted to other key presses too, sometimes. At least flooding it with Ctrl+C stopped the setleds loop long after I had any hope left. Anyways, Stephen's patch just doesn't make sense: he clearly didn't understand the code at all. Before you apply it and cripple it better drop the driver completely. CC'ing Dmitry, as I think he doesn't like the blink driver much either. ;-) Perhaps one of you geniuses who all hate it can find a better way to solve the video output dead after kexec; but need visual feedback to the user while crash dumping problem. I'm waiting for your patches. While someone's at it, perhaps USB keyboards could be made working too. And make it generic infrastructure too so everyone can (ab)use it? What about adopting the approach here and share the code: http://lkml.org/lkml/2007/6/19/112 At least it works with all keyboards... Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Sat, June 16, 2007 05:34, Dmitry Torokhov wrote: > On Friday 15 June 2007 22:04, Indan Zupancic wrote: >> On Fri, June 15, 2007 07:41, Dmitry Torokhov wrote: >> > /* >> > + * Schedule switch for execution. We need to throttle requests, >> > + * otherwise keyboard may become unresponsive. >> > + */ >> > +static void atkbd_schedule_event_work(struct atkbd *atkbd, int event_bit) >> > +{ >> > + unsigned long delay = msecs_to_jiffies(50); >> > + >> > + if (time_after(jiffies, atkbd->event_jiffies + delay)) >> > + delay = 0; >> > + >> > + atkbd->event_jiffies = jiffies; >> > + set_bit(event_bit, >event_mask); >> > + wmb(); >> > + schedule_delayed_work(>event_work, delay); >> > +} >> >> I don't know whether schedule_delayed_work() requeues event_work, or if >> it adds more work, but both seem to give wrong behaviour: > > Well, my advise would be to research the matter before saying that > it will not work. Good advise. My advise to others is, that if you do research it, then don't do it hastily like I did. ;-) My quick search didn't find the exact behaviour of schedule_delayed_work(), when looking in LDD, but I should've digged through to the description of queue_delayed_work(). >> In the first case >> event_work can be postponed forever if atkbd_schedule_event_work() is >> called repeatedly each time within 50 ms, and for the second case there's a >> delay added, but the number of times the LED is switched stays the same, >> so it's not being throttled. >> > > No, once work is queued for execution subsequent attempts to queue the > same work will be ignored (until work starts executing). Therefore first > time work will be scheduled for execution immediately and then execution > be spaced by ~50 ms. Thank you for the explanation. I applied the patch, and it compiles and runs as expected, can't lockup the keyboard anymore with it applied. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Sat, June 16, 2007 05:34, Dmitry Torokhov wrote: On Friday 15 June 2007 22:04, Indan Zupancic wrote: On Fri, June 15, 2007 07:41, Dmitry Torokhov wrote: /* + * Schedule switch for execution. We need to throttle requests, + * otherwise keyboard may become unresponsive. + */ +static void atkbd_schedule_event_work(struct atkbd *atkbd, int event_bit) +{ + unsigned long delay = msecs_to_jiffies(50); + + if (time_after(jiffies, atkbd-event_jiffies + delay)) + delay = 0; + + atkbd-event_jiffies = jiffies; + set_bit(event_bit, atkbd-event_mask); + wmb(); + schedule_delayed_work(atkbd-event_work, delay); +} I don't know whether schedule_delayed_work() requeues event_work, or if it adds more work, but both seem to give wrong behaviour: Well, my advise would be to research the matter before saying that it will not work. Good advise. My advise to others is, that if you do research it, then don't do it hastily like I did. ;-) My quick search didn't find the exact behaviour of schedule_delayed_work(), when looking in LDD, but I should've digged through to the description of queue_delayed_work(). In the first case event_work can be postponed forever if atkbd_schedule_event_work() is called repeatedly each time within 50 ms, and for the second case there's a delay added, but the number of times the LED is switched stays the same, so it's not being throttled. No, once work is queued for execution subsequent attempts to queue the same work will be ignored (until work starts executing). Therefore first time work will be scheduled for execution immediately and then execution be spaced by ~50 ms. Thank you for the explanation. I applied the patch, and it compiles and runs as expected, can't lockup the keyboard anymore with it applied. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Fri, June 15, 2007 07:41, Dmitry Torokhov wrote: > Does the patch below help? Didn't try it yet, but will tomorrow. > Input: atkbd - throttle LED switching > > On some boxes keyboard controllers are too slow to withstand > continuous flow of requests to turn keyboard LEDs on and off > and start losing some keypresses or even all of them. > > Delay executing of LED switching request if we had another one > withing 50 ms thus easing load on the controller. > > Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]> > --- > > drivers/input/keyboard/atkbd.c | 40 > ++-- > 1 files changed, 26 insertions(+), 14 deletions(-) > > Index: work/drivers/input/keyboard/atkbd.c > === > --- work.orig/drivers/input/keyboard/atkbd.c > +++ work/drivers/input/keyboard/atkbd.c > @@ -219,7 +219,8 @@ struct atkbd { > unsigned long time; > unsigned long err_count; > > - struct work_struct event_work; > + struct delayed_work event_work; > + unsigned long event_jiffies; > struct mutex event_mutex; > unsigned long event_mask; > }; > @@ -565,7 +566,7 @@ static int atkbd_set_leds(struct atkbd * > > static void atkbd_event_work(struct work_struct *work) > { > - struct atkbd *atkbd = container_of(work, struct atkbd, event_work); > + struct atkbd *atkbd = container_of(work, struct atkbd, event_work.work); > > mutex_lock(>event_mutex); > > @@ -579,12 +580,30 @@ static void atkbd_event_work(struct work > } > > /* > + * Schedule switch for execution. We need to throttle requests, > + * otherwise keyboard may become unresponsive. > + */ > +static void atkbd_schedule_event_work(struct atkbd *atkbd, int event_bit) > +{ > + unsigned long delay = msecs_to_jiffies(50); > + > + if (time_after(jiffies, atkbd->event_jiffies + delay)) > + delay = 0; > + > + atkbd->event_jiffies = jiffies; > + set_bit(event_bit, >event_mask); > + wmb(); > + schedule_delayed_work(>event_work, delay); > +} I don't know whether schedule_delayed_work() requeues event_work, or if it adds more work, but both seem to give wrong behaviour: In the first case event_work can be postponed forever if atkbd_schedule_event_work() is called repeatedly each time within 50 ms, and for the second case there's a delay added, but the number of times the LED is switched stays the same, so it's not being throttled. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Fri, June 15, 2007 07:41, Dmitry Torokhov wrote: Does the patch below help? Didn't try it yet, but will tomorrow. Input: atkbd - throttle LED switching On some boxes keyboard controllers are too slow to withstand continuous flow of requests to turn keyboard LEDs on and off and start losing some keypresses or even all of them. Delay executing of LED switching request if we had another one withing 50 ms thus easing load on the controller. Signed-off-by: Dmitry Torokhov [EMAIL PROTECTED] --- drivers/input/keyboard/atkbd.c | 40 ++-- 1 files changed, 26 insertions(+), 14 deletions(-) Index: work/drivers/input/keyboard/atkbd.c === --- work.orig/drivers/input/keyboard/atkbd.c +++ work/drivers/input/keyboard/atkbd.c @@ -219,7 +219,8 @@ struct atkbd { unsigned long time; unsigned long err_count; - struct work_struct event_work; + struct delayed_work event_work; + unsigned long event_jiffies; struct mutex event_mutex; unsigned long event_mask; }; @@ -565,7 +566,7 @@ static int atkbd_set_leds(struct atkbd * static void atkbd_event_work(struct work_struct *work) { - struct atkbd *atkbd = container_of(work, struct atkbd, event_work); + struct atkbd *atkbd = container_of(work, struct atkbd, event_work.work); mutex_lock(atkbd-event_mutex); @@ -579,12 +580,30 @@ static void atkbd_event_work(struct work } /* + * Schedule switch for execution. We need to throttle requests, + * otherwise keyboard may become unresponsive. + */ +static void atkbd_schedule_event_work(struct atkbd *atkbd, int event_bit) +{ + unsigned long delay = msecs_to_jiffies(50); + + if (time_after(jiffies, atkbd-event_jiffies + delay)) + delay = 0; + + atkbd-event_jiffies = jiffies; + set_bit(event_bit, atkbd-event_mask); + wmb(); + schedule_delayed_work(atkbd-event_work, delay); +} I don't know whether schedule_delayed_work() requeues event_work, or if it adds more work, but both seem to give wrong behaviour: In the first case event_work can be postponed forever if atkbd_schedule_event_work() is called repeatedly each time within 50 ms, and for the second case there's a delay added, but the number of times the LED is switched stays the same, so it's not being throttled. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Wed, June 13, 2007 10:18, Pavel Machek wrote: >> Well, as I said before, I've the "stuck key"/repeated output too (as well >> as a warping PS/2 mouse), but no blinking led problem, so I believe the >> two things are totally unrelated. > > Well, after turning off CONFIG_BLINK, my problems went away, and with > a fast-blink done from userspace, I can make them way worse. They > _are_ related here. I missed you saying that before, so to me it looked like everyone just assumed that. So, just for fun, I tried running: while true; do setleds +num; setleds -num; done and it totally locked up my keyboard. Even SysRq didn't work. On the bright side, the numlock LED was indeed blinking. Though running the same with a sleep 0.1 added doesn't produce any problems. So maybe my problem is indeed a bit related to this after all, somehow. Anyone any ideas how to debug this problem? Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Wed, June 13, 2007 10:18, Pavel Machek wrote: Well, as I said before, I've the stuck key/repeated output too (as well as a warping PS/2 mouse), but no blinking led problem, so I believe the two things are totally unrelated. Well, after turning off CONFIG_BLINK, my problems went away, and with a fast-blink done from userspace, I can make them way worse. They _are_ related here. I missed you saying that before, so to me it looked like everyone just assumed that. So, just for fun, I tried running: while true; do setleds +num; setleds -num; done and it totally locked up my keyboard. Even SysRq didn't work. On the bright side, the numlock LED was indeed blinking. Though running the same with a sleep 0.1 added doesn't produce any problems. So maybe my problem is indeed a bit related to this after all, somehow. Anyone any ideas how to debug this problem? Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Tue, June 12, 2007 07:42, Dmitry Torokhov wrote: > For what it worth I finally tried that setleds loop on my laptop. I am > not getting any lost keypresses/releases. But then I don't have EC > (or at least it is not exported via ACPI). This is an old Dell notebook. Well, as I said before, I've the "stuck key"/repeated output too (as well as a warping PS/2 mouse), but no blinking led problem, so I believe the two things are totally unrelated. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Tue, June 12, 2007 07:42, Dmitry Torokhov wrote: For what it worth I finally tried that setleds loop on my laptop. I am not getting any lost keypresses/releases. But then I don't have EC (or at least it is not exported via ACPI). This is an old Dell notebook. Well, as I said before, I've the stuck key/repeated output too (as well as a warping PS/2 mouse), but no blinking led problem, so I believe the two things are totally unrelated. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Mon, June 4, 2007 15:12, Jiri Kosina wrote: > Are you sure that it's this dummy blink driver that makes the kernel > stuck? I can't see how it could be causing any hogs - see commit f038f9. To make it clear, I'm not using the blink driver and get the stuck key problem too. (And also a warpy PS/2 mouse.) Wildly guessing: As the problem doesn't seem to be widespread, perhaps it's suspend to ram related. I'll try running a while without suspending and see if it crops up too then. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Mon, June 4, 2007 14:38, Jiri Kosina wrote: > On Mon, 4 Jun 2007, Indan Zupancic wrote: > >> > I also get some stuck keys I was not getting before. I hope my >> > userland did not go crazy... >> I get that too, and wasn't sure about it either. It happens irregularly, >> so it's hard to debug, but often enough to be annoying. > > What kind of keyboards are they? I guess the x60 case is serio/ps2, what > keyboard do you experience this with, Indan? Also serio/ps2: input: AT Translated Set 2 keyboard as /class/input/input0 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Mon, June 4, 2007 13:24, Pavel Machek wrote: > I also get some stuck keys I was not getting before. I hope my > userland did not go crazy... I get that too, and wasn't sure about it either. It happens irregularly, so it's hard to debug, but often enough to be annoying. My mouse pointer tends to warp too in the same irregular way, no idea if that's related (it seemed to happen in an earlier kernel version than the sticky key thing, but not sure, as I don't have this mouse that long.) Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Mon, June 4, 2007 13:24, Pavel Machek wrote: I also get some stuck keys I was not getting before. I hope my userland did not go crazy... I get that too, and wasn't sure about it either. It happens irregularly, so it's hard to debug, but often enough to be annoying. My mouse pointer tends to warp too in the same irregular way, no idea if that's related (it seemed to happen in an earlier kernel version than the sticky key thing, but not sure, as I don't have this mouse that long.) Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Mon, June 4, 2007 14:38, Jiri Kosina wrote: On Mon, 4 Jun 2007, Indan Zupancic wrote: I also get some stuck keys I was not getting before. I hope my userland did not go crazy... I get that too, and wasn't sure about it either. It happens irregularly, so it's hard to debug, but often enough to be annoying. What kind of keyboards are they? I guess the x60 case is serio/ps2, what keyboard do you experience this with, Indan? Also serio/ps2: input: AT Translated Set 2 keyboard as /class/input/input0 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?
On Mon, June 4, 2007 15:12, Jiri Kosina wrote: Are you sure that it's this dummy blink driver that makes the kernel stuck? I can't see how it could be causing any hogs - see commit f038f9. To make it clear, I'm not using the blink driver and get the stuck key problem too. (And also a warpy PS/2 mouse.) Wildly guessing: As the problem doesn't seem to be widespread, perhaps it's suspend to ram related. I'll try running a while without suspending and see if it crops up too then. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.22-rc3][ACPI?] Resume from s2r doesn't work.
On Sat, June 2, 2007 00:17, Olaf Dietsche wrote: > It doesn't work. I tried all options "s2ram -f (-s, -p, -m, -r, -a 1, -a 2, > -a 3)" one after the other. > > Since the screen (or any other device) works without problems, when I > skip acpi_enter_sleep_state(), I don't think it's screen related. I use "s2ram -f -p -s -a 3", maybe you need some exotic combination too. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.22-rc3][ACPI?] Resume from s2r doesn't work.
On Sat, June 2, 2007 00:17, Olaf Dietsche wrote: It doesn't work. I tried all options s2ram -f (-s, -p, -m, -r, -a 1, -a 2, -a 3) one after the other. Since the screen (or any other device) works without problems, when I skip acpi_enter_sleep_state(), I don't think it's screen related. I use s2ram -f -p -s -a 3, maybe you need some exotic combination too. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix section mismatch warnings in mtrr
On Mon, May 21, 2007 17:11, Sam Ravnborg wrote: > On Mon, May 21, 2007 at 02:52:39PM +0100, Jeremy Fitzhardinge wrote: >> Sam Ravnborg wrote: >> > There was another patch that removed the __init marker to _fix_ section >> > mismatch errors. >> > I have lost the actual mail but I asked the submitter to send me a copy of >> > the configuration so I could take a closer look. >> > Obviously it was the wrong fix to remove the _init marker. >> >> Hm, I think there were two overlapping patches to address the problem. >> One removed the __init markers, and was a bit of a hack. Mine >> restructured all the functions so that it was all done properly (ie, >> init called init; cpuinit called cpuinit). Looks like both got applied. > > OK, then all is good. Did the patches reach 2.6.22-rc3 yet? If they did, then the following warnings might need fixing: WARNING: arch/i386/kernel/built-in.o(.text+0x84bc): Section mismatch: reference to .init.text:amd_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str') WARNING: arch/i386/kernel/built-in.o(.text+0x84c1): Section mismatch: reference to .init.text:cyrix_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str') WARNING: arch/i386/kernel/built-in.o(.text+0x84c6): Section mismatch: reference to .init.text:centaur_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str') WARNING: arch/i386/kernel/built-in.o(.text+0x92d8): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr') WARNING: arch/i386/kernel/built-in.o(.text+0x92ef): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr') WARNING: arch/i386/kernel/built-in.o(.text+0x9313): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr') Config attached. Greetings, Indan config.gz Description: application/gzip
Re: [PATCH] x86: fix section mismatch warnings in mtrr
On Mon, May 21, 2007 17:11, Sam Ravnborg wrote: On Mon, May 21, 2007 at 02:52:39PM +0100, Jeremy Fitzhardinge wrote: Sam Ravnborg wrote: There was another patch that removed the __init marker to _fix_ section mismatch errors. I have lost the actual mail but I asked the submitter to send me a copy of the configuration so I could take a closer look. Obviously it was the wrong fix to remove the _init marker. Hm, I think there were two overlapping patches to address the problem. One removed the __init markers, and was a bit of a hack. Mine restructured all the functions so that it was all done properly (ie, init called init; cpuinit called cpuinit). Looks like both got applied. OK, then all is good. Did the patches reach 2.6.22-rc3 yet? If they did, then the following warnings might need fixing: WARNING: arch/i386/kernel/built-in.o(.text+0x84bc): Section mismatch: reference to .init.text:amd_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str') WARNING: arch/i386/kernel/built-in.o(.text+0x84c1): Section mismatch: reference to .init.text:cyrix_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str') WARNING: arch/i386/kernel/built-in.o(.text+0x84c6): Section mismatch: reference to .init.text:centaur_init_mtrr (between 'mtrr_bp_init' and 'mtrr_attrib_to_str') WARNING: arch/i386/kernel/built-in.o(.text+0x92d8): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr') WARNING: arch/i386/kernel/built-in.o(.text+0x92ef): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr') WARNING: arch/i386/kernel/built-in.o(.text+0x9313): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr') Config attached. Greetings, Indan config.gz Description: application/gzip
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
On Sun, May 20, 2007 19:17, Tejun Heo wrote: > Indan Zupancic wrote: >> So over all it takes half a second longer to detect the disk, but >> because everything waits on it, it takes more than three seconds >> longer to resume. > > Eeeek. Extra three secs doesn't sound too hot. :-( Hey, without your reset fix it was 11 seconds slower. ;-) Well, it used to be around 1.7 seconds to resume, and now (after the fixes) it's 2.8 (according to the dmesg timestamps). But the first timestamp printed was 0.7, and now it's 2.1, so I don't know if it's a real slowdown or that the timer is set correctly earlier. That was with 2.19 btw, I don't know what happened in the meantime. I suspect the timer changes. To know for sure I'd need to boot an older kernel and measure resume time with a stopwatch. >> Setting manage_start_stop to 0 fixes it and is good enough for me, I >> didn't notice anything bad yet because of the unmanaged >> stop. Implementing background spin up will fix it too. > > Just commenting out sd_resume() would be a better solution for your > case tho. I like your idea below more. ;-) >>>> Everything seems to work fine without sd_resume(), so why is it needed? >>> Because not all disks spin up without being told to do so and like it or >>> not spinning disks up on resume is the default behavior. As I wrote in >>> the other reply, it would be worthwhile to make it configurable. >> >> Not even after they receive a read command? Ugh. > > After receiving a command which requires media access, they do. What > I was saying is that the current default behavior is to spin up all > devices on resume and part of that is achieved by sd_resume(). > > Hmmm... skipping START_STOP during sd_resume() actually is a pretty > good solution for ATA devices. I'll think about it. I can't speak for others, but for me this would fix all regressions and even disk spin up seems half a second faster without the START_STOP. It would also solve the laptopmode thing, not unnecessarily spinning up disks that don't need to be (if the hardware doesn't do it). Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: implement ata_wait_after_reset()
Hello Tejun, On Sun, May 20, 2007 19:09, Tejun Heo wrote: > Indan Zupancic wrote: >> Don't controllers support spread spin up of disks, to avoid the peak load? >> I think I saw something about that in the SiI 3512 spec I downloaded. So >> maybe something like a special spin up method which can be implemented >> by drivers, and if it isn't, the current start stop thing is used? > > What they support is allowing drivers to control spin up, and it needs > support from the HDD (many seem to do these days) && BIOS (so that it > doesn't spin up the disks during POST) too. To do that properly, we > need a repository which distributes spinup tokens such that we can do > parallel, not thundering herd, spinups. More and more do I get the impression that with full hardware control for Linux, from BIOS to firmware, great things could be done. Alas. > The deadline table sets limits to the maximum time a try can take && > when the next attempt is made. So, the initial deadline of 10 secs > means that the first try is allowed to take upto 10secs and no matter > when or why it fails, the next attempt is made after 10sec is passed. I > would love to bang devices harder but a lot of creepy device are out > there and I'm pretty sure some of them would crap themselves if pushed > too hard. :-( Well, if it's going to be retried later on, then it's not really a deadline, is it? Perhaps it's better to give up after the deadline, and only try again after an interrupt is received from the controller? > Anyways, in your case, the problem was that sata_sil wasted the first > try for no good reason. I'm not determined whether we need more > aggressive deadlines for early tries for cases like this. It's > something to think about. The annoying part is that all this seems to depend on the harddisk, and not really on the controller (or the combination of the two). >> Bugger. So it seems like a good idea to do the reset and spinup async >> together. > > There are a lot of cases. Depending on various configuration and whims, > some drives spin up when power is reapplied and don't respond to reset > till it's ready. Some drives don't spin up when power is reapplied but > spin up when PHY is woken up and don't respond to reset till it's ready. > Yet others just sit quiet and respond to reset nicely but don't spin up > till told to do so. > > So, well, I agree that the best strategy is doing it all asynchronously. Another one would be to do it interrupt driven instead of doing polling, if the controller supports it well. >> The whole resetting, or just the retry after 10 seconds? But it becomes >> clear now that >> you're right that the spinup problem is solved if sd_resume does background >> spin up. > > The whole resetting. ATA controller/port resume itself is completely > asynchronous. The resume method just kicks EH thread and tells it to > wake the thing up and returns. The EH thread asynchronously resets the > port, revalidates all the devices and see if any new ones are attached, etc. > > The culprit is that while EH thread is active all commands are deferred, > so the START_STOP command sd_resume() issues waits in the SCSI midlayer > queue till EH is complete. Are you saying that right now there is only one EH thread? In that case an improvent would be to have one EH thread per controller/device, so that at least independent scsi devices can't block each other. > That's why doing it asynchronously is > tricky. I'm not sure whether we would be able to do it without > separating libata from SCSI so that libata has its own suspend/resume > functions. :-( Well, it seems you just moved the other direction, with leaning on sd_resume. >> Thanks for the pointer, I'll fiddle with it. What do you mean with "proper" >> here? >> Not at all, or in a forced way because power goes away? > > Well, basically, your drive isn't told that the power is gonna go off. > Power just goes off and your drive has to do emergency head unload which > sometimes causes the funny noise. Hm, I suspected something like that. Though the shutdown sound of my HD seems to sound the same as usual. Perhaps the BIOS gives it a sign as part of the suspend? (Or to the controller, which delegates it.) >> Perhaps a silly question, but how can you ever set manage_start_stop to 0 >> before >> the disk is started at bootup? You can change it before doing suspend, but >> not at >> bootup. So is this option at the right place? > > The problem is sorta complicated because it involves all devices which > use SCSI midlayer. libata, USB, all the regular SCSI disks, whatnot, so > it's really hard to sell to linux-scsi people if you choose a default > behavior wh
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
On Sun, May 20, 2007 11:54, Tejun Heo wrote: > Indan Zupancic wrote: >> On Sat, May 19, 2007 21:04, Tejun Heo wrote: >>> Tejun Heo wrote: >>>> Yeah, if SCR registers are accessible, 0xff doesn't indicate the device >>>> isn't there, so the whole skip-0xff logic probably shouldn't apply in >>>> such cases, but we can also achieve pretty good result by just making >>>> the first reset tries a bit more aggressive. >>> So, here's the patch. >>> >>> Paul, can you please test this patch without the previous patch? Indan, >>> this should reduce the resume delay. Please test. But you'll still >>> feel some added delay compared to 2.6.20 due to the mentioned >>> suspend/resume change. >> >> This removed the COMRESET errors indeed, and with sd_resume() >> disabled everything is speedy again (2s or so. Still a desktop pc). >> I didn't try with sd_resume enabled. > > Can you try to measure with sd_resume in place? [2.173366] sd 0:0:0:0: [sda] Starting disk [2.475422] ata2: SATA link down (SStatus 0 SControl 310) [5.478403] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [5.481928] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.485904] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.485913] ata1.00: configured for UDMA/100 [5.505109] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) [5.505461] sd 0:0:0:0: [sda] Write Protect is off [5.505465] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [5.505612] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA ... [6.157259] Restarting tasks ... done. And with echo 0 > /sys/class/scsi_disk/0\:0\:0\:0/manage_start_stop: [2.476476] ata2: SATA link down (SStatus 0 SControl 310) ... [2.825479] Restarting tasks ... done. ... [5.022076] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [5.025605] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.028594] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.028606] ata1.00: configured for UDMA/100 [5.028720] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) [5.028767] sd 0:0:0:0: [sda] Write Protect is off [5.028773] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [5.028831] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA So over all it takes half a second longer to detect the disk, but because everything waits on it, it takes more than three seconds longer to resume. Setting manage_start_stop to 0 fixes it and is good enough for me, I didn't notice anything bad yet because of the unmanaged stop. Implementing background spin up will fix it too. >> Everything seems to work fine without sd_resume(), so why is it needed? > > Because not all disks spin up without being told to do so and like it or > not spinning disks up on resume is the default behavior. As I wrote in > the other reply, it would be worthwhile to make it configurable. Not even after they receive a read command? Ugh. Greeting, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sata_sil: Greatly improve DMA support
On Sun, May 20, 2007 12:19, Tejun Heo wrote: > Indan Zupancic wrote: >> On Sun, May 20, 2007 00:03, Jeff Garzik wrote: >>> Indan Zupancic wrote: >>>> This patch seems to work with my SiI 3512, though I don't notice any >>>> difference, neither a speedup, nor a slowdown. Hdparm gives the same >>>> speeds (-tT), and cp -a'ing kernel sources is abysmal slow in both cases, >>>> (need to look into that one) so I didn't really test it that well. >>> >>> It won't result in much of a speedup, except in situations where IOMMU >>> or other situation that causes you to run into the 64k boundary being an >>> issue -- generally only on huge transfers. >>> >>> A good measure is to dd(1) to/from the block device, rather than using a >>> filesystem. As has been shown on LKML, the filesystem can really slow >>> things down in some cases. >> >> I didn't really expect a speedup, it's more that I've no regression to >> report. >> >> I could benchmark the patch more thoroughly, but right now I'm more worried >> about the crawling cp I just discovered. Talking about filesystems slowing >> down >> things... >> >> Test: >> >> $ cp -a linux-2.6/ /tmp/ >> >> done on the same ext3 partition. linux-2.6 contains source and git repo only, >> I'm compiling stuff with O=../obj. >> >> $ vmstat 10 >> procs ---memory-- ---swap-- -io -system-- cpu >> r b swpd free buff cache si sobibo in cs us sy id wa >> 0 1 0 4168 3316 19570000 739 494 530 393 15 3 66 16 >> 0 3 4 4120 2040 19819600 14677 14111 1247 435 0 17 0 83 >> 0 1 4 3588 1444 19969600 8892 9472 1362 438 0 12 0 88 >> 1 0 4 3772 4228 19601200 764 454 1161 345 0 4 0 96 >> 0 1 4 3548 6156 19308800 793 851 1158 340 0 4 0 96 >> 0 1 4 3852 7608 18909600 798 523 1160 474 1 4 0 95 >> 1 1 4 3612 8684 18604800 1244 864 1178 430 2 5 0 93 >> 0 1 4 90660 9308 9639600 853 906 1244 578 7 6 0 87 >> 0 1 4 72280 9816 11236800 830 854 1278 429 12 5 0 83 >> 1 0 4 52488 10296 13056000 935 861 1178 418 1 6 0 94 >> 0 1 4 30500 10788 14977600 977 858 1178 371 0 6 0 94 >> 0 1 4 9792 11244 16785600 918 1394 1182 350 1 5 0 94 >> 0 1 4 4016 11216 17250400 1017 858 1181 382 1 6 0 94 >> 0 1 4 3660 11484 17148400 966 861 1182 410 1 6 0 94 >> >> It never finished, as I had no patience to copy about 900 Mb with this rate. >> >> As it's a git tree, I suppose it's heavily fragmented, but this is still >> rather >> pathetic. Should I blame cp, or is something else wrong? Any ideas how >> to figure this one out would be appreciated. Sorry for the off-topicness. > > Do things improve if you change the io scheduler to deadline? > > # echo deadline > /sys/block/sda/queue/scheduler I also tried noop, anticipatory, and now deadline, but it doesn't matter. > Also worth looking at is the following bug entry. > > http://bugzilla.kernel.org/show_bug.cgi?id=7372 > > There seems to be weird interaction among the scheduler / VM / IO. The > exact cause is still not verified. :-( I know, I posted a bugreport too, but for starvation with the anticipatory scheduler. Anyway, that bug seems unrelated to my case, as they have system unresponsiveness or other nastiness, while I only have a crawling cp, which I blame on weaknesses within ext3, a badly designed cp program and very fragmented filesystem. I just need to verify that, somehow. I'll try with older kernels later. Greetings, Indan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: implement ata_wait_after_reset()
Hello Tejun, Thanks for your answers. On Sun, May 20, 2007 11:50, Tejun Heo wrote: > Indan Zupancic wrote: >>> 1. We dropped libata specific suspend/resume implementation in favor of >>> sd driven one. Unfortunately, the new implementation currently can't do >>> background spinup, so that's probably why you can feel the delay. We >>> need to figure out how to do background spinup with the new implementation. >> >> What are the advantages of that, except slower resume? ;-) > > The change was made primarily to make libata spin down disks properly on > power down and hibernate. I don't like the added delay either but it's > better than shortening the lifespan of harddisks. Ah, it fixes that "weird noise on shutdown" problem? Fair enough. > Making resuming > asynchronous is planned but we need more infrastructure to do that > properly. The previous implementation worked but it also might try to > spin up a lot of disks at once which can't be good for PSU. I'm not > sure yet how to do that properly with sd driving suspend/resume. Don't controllers support spread spin up of disks, to avoid the peak load? I think I saw something about that in the SiI 3512 spec I downloaded. So maybe something like a special spin up method which can be implemented by drivers, and if it isn't, the current start stop thing is used? >>> 2. To make reset finish in defined time, each reset try now has defined >>> deadlines. I was trying to be conservative so I chose 10sec for the >>> first try. >> >> Right. So to me it seems that failed, because the undefined reset time is >> just >> replaced with a fixed ad hoc sleep, which is longer than any undefined reset >> mechanism. So where did the advantage go? Better to go back to how it was, >> is my impression. Or make that deadline 10 seconds and after that give up, >> instead of the other way round. Or just move to async reset. > > Well, yeah, your case degraded for sure but a lot of hotplug or error > handling cases are now much better which often used to take more than 30 > secs in many cases. There are just a lot of cases and a lot of weird > devices. Aiming generally acceptable delay on most cases is of higher > priority than optimizing specific cases. What I meant is that the deadline isn't effective because if it can't be done within that time, it's just retried after 10 seconds or whatever, basically rendering the deadline useless. But now I understand that the retry is done in the background, and that it was just the sd_resume that caused everything to wait on it. > That said, the 10 sec delayed > retry you're seeing is primarily caused by incorrect interpretation of > 0xff status on sata_sil, so with that fixed, it shouldn't be too much of > a problem. True. >>> It's driven by timing table near the top of libata-eh.c, so >>> adjusting the table is easy and maybe we can be a bit more aggressive >>> with the first two tries. But if we solve #1, this shouldn't matter too >>> much. >> >> #2 seems totally unrelated to #1, as #1 is about spinup, and #2 about reset. >> Only relation is that #2 slows down #1 because #1 needs to wait on #2. > > Not that easy. Many drives don't respond to reset while they're > spinning up. Bugger. So it seems like a good idea to do the reset and spinup async together. >>>> Maybe a silly question, but why do we wait for the harddisk to show up? >>>> Maybe that >>>> makes a little bit of sense at bootup, but for resume from ram, where >>>> nothing is >>>> supposed to have changed, it seems a bit strange. Why not wait when >>>> something tries >>>> to access the harddisk or something? >>> I hope it's explained now. >> >> Almost. Explained is why we wait on the disk, but that's only the sd_resume >> part. >> It's still unclear why resume must wait till the reset is done. > > Port reset itself is asynchronous. The wait is solely from sd_resume. The whole resetting, or just the retry after 10 seconds? But it becomes clear now that you're right that the spinup problem is solved if sd_resume does background spin up. >>>> I wonder if sil_pci_device_resume() and sd_resume() know about each >>>> other... >>>> I'll disable sd_resume() and see how that goes. >>> Yeap, that should work. In most configurations, devices spin up >>> immediately after power is restored. sd_resume() just waits for the >>> device to be revalidated in such cases. >> >> And it kicks the disk into action. Why? Only thing it does is sending a >> START_STOP >> command. I assume that causes the dis
Re: [PATCH] libata: implement ata_wait_after_reset()
Hello Tejun, Thanks for your answers. On Sun, May 20, 2007 11:50, Tejun Heo wrote: Indan Zupancic wrote: 1. We dropped libata specific suspend/resume implementation in favor of sd driven one. Unfortunately, the new implementation currently can't do background spinup, so that's probably why you can feel the delay. We need to figure out how to do background spinup with the new implementation. What are the advantages of that, except slower resume? ;-) The change was made primarily to make libata spin down disks properly on power down and hibernate. I don't like the added delay either but it's better than shortening the lifespan of harddisks. Ah, it fixes that weird noise on shutdown problem? Fair enough. Making resuming asynchronous is planned but we need more infrastructure to do that properly. The previous implementation worked but it also might try to spin up a lot of disks at once which can't be good for PSU. I'm not sure yet how to do that properly with sd driving suspend/resume. Don't controllers support spread spin up of disks, to avoid the peak load? I think I saw something about that in the SiI 3512 spec I downloaded. So maybe something like a special spin up method which can be implemented by drivers, and if it isn't, the current start stop thing is used? 2. To make reset finish in defined time, each reset try now has defined deadlines. I was trying to be conservative so I chose 10sec for the first try. Right. So to me it seems that failed, because the undefined reset time is just replaced with a fixed ad hoc sleep, which is longer than any undefined reset mechanism. So where did the advantage go? Better to go back to how it was, is my impression. Or make that deadline 10 seconds and after that give up, instead of the other way round. Or just move to async reset. Well, yeah, your case degraded for sure but a lot of hotplug or error handling cases are now much better which often used to take more than 30 secs in many cases. There are just a lot of cases and a lot of weird devices. Aiming generally acceptable delay on most cases is of higher priority than optimizing specific cases. What I meant is that the deadline isn't effective because if it can't be done within that time, it's just retried after 10 seconds or whatever, basically rendering the deadline useless. But now I understand that the retry is done in the background, and that it was just the sd_resume that caused everything to wait on it. That said, the 10 sec delayed retry you're seeing is primarily caused by incorrect interpretation of 0xff status on sata_sil, so with that fixed, it shouldn't be too much of a problem. True. It's driven by timing table near the top of libata-eh.c, so adjusting the table is easy and maybe we can be a bit more aggressive with the first two tries. But if we solve #1, this shouldn't matter too much. #2 seems totally unrelated to #1, as #1 is about spinup, and #2 about reset. Only relation is that #2 slows down #1 because #1 needs to wait on #2. Not that easy. Many drives don't respond to reset while they're spinning up. Bugger. So it seems like a good idea to do the reset and spinup async together. Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that makes a little bit of sense at bootup, but for resume from ram, where nothing is supposed to have changed, it seems a bit strange. Why not wait when something tries to access the harddisk or something? I hope it's explained now. Almost. Explained is why we wait on the disk, but that's only the sd_resume part. It's still unclear why resume must wait till the reset is done. Port reset itself is asynchronous. The wait is solely from sd_resume. The whole resetting, or just the retry after 10 seconds? But it becomes clear now that you're right that the spinup problem is solved if sd_resume does background spin up. I wonder if sil_pci_device_resume() and sd_resume() know about each other... I'll disable sd_resume() and see how that goes. Yeap, that should work. In most configurations, devices spin up immediately after power is restored. sd_resume() just waits for the device to be revalidated in such cases. And it kicks the disk into action. Why? Only thing it does is sending a START_STOP command. I assume that causes the disk to spin up. But for e.g. laptopmode I can imagine that's quite annoying. I think sd_resume() is totally unnecessary and can be removed, because it seems wrong to always force the harddisk to spin up, background spin up or not. Most ATA drives would spin up immediately when power is reapplied unless configured otherwise and you can configure whether sd_resume issues START_STOP or not by echoing to sysfs node /sys/class/scsi_disk/h:c:i:l/manage_start_stop. The drawback here is you won't get proper spindown if you turn it off. Thanks for the pointer, I'll fiddle with it. What do you mean
Re: [PATCH] sata_sil: Greatly improve DMA support
On Sun, May 20, 2007 12:19, Tejun Heo wrote: Indan Zupancic wrote: On Sun, May 20, 2007 00:03, Jeff Garzik wrote: Indan Zupancic wrote: This patch seems to work with my SiI 3512, though I don't notice any difference, neither a speedup, nor a slowdown. Hdparm gives the same speeds (-tT), and cp -a'ing kernel sources is abysmal slow in both cases, (need to look into that one) so I didn't really test it that well. It won't result in much of a speedup, except in situations where IOMMU or other situation that causes you to run into the 64k boundary being an issue -- generally only on huge transfers. A good measure is to dd(1) to/from the block device, rather than using a filesystem. As has been shown on LKML, the filesystem can really slow things down in some cases. I didn't really expect a speedup, it's more that I've no regression to report. I could benchmark the patch more thoroughly, but right now I'm more worried about the crawling cp I just discovered. Talking about filesystems slowing down things... Test: $ cp -a linux-2.6/ /tmp/ done on the same ext3 partition. linux-2.6 contains source and git repo only, I'm compiling stuff with O=../obj. $ vmstat 10 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 0 1 0 4168 3316 19570000 739 494 530 393 15 3 66 16 0 3 4 4120 2040 19819600 14677 14111 1247 435 0 17 0 83 0 1 4 3588 1444 19969600 8892 9472 1362 438 0 12 0 88 1 0 4 3772 4228 19601200 764 454 1161 345 0 4 0 96 0 1 4 3548 6156 19308800 793 851 1158 340 0 4 0 96 0 1 4 3852 7608 18909600 798 523 1160 474 1 4 0 95 1 1 4 3612 8684 18604800 1244 864 1178 430 2 5 0 93 0 1 4 90660 9308 9639600 853 906 1244 578 7 6 0 87 0 1 4 72280 9816 11236800 830 854 1278 429 12 5 0 83 1 0 4 52488 10296 13056000 935 861 1178 418 1 6 0 94 0 1 4 30500 10788 14977600 977 858 1178 371 0 6 0 94 0 1 4 9792 11244 16785600 918 1394 1182 350 1 5 0 94 0 1 4 4016 11216 17250400 1017 858 1181 382 1 6 0 94 0 1 4 3660 11484 17148400 966 861 1182 410 1 6 0 94 It never finished, as I had no patience to copy about 900 Mb with this rate. As it's a git tree, I suppose it's heavily fragmented, but this is still rather pathetic. Should I blame cp, or is something else wrong? Any ideas how to figure this one out would be appreciated. Sorry for the off-topicness. Do things improve if you change the io scheduler to deadline? # echo deadline /sys/block/sda/queue/scheduler I also tried noop, anticipatory, and now deadline, but it doesn't matter. Also worth looking at is the following bug entry. http://bugzilla.kernel.org/show_bug.cgi?id=7372 There seems to be weird interaction among the scheduler / VM / IO. The exact cause is still not verified. :-( I know, I posted a bugreport too, but for starvation with the anticipatory scheduler. Anyway, that bug seems unrelated to my case, as they have system unresponsiveness or other nastiness, while I only have a crawling cp, which I blame on weaknesses within ext3, a badly designed cp program and very fragmented filesystem. I just need to verify that, somehow. I'll try with older kernels later. Greetings, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
On Sun, May 20, 2007 11:54, Tejun Heo wrote: Indan Zupancic wrote: On Sat, May 19, 2007 21:04, Tejun Heo wrote: Tejun Heo wrote: Yeah, if SCR registers are accessible, 0xff doesn't indicate the device isn't there, so the whole skip-0xff logic probably shouldn't apply in such cases, but we can also achieve pretty good result by just making the first reset tries a bit more aggressive. So, here's the patch. Paul, can you please test this patch without the previous patch? Indan, this should reduce the resume delay. Please test. But you'll still feel some added delay compared to 2.6.20 due to the mentioned suspend/resume change. This removed the COMRESET errors indeed, and with sd_resume() disabled everything is speedy again (2s or so. Still a desktop pc). I didn't try with sd_resume enabled. Can you try to measure with sd_resume in place? [2.173366] sd 0:0:0:0: [sda] Starting disk [2.475422] ata2: SATA link down (SStatus 0 SControl 310) [5.478403] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [5.481928] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.485904] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.485913] ata1.00: configured for UDMA/100 [5.505109] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) [5.505461] sd 0:0:0:0: [sda] Write Protect is off [5.505465] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [5.505612] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA ... [6.157259] Restarting tasks ... done. And with echo 0 /sys/class/scsi_disk/0\:0\:0\:0/manage_start_stop: [2.476476] ata2: SATA link down (SStatus 0 SControl 310) ... [2.825479] Restarting tasks ... done. ... [5.022076] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [5.025605] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.028594] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648 [5.028606] ata1.00: configured for UDMA/100 [5.028720] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) [5.028767] sd 0:0:0:0: [sda] Write Protect is off [5.028773] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [5.028831] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA So over all it takes half a second longer to detect the disk, but because everything waits on it, it takes more than three seconds longer to resume. Setting manage_start_stop to 0 fixes it and is good enough for me, I didn't notice anything bad yet because of the unmanaged stop. Implementing background spin up will fix it too. Everything seems to work fine without sd_resume(), so why is it needed? Because not all disks spin up without being told to do so and like it or not spinning disks up on resume is the default behavior. As I wrote in the other reply, it would be worthwhile to make it configurable. Not even after they receive a read command? Ugh. Greeting, Indan - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: implement ata_wait_after_reset()
Hello Tejun, On Sun, May 20, 2007 19:09, Tejun Heo wrote: Indan Zupancic wrote: Don't controllers support spread spin up of disks, to avoid the peak load? I think I saw something about that in the SiI 3512 spec I downloaded. So maybe something like a special spin up method which can be implemented by drivers, and if it isn't, the current start stop thing is used? What they support is allowing drivers to control spin up, and it needs support from the HDD (many seem to do these days) BIOS (so that it doesn't spin up the disks during POST) too. To do that properly, we need a repository which distributes spinup tokens such that we can do parallel, not thundering herd, spinups. More and more do I get the impression that with full hardware control for Linux, from BIOS to firmware, great things could be done. Alas. The deadline table sets limits to the maximum time a try can take when the next attempt is made. So, the initial deadline of 10 secs means that the first try is allowed to take upto 10secs and no matter when or why it fails, the next attempt is made after 10sec is passed. I would love to bang devices harder but a lot of creepy device are out there and I'm pretty sure some of them would crap themselves if pushed too hard. :-( Well, if it's going to be retried later on, then it's not really a deadline, is it? Perhaps it's better to give up after the deadline, and only try again after an interrupt is received from the controller? Anyways, in your case, the problem was that sata_sil wasted the first try for no good reason. I'm not determined whether we need more aggressive deadlines for early tries for cases like this. It's something to think about. The annoying part is that all this seems to depend on the harddisk, and not really on the controller (or the combination of the two). Bugger. So it seems like a good idea to do the reset and spinup async together. There are a lot of cases. Depending on various configuration and whims, some drives spin up when power is reapplied and don't respond to reset till it's ready. Some drives don't spin up when power is reapplied but spin up when PHY is woken up and don't respond to reset till it's ready. Yet others just sit quiet and respond to reset nicely but don't spin up till told to do so. So, well, I agree that the best strategy is doing it all asynchronously. Another one would be to do it interrupt driven instead of doing polling, if the controller supports it well. The whole resetting, or just the retry after 10 seconds? But it becomes clear now that you're right that the spinup problem is solved if sd_resume does background spin up. The whole resetting. ATA controller/port resume itself is completely asynchronous. The resume method just kicks EH thread and tells it to wake the thing up and returns. The EH thread asynchronously resets the port, revalidates all the devices and see if any new ones are attached, etc. The culprit is that while EH thread is active all commands are deferred, so the START_STOP command sd_resume() issues waits in the SCSI midlayer queue till EH is complete. Are you saying that right now there is only one EH thread? In that case an improvent would be to have one EH thread per controller/device, so that at least independent scsi devices can't block each other. That's why doing it asynchronously is tricky. I'm not sure whether we would be able to do it without separating libata from SCSI so that libata has its own suspend/resume functions. :-( Well, it seems you just moved the other direction, with leaning on sd_resume. Thanks for the pointer, I'll fiddle with it. What do you mean with proper here? Not at all, or in a forced way because power goes away? Well, basically, your drive isn't told that the power is gonna go off. Power just goes off and your drive has to do emergency head unload which sometimes causes the funny noise. Hm, I suspected something like that. Though the shutdown sound of my HD seems to sound the same as usual. Perhaps the BIOS gives it a sign as part of the suspend? (Or to the controller, which delegates it.) Perhaps a silly question, but how can you ever set manage_start_stop to 0 before the disk is started at bootup? You can change it before doing suspend, but not at bootup. So is this option at the right place? The problem is sorta complicated because it involves all devices which use SCSI midlayer. libata, USB, all the regular SCSI disks, whatnot, so it's really hard to sell to linux-scsi people if you choose a default behavior which is specific to libata devices. What can be done is adding a configurable option which doesn't change the default behavior and configure it in the low level driver. This is how manage_start_stop is configured. The default value is zero but libata sets it to one while registering each device, so the default value for all libata devices is 1 (this should answer your