[PATCH 0/9] Add blockconsole version 1.1 (try 2)
Blockconsole is a console driver very roughly similar to netconsole.
Instead of being sent out via UDP, messages are written to a block
device. Typically a USB stick is chosen, although in principle any
block device will do.

In most cases blockconsole is useful where netconsole is not, i.e.
single machines without network access or without an accessible
netconsole capture server. When using both blockconsole and
netconsole, I have found netconsole to sometimes create a mess under
high message load (sysrq-t, etc.) while blockconsole does not.

Most importantly, a number of bugs were identified and fixed that
would have been unexplained machine reboots without blockconsole.

More highlights:
* reasonably small and self-contained code,
* some 100+ machine years of runtime,
* nice tutorial with a 30-sec guide for the impatient.

Special thanks to Borislav Petkov for many improvements and kicking my
behind to provide a proper git tree and resend patches.

A number of cleanup patches could be folded into the main patch, but I
decided not to mess with git history and leave any further mistakes
for the world to laugh at:
git://git.kernel.org/pub/scm/linux/kernel/git/joern/bcon2.git

Joern Engel (8):
  do_mounts: constify name_to_dev_t parameter
  add blockconsole version 1.1
  printk: add CON_ALLDATA console flag
  netconsole: use CON_ALLDATA
  blockconsole: use CON_ALLDATA
  bcon: add a release work struct
  bcon: check for hdparm in bcon_tail
  bcon: remove version 1.0 support

Takashi Iwai (1):
  blockconsole: Allow to pass a device file path to bcon_tail

 Documentation/block/blockconsole.txt            |   94
 Documentation/block/blockconsole/bcon_tail      |   82 +++
 Documentation/block/blockconsole/mkblockconsole |   29 ++
 block/partitions/Makefile                       |    1 +
 block/partitions/blockconsole.c                 |   22 +
 block/partitions/check.c                        |    3 +
 block/partitions/check.h                        |    3 +
 drivers/block/Kconfig                           |    6 +
 drivers/block/Makefile                          |    1 +
 drivers/block/blockconsole.c                    |  617 +++
 drivers/net/netconsole.c                        |    2 +-
 include/linux/blockconsole.h                    |    7 +
 include/linux/console.h                         |    1 +
 include/linux/mount.h                           |    2 +-
 init/do_mounts.c                                |    2 +-
 kernel/printk.c                                 |    5 +-
 16 files changed, 872 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100755 Documentation/block/blockconsole/bcon_tail
 create mode 100755 Documentation/block/blockconsole/mkblockconsole
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c
 create mode 100644 include/linux/blockconsole.h

--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/9] do_mounts: constify name_to_dev_t parameter
Signed-off-by: Joern Engel
---
 include/linux/mount.h |    2 +-
 init/do_mounts.c      |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..6b5fa77 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -74,6 +74,6 @@ extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);

-extern dev_t name_to_dev_t(char *name);
+extern dev_t name_to_dev_t(const char *name);

 #endif /* _LINUX_MOUNT_H */
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 1d1b634..da96f85 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -202,7 +202,7 @@ done:
  * bangs.
  */

-dev_t name_to_dev_t(char *name)
+dev_t name_to_dev_t(const char *name)
 {
 	char s[32];
 	char *p;
--
1.7.10.4
[PATCH 3/9] printk: add CON_ALLDATA console flag
For consoles like netconsole and blockconsole the loglevel filtering
really doesn't make any sense. If a line gets printed at all, please
send it down to that console, no questions asked.

For vga_con, it is a completely different matter, as the user sitting
in front of his console could get spammed by messages while trying to
login or similar. So ignore_loglevel doesn't work as a
one-size-fits-all approach. Add a per-console flag instead so that
netconsole and blockconsole can opt-in.

Signed-off-by: Joern Engel
---
 include/linux/console.h |    1 +
 kernel/printk.c         |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index dedb082..eed92ad 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -116,6 +116,7 @@ static inline int con_debug_leave(void)
 #define CON_BOOT	(8)
 #define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
 #define CON_BRL		(32) /* Used for a braille device */
+#define CON_ALLDATA	(64) /* per-console ignore_loglevel */

 struct console {
 	char	name[16];
diff --git a/kernel/printk.c b/kernel/printk.c
index 267ce78..5221c59 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1261,8 +1261,6 @@ static void call_console_drivers(int level, const char *text, size_t len)

 	trace_console(text, 0, len, len);

-	if (level >= console_loglevel && !ignore_loglevel)
-		return;
 	if (!console_drivers)
 		return;

@@ -1276,6 +1274,9 @@ static void call_console_drivers(int level, const char *text, size_t len)
 		if (!cpu_online(smp_processor_id()) &&
 		    !(con->flags & CON_ANYTIME))
 			continue;
+		if (level >= console_loglevel && !ignore_loglevel &&
+		    !(con->flags & CON_ALLDATA))
+			continue;
 		con->write(con, text, len);
 	}
 }
--
1.7.10.4
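
The effect of moving the loglevel check into the per-console loop can
be sketched in userspace. This is only an illustration, not kernel
code: struct mock_console and mock_call_console_drivers() are
hypothetical stand-ins, and only the CON_ALLDATA value (64) and the
filter condition are taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>

#define CON_ALLDATA (64) /* value taken from the patch */

/* Hypothetical stand-in for struct console; counts lines instead of writing. */
struct mock_console {
	const char *name;
	int flags;
	int lines_written;
};

/*
 * Deliver one message of the given level to every console. The loglevel
 * filter is evaluated per console, so a console with CON_ALLDATA set
 * receives the message even when it is filtered for the others.
 */
static void mock_call_console_drivers(struct mock_console *cons, size_t n,
				      int level, int console_loglevel,
				      int ignore_loglevel)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct mock_console *con = &cons[i];

		if (level >= console_loglevel && !ignore_loglevel &&
		    !(con->flags & CON_ALLDATA))
			continue;
		con->lines_written++;
	}
}
```

With a console_loglevel of 4, a level 7 message reaches only the
console that opted in, while a level 3 message reaches both.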
[PATCH 2/9] add blockconsole version 1.1
Console driver similar to netconsole, except it writes to a block
device. Can be useful in a setup where netconsole, for whatever
reasons, is impractical.

Changes since version 1.0:
- Header format overhaul, addressing several annoyances when actually
  using blockconsole for production.
- Steve Hodgson added a panic notifier.
- Added improvements and cleanups from Borislav Petkov.

Signed-off-by: Steve Hodgson
Signed-off-by: Borislav Petkov
Signed-off-by: Joern Engel
---
 Documentation/block/blockconsole.txt            |   94
 Documentation/block/blockconsole/bcon_tail      |   62 +++
 Documentation/block/blockconsole/mkblockconsole |   29 ++
 block/partitions/Makefile                       |    1 +
 block/partitions/blockconsole.c                 |   22 +
 block/partitions/check.c                        |    3 +
 block/partitions/check.h                        |    3 +
 drivers/block/Kconfig                           |    6 +
 drivers/block/Makefile                          |    1 +
 drivers/block/blockconsole.c                    |  621 +++
 include/linux/blockconsole.h                    |    7 +
 11 files changed, 849 insertions(+)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100755 Documentation/block/blockconsole/bcon_tail
 create mode 100755 Documentation/block/blockconsole/mkblockconsole
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c
 create mode 100644 include/linux/blockconsole.h

diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt
new file mode 100644
index 0000000..2b45516
--- /dev/null
+++ b/Documentation/block/blockconsole.txt
@@ -0,0 +1,94 @@
+started by Jörn Engel 2012.03.17
+
+Blockconsole for the impatient
+==============================
+
+1. Find an unused USB stick and prepare it for blockconsole by writing
+   the blockconsole signature to it:
+   $ ./mkblockconsole /dev/
+
+2. USB stick is ready for use, replug it so that the kernel can start
+   logging to it.
+
+3. After you're done logging, read out the logs from it like this:
+   $ ./bcon_tail
+
+   This creates a file called /var/log/bcon. which contains the
+   last 16M of the logs. Open it with a sane editor like vim which
+   can display zeroed gaps as a single line and start staring at the
+   logs.
+   For the really impatient, use:
+   $ vi `./bcon_tail`
+
+Introduction:
+=============
+
+This module logs kernel printk messages to block devices, e.g. usb
+sticks. It allows after-the-fact debugging when the main
+disk/filesystem fails and serial consoles and netconsole are
+impractical.
+
+It can currently only be used built-in. Blockconsole hooks into the
+partition scanning code and will bring up configured block devices as
+soon as possible. While this doesn't allow capture of early kernel
+panics, it does capture most of the boot process.
+
+Block device configuration:
+===========================
+
+Blockconsole has no configuration parameter. In order to use a block
+device for logging, the blockconsole header has to be written to the
+device in question. Logging to partitions is not supported.
+
+The example program mkblockconsole can be used to generate such a
+header on a device.
+
+Header format:
+==============
+
+A legal header looks like this:
+
+Linux blockconsole version 1.1
+818cf322
+
+
+
+It consists of a newline, the "Linux blockconsole version 1.1" string
+plus three numbers on separate lines each. Numbers are all 32bit,
+represented as 8-byte hex strings, with letters in lowercase. The
+first number is a uuid for this particular console device. Just pick
+a random number when generating the device. The second number is a
+wrap counter and unlikely to ever increment. The third is a tile
+counter, with a tile being one megabyte in size.
+
+Miscellaneous notes:
+
+
+Blockconsole will write a new header for every tile or once every
+megabyte. The header starts with a newline in order to ensure the
+"Linux blockconsole..." string always ends up at the beginning of a
+line if you read the blockconsole in a text editor.
+
+The blockconsole header is constructed such that opening the log
+device in a text editor, ignoring memory constraints due to large
+devices, should just work and be reasonably non-confusing to readers.
+However, the example program bcon_tail can be used to copy the last 16
+tiles of the log device to /var/log/bcon., which should be much
+easier to handle.
+
+The wrap counter is used by blockconsole to determine where to
+continue logging after a reboot. New logs will be written to the
+first tile that wasn't written to by the last instance of
+blockconsole. Similarly bcon_tail is doing a binary search to find
+the end of the log.
+
+Writing to the log device is strictly circular. This should give
+optimal performance and reliability on cheap devices, l
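
The header layout described in the documentation above can be sketched
as a small userspace formatter and parser. This is a minimal sketch,
not the driver's code. The magic string and the offsets 32 and 41 match
BLOCKCONSOLE_MAGIC, BCON_UUID_OFS and BCON_ROUND_OFS as they appear in
patch 9/9. The tile offset of 50, the 59-byte total length, and the
names bcon_format_header() and bcon_parse_header() are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BCON_MAGIC	"\nLinux blockconsole version 1.1\n"	/* 32 bytes */
#define BCON_UUID_OFS	32
#define BCON_ROUND_OFS	41
#define BCON_TILE_OFS	50	/* assumption: same 9-byte stride */
#define BCON_HEADER_LEN	59	/* assumption: magic plus 3 * (8 hex + newline) */

/* Write a header into buf; returns the number of bytes snprintf() wanted. */
static int bcon_format_header(char *buf, size_t size, uint32_t uuid,
			      uint32_t round, uint32_t tile)
{
	return snprintf(buf, size, "%s%08x\n%08x\n%08x\n", BCON_MAGIC,
			(unsigned int)uuid, (unsigned int)round,
			(unsigned int)tile);
}

/* True if p points at 8 lowercase hex digits followed by a newline. */
static int is_eight_hex(const char *p)
{
	int i;

	for (i = 0; i < 8; i++)
		if (!strchr("0123456789abcdef", p[i]) || p[i] == '\0')
			return 0;
	return p[8] == '\n';
}

/* Returns 0 and fills the three counters on success, -1 on a bad header. */
static int bcon_parse_header(const char *buf, size_t len, uint32_t *uuid,
			     uint32_t *round, uint32_t *tile)
{
	if (len < BCON_HEADER_LEN)
		return -1;
	if (memcmp(buf, BCON_MAGIC, strlen(BCON_MAGIC)))
		return -1;
	if (!is_eight_hex(buf + BCON_UUID_OFS) ||
	    !is_eight_hex(buf + BCON_ROUND_OFS) ||
	    !is_eight_hex(buf + BCON_TILE_OFS))
		return -1;
	*uuid = (uint32_t)strtoul(buf + BCON_UUID_OFS, NULL, 16);
	*round = (uint32_t)strtoul(buf + BCON_ROUND_OFS, NULL, 16);
	*tile = (uint32_t)strtoul(buf + BCON_TILE_OFS, NULL, 16);
	return 0;
}
```

Because every field has a fixed width, a reader can pick out the uuid
with nothing more than a byte range, which is what bcon_tail does with
head -c40 piped into tail -c8.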
[PATCH 6/9] bcon: add a release work struct
The final bcon_put() can be called from atomic context, by way of
bio_endio(). In that case we would sleep in
invalidate_mapping_pages(), with the usual unhappy results. In nearly
a year of production use, I have only seen a matching backtrace once.

There was a second known issue that could be reproduced by "yes h >
/proc/sysrq-trigger" and concurrently pulling and replugging the
blockconsole device. It took me somewhere around 30 pulls and sore
thumbs to reproduce and I never found the time to get to the bottom of
it. Quite likely the two issues are identical.

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 32f6c62..b4730f8 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -65,6 +65,7 @@ struct blockconsole {
 	struct block_device *bdev;
 	struct console console;
 	struct work_struct unregister_work;
+	struct work_struct release_work;
 	struct task_struct *writeback_thread;
 	struct notifier_block panic_block;
 };
@@ -74,9 +75,10 @@ static void bcon_get(struct blockconsole *bc)
 	kref_get(&bc->kref);
 }

-static void bcon_release(struct kref *kref)
+static void __bcon_release(struct work_struct *work)
 {
-	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+	struct blockconsole *bc = container_of(work, struct blockconsole,
+			release_work);

 	__free_pages(bc->zero_page, 0);
 	__free_pages(bc->pages, 8);
@@ -85,6 +87,14 @@ static void bcon_release(struct kref *kref)
 	kfree(bc);
 }

+static void bcon_release(struct kref *kref)
+{
+	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+
+	/* bcon_release can be called from atomic context */
+	schedule_work(&bc->release_work);
+}
+
 static void bcon_put(struct blockconsole *bc)
 {
 	kref_put(&bc->kref, bcon_release);
@@ -512,6 +522,7 @@ static int bcon_create(const char *devname)
 	if (IS_ERR(bc->writeback_thread))
 		goto out2;
 	INIT_WORK(&bc->unregister_work, bcon_unregister);
+	INIT_WORK(&bc->release_work, __bcon_release);
 	register_console(&bc->console);
 	bc->panic_block.notifier_call = blockconsole_panic;
 	bc->panic_block.priority = INT_MAX;
--
1.7.10.4
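
The pattern in this patch, where the final put only queues the teardown
and a worker performs it later, can be sketched in single-threaded
userspace C. Everything below is a hypothetical model: struct
mock_work, mock_schedule_work() and mock_run_pending_work() stand in
for the kernel workqueue, the plain int refcount stands in for struct
kref, and none of it is atomic or thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct mock_work {
	void (*fn)(struct mock_work *work);
	struct mock_work *next;
};

static struct mock_work *pending_work;
static int freed_objects;

/* Queue work for later; safe to call where freeing directly is not. */
static void mock_schedule_work(struct mock_work *work)
{
	work->next = pending_work;
	pending_work = work;
}

/* Run everything queued so far; returns the number of items processed. */
static int mock_run_pending_work(void)
{
	int n = 0;

	while (pending_work) {
		struct mock_work *work = pending_work;

		pending_work = work->next;
		work->fn(work);
		n++;
	}
	return n;
}

struct mock_bcon {
	int refcount;
	struct mock_work release_work;
};

/* The expensive teardown, executed from the worker. */
static void mock_bcon_release_work(struct mock_work *work)
{
	struct mock_bcon *bc = container_of(work, struct mock_bcon,
					    release_work);

	free(bc);
	freed_objects++;
}

static struct mock_bcon *mock_bcon_create(void)
{
	struct mock_bcon *bc = calloc(1, sizeof(*bc));

	if (!bc)
		return NULL;
	bc->refcount = 1;
	bc->release_work.fn = mock_bcon_release_work;
	return bc;
}

/* Dropping the last reference only schedules the release. */
static void mock_bcon_put(struct mock_bcon *bc)
{
	if (--bc->refcount == 0)
		mock_schedule_work(&bc->release_work);
}
```

The object stays allocated between the final put and the worker run,
which is the property that lets the put happen in a context that must
not sleep.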
[PATCH 9/9] bcon: remove version 1.0 support
Very few machines ever ran with 1.0 format and by now I doubt whether
a single one still does. No need to carry that code along.

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |   19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index b4730f8..e88b8ee 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -18,7 +18,6 @@
 #include
 #include

-#define BLOCKCONSOLE_MAGIC_OLD	"\nLinux blockconsole version 1.0\n"
 #define BLOCKCONSOLE_MAGIC	"\nLinux blockconsole version 1.1\n"
 #define BCON_UUID_OFS		(32)
 #define BCON_ROUND_OFS		(41)
@@ -234,18 +233,6 @@ static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
 	}
 }

-static int bcon_convert_old_format(struct blockconsole *bc)
-{
-	bc->uuid = get_random_int();
-	bc->round = 0;
-	bc->console_bytes = bc->write_bytes = 0;
-	bcon_advance_console_bytes(bc, 0); /* To skip the header */
-	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
-	bcon_erase_segment(bc);
-	pr_info("converted %s from old format\n", bc->devname);
-	return 0;
-}
-
 static int bcon_find_end_of_log(struct blockconsole *bc)
 {
 	u64 start = 0, end = bc->max_bytes, middle;
@@ -258,8 +245,8 @@ static int bcon_find_end_of_log(struct blockconsole *bc)
 		return err;
 	/* Second sanity check, out of sheer paranoia */
 	version = bcon_magic_present(sec0);
-	if (version == 10)
-		return bcon_convert_old_format(bc);
+	if (!version)
+		return -EINVAL;

 	bc->uuid = simple_strtoull(sec0 + BCON_UUID_OFS, NULL, 16);
 	bc->round = simple_strtoull(sec0 + BCON_ROUND_OFS, NULL, 16);
@@ -618,8 +605,6 @@ int bcon_magic_present(const void *data)
 {
 	size_t len = strlen(BLOCKCONSOLE_MAGIC);

-	if (!memcmp(data, BLOCKCONSOLE_MAGIC_OLD, len))
-		return 10;
 	if (memcmp(data, BLOCKCONSOLE_MAGIC, len))
 		return 0;
 	if (!is_four_byte_hex(data + BCON_UUID_OFS))
--
1.7.10.4
[PATCH 8/9] blockconsole: Allow to pass a device file path to bcon_tail
From: Takashi Iwai

... instead of always looking through all devices.

Minor tweak: Moved the "CANDIDATES=" line below Takashi's new code.

Signed-off-by: Takashi Iwai
Signed-off-by: Joern Engel
---
 Documentation/block/blockconsole/bcon_tail |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
index eb3524b..70926c6 100755
--- a/Documentation/block/blockconsole/bcon_tail
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -58,6 +58,21 @@ end_of_log() {
 # HEADER contains a newline, so the funny quoting is necessary
 HEADER='
 Linux blockconsole version 1.1'
+
+DEV="$1"
+if [ -n "$DEV" ]; then
+	if [ ! -b "$DEV" ]; then
+		echo "bcon_tail: No block device file $DEV"
+		exit 1
+	fi
+	if [ "`head -c32 $DEV`" != "$HEADER" ]; then
+		echo "bcon_tail: Invalid device file $DEV"
+		exit 1
+	fi
+	end_of_log $DEV
+	exit 0
+fi
+
 CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`

 for DEV in $CANDIDATES; do
--
1.7.10.4
[PATCH 5/9] blockconsole: use CON_ALLDATA
Blockconsole should really see every message ever printed. The
alternative is to try debugging with information like this:

[166135.633974] Stack:
[166135.634016] Call Trace:
[166135.634029]
[166135.634156]
[166135.634177] Code: 00 00 55 48 89 e5 0f 1f 44 00 00 ff 15 31 49 80 00 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
[166135.634384] 48 8b 14 25 98 24 01 00 48 8d 14 92 48 8d 04 bd 00 00 00 00

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 7f8ac5b..32f6c62 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -481,7 +481,7 @@ static int bcon_create(const char *devname)
 	strlcpy(bc->devname, devname, sizeof(bc->devname));
 	spin_lock_init(&bc->end_io_lock);
 	strcpy(bc->console.name, "bcon");
-	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED;
+	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED | CON_ALLDATA;
 	bc->console.write = bcon_write;
 	bc->bdev = blkdev_get_by_path(devname, mode, NULL);
 #ifndef MODULE
--
1.7.10.4
[PATCH 4/9] netconsole: use CON_ALLDATA
Netconsole should really see every message ever printed. The
alternative is to try debugging with information like this:

[166135.633974] Stack:
[166135.634016] Call Trace:
[166135.634029]
[166135.634156]
[166135.634177] Code: 00 00 55 48 89 e5 0f 1f 44 00 00 ff 15 31 49 80 00 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
[166135.634384] 48 8b 14 25 98 24 01 00 48 8d 14 92 48 8d 04 bd 00 00 00 00

Signed-off-by: Joern Engel
---
 drivers/net/netconsole.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 6989ebe..77783fe 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -718,7 +718,7 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)

 static struct console netconsole = {
 	.name	= "netcon",
-	.flags	= CON_ENABLED,
+	.flags	= CON_ENABLED | CON_ALLDATA,
 	.write	= write_msg,
 };

--
1.7.10.4
[PATCH 7/9] bcon: check for hdparm in bcon_tail
From: Borislav Petkov

Signed-off-by: Joern Engel
---
 Documentation/block/blockconsole/bcon_tail |    5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
index b4bd660..eb3524b 100755
--- a/Documentation/block/blockconsole/bcon_tail
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -14,6 +14,11 @@ if [ -z "$(which lsscsi)" ]; then
 	exit 1
 fi

+if [ -z "$(which hdparm)" ]; then
+	echo "You need to install the hdparm package on your distro."
+	exit 1
+fi
+
 end_of_log() {
 	DEV=$1
 	UUID=`head -c40 $DEV|tail -c8`
--
1.7.10.4
[PATCH 0/2] hugetlb fixes
As everyone knows, hugetlbfs sucks. But it is also necessary for large
memory machines, so we should make it suck less. Top of my list were
lack of rss accounting and refusing mmap with MAP_HUGETLB when using
hugetlbfs. The latter generally created a knot in every brain I
explained this to.

The test program below fails before these two patches and passes
after.

Joern Engel (2):
  hugetlb: properly account rss
  mmap: allow MAP_HUGETLB for hugetlbfs files

 mm/hugetlb.c |    4
 mm/mmap.c    |   12 ++--
 2 files changed, 14 insertions(+), 2 deletions(-)

--
1.7.10.4

#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef unsigned long long u64;

static size_t length = 1 << 24;

static u64 read_rss(void)
{
	char buf[4096], *s = buf;
	int i, fd;
	u64 rss;

	fd = open("/proc/self/statm", O_RDONLY);
	assert(fd > 2);
	memset(buf, 0, sizeof(buf));
	read(fd, buf, sizeof(buf) - 1);
	for (i = 0; i < 1; i++)
		s = strchr(s, ' ') + 1;
	rss = strtoull(s, NULL, 10);
	return rss << 12; /* assumes 4k pagesize */
}

static void do_mmap(int fd, int extra_flags, int unmap)
{
	int *p;
	int flags = MAP_PRIVATE | MAP_POPULATE | extra_flags;
	u64 before, after;

	before = read_rss();
	p = mmap(NULL, length, PROT_READ | PROT_WRITE, flags, fd, 0);
	assert(p != MAP_FAILED ||
			!"mmap returned an unexpected error");
	after = read_rss();
	assert(llabs(after - before - length) < 0x40000 ||
			!"rss didn't grow as expected");
	if (!unmap)
		return;
	munmap(p, length);
	after = read_rss();
	assert(llabs(after - before) < 0x40000 ||
			!"rss didn't shrink as expected");
}

static int open_file(const char *path)
{
	int fd, err;

	unlink(path);
	fd = open(path, O_CREAT | O_RDWR | O_TRUNC | O_EXCL
			| O_LARGEFILE | O_CLOEXEC, 0600);
	assert(fd > 2);
	unlink(path);
	err = ftruncate(fd, length);
	assert(!err);
	return fd;
}

int main(void)
{
	int hugefd, fd;

	fd = open_file("/dev/shm/hugetlbhog");
	hugefd = open_file("/hugepages/hugetlbhog");

	system("echo 100 > /proc/sys/vm/nr_hugepages");
	do_mmap(-1, MAP_ANONYMOUS, 1);
	do_mmap(fd, 0, 1);
	do_mmap(-1, MAP_ANONYMOUS | MAP_HUGETLB, 1);
	do_mmap(hugefd, 0, 1);
	do_mmap(hugefd, MAP_HUGETLB, 1);
	/* Leak the last one to test do_exit() */
	do_mmap(-1, MAP_ANONYMOUS | MAP_HUGETLB, 0);
	printf("oll korrekt.\n");
	return 0;
}
[PATCH 1/2] hugetlb: properly account rss
When moving a program from mmap'ing small pages to mmap'ing huge
pages, a remarkable drop in rss ensues. For some reason hugepages were
never accounted for in rss, which in my book is a clear bug. Sadly
this bug has been present in hugetlbfs since it was merged back in
2002. There is every chance existing programs depend on hugepages not
being counted as rss.

I think the correct solution is to fix the bug and wait for someone to
complain. It is just as likely that no one cares - as evidenced by the
fact that no one seems to have noticed for ten years.

Signed-off-by: Joern Engel
---
 mm/hugetlb.c |    4
 1 file changed, 4 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a12f5b..705036c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1174,6 +1174,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	set_page_private(page, (unsigned long)spool);

 	vma_commit_reservation(h, vma, addr);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, pages_per_huge_page(h));
 	return page;
 }

@@ -2406,6 +2407,9 @@ again:
 		if (pte_dirty(pte))
 			set_page_dirty(page);

+		/* -pages_per_huge_page(h) wouldn't get sign-extended */
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, -1 << h->order);
+
 		page_remove_rmap(page);
 		force_flush = !__tlb_remove_page(tlb, page);
 		if (force_flush)
--
1.7.10.4
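
The comment about sign extension in the second hunk can be illustrated
in plain C. This is a sketch under two assumptions that the patch does
not spell out: that pages_per_huge_page() returns an unsigned int, and
that the counter argument is a wider signed type (modelled here as
int64_t). The function names are hypothetical. The signed variant
negates after the shift, because ISO C leaves a left shift of a
negative value undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pages_per_huge_page(), returning unsigned int. */
static unsigned int mock_pages_per_huge_page(unsigned int order)
{
	return 1u << order;
}

/*
 * Negation happens in unsigned int arithmetic, so the result wraps
 * modulo 2^32. Widening that value to 64 bits zero-extends it and
 * yields a large positive number instead of a negative one.
 */
static int64_t delta_unsigned_negate(unsigned int order)
{
	return -mock_pages_per_huge_page(order);
}

/* Computing the delta in a signed 64-bit type keeps it negative. */
static int64_t delta_signed(unsigned int order)
{
	return -((int64_t)1 << order);
}
```

For order 9 (512 small pages per huge page) the first function returns
4294966784 where -512 was intended, assuming a 32-bit unsigned int.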
[PATCH 2/2] mmap: allow MAP_HUGETLB for hugetlbfs files
It is counterintuitive at best that mmap'ing a hugetlbfs file with
MAP_HUGETLB fails, while mmap'ing it without will a) succeed and b)
return huge pages.

Signed-off-by: Joern Engel
---
 mm/mmap.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 2a594246..76eb6df 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -1313,6 +1314,11 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	return addr;
 }

+static inline int is_hugetlb_file(struct file *file)
+{
+	return file->f_inode->i_sb->s_magic == HUGETLBFS_MAGIC;
+}
+
 SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		unsigned long, prot, unsigned long, flags,
 		unsigned long, fd, unsigned long, pgoff)
@@ -1322,11 +1328,12 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,

 	if (!(flags & MAP_ANONYMOUS)) {
 		audit_mmap_fd(fd, flags);
-		if (unlikely(flags & MAP_HUGETLB))
-			return -EINVAL;
 		file = fget(fd);
 		if (!file)
 			goto out;
+		retval = -EINVAL;
+		if (unlikely(flags & MAP_HUGETLB && !is_hugetlb_file(file)))
+			goto out_fput;
 	} else if (flags & MAP_HUGETLB) {
 		struct user_struct *user = NULL;
 		/*
@@ -1346,6 +1353,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);

 	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+out_fput:
 	if (file)
 		fput(file);
 out:
--
1.7.10.4
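
The same filesystem-magic comparison that is_hugetlb_file() performs in
the kernel has a userspace counterpart. The sketch below is not part of
the patch: fd_is_on_hugetlbfs() is a hypothetical helper, and it relies
on fstatfs() plus the HUGETLBFS_MAGIC constant from <linux/magic.h>, so
it is Linux-only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>
#include <linux/magic.h>

/* Returns 1 if fd lives on hugetlbfs, 0 if it does not, -1 on error. */
static int fd_is_on_hugetlbfs(int fd)
{
	struct statfs st;

	if (fstatfs(fd, &st) < 0)
		return -1;
	/* Compare as unsigned long: f_type may be a signed 32-bit field. */
	return (unsigned long)st.f_type == (unsigned long)HUGETLBFS_MAGIC;
}
```

A program that has to run on kernels with and without this patch could
use such a check to decide whether to add MAP_HUGETLB when mapping a
file.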
[PATCH 2/3] self-test: fix make clean
thuge-gen was forgotten. Fix it by removing the duplication, so we
don't get too many repeats.

Signed-off-by: Joern Engel
---
 tools/testing/selftests/vm/Makefile |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 7d47927..cb3f5f2 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -2,8 +2,9 @@

 CC = $(CROSS_COMPILE)gcc
 CFLAGS = -Wall
+BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen

-all: hugepage-mmap hugepage-shm map_hugetlb thuge-gen
+all: $(BINARIES)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^

@@ -11,4 +12,4 @@ run_tests: all
 	@/bin/sh ./run_vmtests || (echo "vmtests: [FAIL]"; exit 1)

 clean:
-	$(RM) hugepage-mmap hugepage-shm map_hugetlb
+	$(RM) $(BINARIES)
--
1.7.10.4
[PATCH 1/3] selftests: exit 1 on failure
In case this ever gets scripted, it should return 0 on success and 1
on failure. Parsing the output should be left to meatbags.

Signed-off-by: Joern Engel
---
 tools/testing/selftests/vm/Makefile    |    2 +-
 tools/testing/selftests/vm/run_vmtests |    5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 436d2e8..7d47927 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -8,7 +8,7 @@ all: hugepage-mmap hugepage-shm map_hugetlb thuge-gen
 	$(CC) $(CFLAGS) -o $@ $^

 run_tests: all
-	@/bin/sh ./run_vmtests || echo "vmtests: [FAIL]"
+	@/bin/sh ./run_vmtests || (echo "vmtests: [FAIL]"; exit 1)

 clean:
 	$(RM) hugepage-mmap hugepage-shm map_hugetlb
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index 4c53cae..7a9072d 100644
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -4,6 +4,7 @@
 #we need 256M, below is the size in kB
 needmem=262144
 mnt=./huge
+exitcode=0

 #get pagesize and freepages from /proc/meminfo
 while read name size unit; do
@@ -41,6 +42,7 @@ echo ""
 ./hugepage-mmap
 if [ $? -ne 0 ]; then
 	echo "[FAIL]"
+	exitcode=1
 else
 	echo "[PASS]"
 fi
@@ -55,6 +57,7 @@ echo ""
 ./hugepage-shm
 if [ $? -ne 0 ]; then
 	echo "[FAIL]"
+	exitcode=1
 else
 	echo "[PASS]"
 fi
@@ -67,6 +70,7 @@ echo ""
 ./map_hugetlb
 if [ $? -ne 0 ]; then
 	echo "[FAIL]"
+	exitcode=1
 else
 	echo "[PASS]"
 fi
@@ -75,3 +79,4 @@ fi
 umount $mnt
 rm -rf $mnt
 echo $nr_hugepgs > /proc/sys/vm/nr_hugepages
+exit $exitcode
--
1.7.10.4
[PATCH 3/3] selftests: add hugetlbfstest
As the confusing naming indicates, this test has some overlap with
pre-existing tests. Would be nice to merge them eventually. But since
it is only test code, cleanliness is much less important than mere
existence.

Signed-off-by: Joern Engel
---
 tools/testing/selftests/vm/Makefile        |    2 +-
 tools/testing/selftests/vm/hugetlbfstest.c |   84
 tools/testing/selftests/vm/run_vmtests     |   11
 3 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/hugetlbfstest.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index cb3f5f2..3f94e1a 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -2,7 +2,7 @@

 CC = $(CROSS_COMPILE)gcc
 CFLAGS = -Wall
-BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen
+BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen hugetlbfstest

 all: $(BINARIES)
 %: %.c
diff --git a/tools/testing/selftests/vm/hugetlbfstest.c b/tools/testing/selftests/vm/hugetlbfstest.c
new file mode 100644
index 0000000..ea40ff8
--- /dev/null
+++ b/tools/testing/selftests/vm/hugetlbfstest.c
@@ -0,0 +1,84 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+typedef unsigned long long u64;
+
+static size_t length = 1 << 24;
+
+static u64 read_rss(void)
+{
+	char buf[4096], *s = buf;
+	int i, fd;
+	u64 rss;
+
+	fd = open("/proc/self/statm", O_RDONLY);
+	assert(fd > 2);
+	memset(buf, 0, sizeof(buf));
+	read(fd, buf, sizeof(buf) - 1);
+	for (i = 0; i < 1; i++)
+		s = strchr(s, ' ') + 1;
+	rss = strtoull(s, NULL, 10);
+	return rss << 12; /* assumes 4k pagesize */
+}
+
+static void do_mmap(int fd, int extra_flags, int unmap)
+{
+	int *p;
+	int flags = MAP_PRIVATE | MAP_POPULATE | extra_flags;
+	u64 before, after;
+
+	before = read_rss();
+	p = mmap(NULL, length, PROT_READ | PROT_WRITE, flags, fd, 0);
+	assert(p != MAP_FAILED ||
+			!"mmap returned an unexpected error");
+	after = read_rss();
+	assert(llabs(after - before - length) < 0x40000 ||
+			!"rss didn't grow as expected");
+	if (!unmap)
+		return;
+	munmap(p, length);
+	after = read_rss();
+	assert(llabs(after - before) < 0x40000 ||
+			!"rss didn't shrink as expected");
+}
+
+static int open_file(const char *path)
+{
+	int fd, err;
+
+	unlink(path);
+	fd = open(path, O_CREAT | O_RDWR | O_TRUNC | O_EXCL
+			| O_LARGEFILE | O_CLOEXEC, 0600);
+	assert(fd > 2);
+	unlink(path);
+	err = ftruncate(fd, length);
+	assert(!err);
+	return fd;
+}
+
+int main(void)
+{
+	int hugefd, fd;
+
+	fd = open_file("/dev/shm/hugetlbhog");
+	hugefd = open_file("/hugepages/hugetlbhog");
+
+	system("echo 100 > /proc/sys/vm/nr_hugepages");
+	do_mmap(-1, MAP_ANONYMOUS, 1);
+	do_mmap(fd, 0, 1);
+	do_mmap(-1, MAP_ANONYMOUS | MAP_HUGETLB, 1);
+	do_mmap(hugefd, 0, 1);
+	do_mmap(hugefd, MAP_HUGETLB, 1);
+	/* Leak the last one to test do_exit() */
+	do_mmap(-1, MAP_ANONYMOUS | MAP_HUGETLB, 0);
+	printf("oll korrekt.\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index 7a9072d..c87b681 100644
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -75,6 +75,17 @@ else
 	echo "[PASS]"
 fi

+echo ""
+echo "running hugetlbfstest"
+echo ""
+./hugetlbfstest
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
 #cleanup
 umount $mnt
 rm -rf $mnt
--
1.7.10.4
[PATCH 0/3] Improve selftests
First two are cleanups, third adds hugetlbfstest. This test fails on
current kernels, but I have previously sent a patchset to fix the two
failures.

Joern Engel (3):
  selftests: exit 1 on failure
  self-test: fix make clean
  selftests: add hugetlbfstest

 tools/testing/selftests/vm/Makefile        |  7 ++-
 tools/testing/selftests/vm/hugetlbfstest.c | 84
 tools/testing/selftests/vm/run_vmtests     | 16 ++
 3 files changed, 104 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/vm/hugetlbfstest.c

--
1.7.10.4
[PATCH] softirq: weaken warning in local_bh_enable_ip()
This reliably triggers with the following backtrace:

 local_bh_enable_ip+0x128/0x140
 _raw_spin_unlock_bh+0x15/0x20
 iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
 iscsit_stop_session+0x1db/0x280 [iscsi_target_mod]
 lio_tpg_shutdown_session+0xb2/0xf0 [iscsi_target_mod]
 core_tpg_set_initiator_node_queue_depth+0x119/0x2f0 [target_core_mod]
 iscsit_tpg_set_initiator_node_queue_depth+0x12/0x20 [iscsi_target_mod]
 lio_target_nacl_store_cmdsn_depth+0x110/0x1e0 [iscsi_target_mod]
 target_fabric_nacl_base_attr_store+0x39/0x40 [target_core_mod]
 configfs_write_file+0xbd/0x120
 vfs_write+0xc6/0x180
 sys_write+0x51/0x90
 system_call_fastpath+0x16/0x1b

core_tpg_set_initiator_node_queue_depth() calls
lio_tpg_shutdown_session() inside a spin_lock_irqsave-protected block.
Calling spin_unlock_bh later in the call chain always triggers the
warning.

Signed-off-by: Joern Engel
Cc: Johannes Berg
Cc: Michael Buesch
Cc: David Ellingsworth
Cc: Linus Torvalds
Cc: Ingo Molnar
Cc: "Nicholas A. Bellinger"
---
 kernel/softirq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 14d7758..d4ee1c6 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -157,7 +157,7 @@ EXPORT_SYMBOL(_local_bh_enable);
 
 static inline void _local_bh_enable_ip(unsigned long ip)
 {
-	WARN_ON_ONCE(in_irq() || irqs_disabled());
+	WARN_ON_ONCE(in_irq());
 #ifdef CONFIG_TRACE_IRQFLAGS
 	local_irq_disable();
 #endif
--
1.7.10.4
[PATCH 1/2] list: add list_for_each_entry_del
I have seen a lot of boilerplate code that either follows the pattern of

	while (!list_empty(head)) {
		pos = list_entry(head->next, struct foo, list);
		list_del(&pos->list);
		...
	}

or some variant thereof.

With this patch in, people can use

	list_for_each_entry_del(pos, head, list) {
		...
	}

The patch also adds a list_for_each_del variant, even though I have
only found a single user for that one so far.

Signed-off-by: Joern Engel
---
 include/linux/list.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/include/linux/list.h b/include/linux/list.h
index 6a1f8df..e09fe10 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -361,6 +361,17 @@ static inline void list_splice_tail_init(struct list_head *list,
 #define list_first_entry(ptr, type, member) \
 	list_entry((ptr)->next, type, member)
 
+static inline struct list_head *list_first_del(struct list_head *head)
+{
+	struct list_head *item;
+
+	if (list_empty(head))
+		return NULL;
+	item = head->next;
+	list_del(item);
+	return item;
+}
+
 /**
  * list_for_each - iterate over a list
  * @pos: the &struct list_head to use as a loop cursor.
@@ -483,6 +494,28 @@ static inline void list_splice_tail_init(struct list_head *list,
 	     pos = list_entry(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_del - iterate over a list, deleting each entry
+ * @pos: the &struct list_head to use as a loop cursor.
+ * @head: the head of your list.
+ *
+ * Calls list_del() on pos on each iteration
+ */
+#define list_for_each_del(pos, head) \
+	while ((pos = list_first_del(head)))
+
+/**
+ * list_for_each_entry_del - iterate over a list of given type, deleting each entry
+ * @pos: the type * to use as loop cursor.
+ * @head: the head of your list.
+ * @member: the name of the list_struct within the struct
+ *
+ * Calls list_del() on pos on each iteration
+ */
+#define list_for_each_entry_del(pos, head, member) \
+	while (pos = list_entry(list_first_del(head), typeof(*pos), member), \
+			&pos->member)
+
+/**
  * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
  * @pos: the type * to use as a loop cursor.
  * @n: another type * to use as temporary storage
--
1.7.10.4
[PATCH 0/2] introduce list_for_each_entry_del
A purely janitorial patchset. A fairly common pattern is to take a
list, remove every object from it and do something with this object,
usually kfree() or some variant. A stupid grep identified roughly 300
instances, with many more hidden behind more complicated patterns to
achieve the same end results.

This patchset moves the boilerplate code into list.h and uses it in a
few example places. Diffstat is pretty clear and imo the code is
improved as well.

Drawback is that object size is growing. I think an ideal compiler
should be able to optimize all the overhead away, but 4.7 just isn't
there yet. Or maybe I just messed up - patches are only compile-tested
after all. Comments/ideas are welcome.

Joern Engel (2):
  list: add list_for_each_entry_del
  btrfs: use list_for_each_entry_del

 fs/btrfs/backref.c      | 15 +++
 fs/btrfs/compression.c  |  4 +---
 fs/btrfs/disk-io.c      |  6 +-
 fs/btrfs/extent-tree.c  | 17 +++--
 fs/btrfs/extent_io.c    |  8 ++--
 fs/btrfs/inode.c        | 16 +++-
 fs/btrfs/ordered-data.c |  7 +--
 fs/btrfs/qgroup.c       | 22 --
 fs/btrfs/relocation.c   |  6 +-
 fs/btrfs/scrub.c        |  9 +++--
 fs/btrfs/transaction.c  |  5 +
 fs/btrfs/volumes.c      | 11 ++-
 include/linux/list.h    | 33 +
 13 files changed, 58 insertions(+), 101 deletions(-)

--
1.7.10.4
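The drain pattern described in this cover letter can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: struct list_head, list_del() and container_of() are reimplemented locally as stand-ins, and struct foo, push() and drain_sum() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-ins for the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = NULL;
}

/* Remove and return the first item, or NULL if the list is empty. */
static struct list_head *list_first_del(struct list_head *head)
{
	struct list_head *item;

	if (list_empty(head))
		return NULL;
	item = head->next;
	list_del(item);
	return item;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical payload type embedding a list node. */
struct foo { int value; struct list_head list; };

/* Hypothetical helper: append a new entry carrying value. */
static int push(struct list_head *head, int value)
{
	struct foo *f = malloc(sizeof(*f));

	if (!f)
		return -1;
	f->value = value;
	list_add_tail(&f->list, head);
	return 0;
}

/* Hypothetical helper: drain the list, free every entry, return the sum. */
static int drain_sum(struct list_head *head)
{
	struct list_head *item;
	int sum = 0;

	while ((item = list_first_del(head))) {
		struct foo *pos = container_of(item, struct foo, list);

		sum += pos->value;
		free(pos);
	}
	return sum;
}
```

The sketch tests the raw list pointer for NULL before converting it to the containing structure, which keeps the loop within standard C at the cost of one extra local variable.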
[PATCH 2/2] btrfs: use list_for_each_entry_del
Signed-off-by: Joern Engel --- fs/btrfs/backref.c | 15 +++ fs/btrfs/compression.c |4 +--- fs/btrfs/disk-io.c |6 +- fs/btrfs/extent-tree.c | 17 +++-- fs/btrfs/extent_io.c|8 ++-- fs/btrfs/inode.c| 16 +++- fs/btrfs/ordered-data.c |7 +-- fs/btrfs/qgroup.c | 22 -- fs/btrfs/relocation.c |6 +- fs/btrfs/scrub.c|9 +++-- fs/btrfs/transaction.c |5 + fs/btrfs/volumes.c | 11 ++- 12 files changed, 25 insertions(+), 101 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index bd605c8..ab51655 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -893,9 +893,7 @@ again: if (ret) goto out; - while (!list_empty(&prefs)) { - ref = list_first_entry(&prefs, struct __prelim_ref, list); - list_del(&ref->list); + list_for_each_entry_del(ref, &prefs, list) { WARN_ON(ref->count < 0); if (ref->count && ref->root_id && ref->parent == 0) { /* no parent == root of tree */ @@ -937,17 +935,10 @@ again: out: btrfs_free_path(path); - while (!list_empty(&prefs)) { - ref = list_first_entry(&prefs, struct __prelim_ref, list); - list_del(&ref->list); + list_for_each_entry_del(ref, &prefs, list) kfree(ref); - } - while (!list_empty(&prefs_delayed)) { - ref = list_first_entry(&prefs_delayed, struct __prelim_ref, - list); - list_del(&ref->list); + list_for_each_entry_del(ref, &prefs_delayed, list) kfree(ref); - } return ret; } diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 15b9408..c8a890b 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -841,9 +841,7 @@ static void free_workspaces(void) int i; for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) { - while (!list_empty(&comp_idle_workspace[i])) { - workspace = comp_idle_workspace[i].next; - list_del(workspace); + list_for_each_del(workspace, &comp_idle_workspace[i]) { btrfs_compress_op[i]->free_workspace(workspace); atomic_dec(&comp_alloc_workspace[i]); } diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6d19a0a..66d99c9 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3289,11 +3289,7 
@@ static void del_fs_roots(struct btrfs_fs_info *fs_info) struct btrfs_root *gang[8]; int i; - while (!list_empty(&fs_info->dead_roots)) { - gang[0] = list_entry(fs_info->dead_roots.next, -struct btrfs_root, root_list); - list_del(&gang[0]->root_list); - + list_for_each_entry_del(gang[0], &fs_info->dead_roots, root_list) { if (gang[0]->in_radix) { btrfs_free_fs_root(fs_info, gang[0]); } else { diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3d55123..42de094 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2435,10 +2435,7 @@ int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans, if (!trans->delayed_ref_elem.seq) return 0; - while (!list_empty(&trans->qgroup_ref_list)) { - qgroup_update = list_first_entry(&trans->qgroup_ref_list, -struct qgroup_update, list); - list_del(&qgroup_update->list); + list_for_each_entry_del(qgroup_update, &trans->qgroup_ref_list, list) { if (!ret) ret = btrfs_qgroup_account_ref( trans, fs_info, qgroup_update->node, @@ -7821,12 +7818,8 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) struct rb_node *n; down_write(&info->extent_commit_sem); - while (!list_empty(&info->caching_block_groups)) { - caching_ctl = list_entry(info->caching_block_groups.next, -struct btrfs_caching_control, list); - list_del(&caching_ctl->list); + list_for_each_entry_del(caching_ctl, &info->caching_block_groups, list) put_caching_control(caching_ctl); - } up_write(&info->extent_commit_sem); spin_lock(&info->block_group_cache_lock); @@ -7868,10 +7861,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) release_global_block_rsv(info); -
[PATCH 2/3] target: remove unused codes from enum tcm_tmrsp_table
Three codes have been checked for but were never set. Remove the dead
code. Also renumbers the remaining ones to a) get rid of the holes
after the removal and b) avoid a collision between
TMR_FUNCTION_COMPLETE==0 and the uninitialized case. If we failed to
set a code, we should rather fall into the default case than return
success.

Signed-off-by: Joern Engel
---
 drivers/target/iscsi/iscsi_target.c |  2 --
 drivers/target/tcm_fc/tfc_cmd.c     |  3 ---
 include/target/target_core_base.h   | 13 +
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 49346b3..25d5567 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -3136,8 +3136,6 @@ static u8 iscsit_convert_tcm_tmr_rsp(struct se_tmr_req *se_tmr)
 		return ISCSI_TMF_RSP_NO_LUN;
 	case TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED:
 		return ISCSI_TMF_RSP_NOT_SUPPORTED;
-	case TMR_FUNCTION_AUTHORIZATION_FAILED:
-		return ISCSI_TMF_RSP_AUTH_FAILED;
 	case TMR_FUNCTION_REJECTED:
 	default:
 		return ISCSI_TMF_RSP_REJECTED;
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index b406f17..7b6bb72 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -413,10 +413,7 @@ int ft_queue_tm_resp(struct se_cmd *se_cmd)
 		code = FCP_TMF_REJECTED;
 		break;
 	case TMR_TASK_DOES_NOT_EXIST:
-	case TMR_TASK_STILL_ALLEGIANT:
-	case TMR_TASK_FAILOVER_NOT_SUPPORTED:
 	case TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED:
-	case TMR_FUNCTION_AUTHORIZATION_FAILED:
 	default:
 		code = FCP_TMF_FAILED;
 		break;
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index c4af592..4e7dd74 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -218,14 +218,11 @@ enum tcm_tmreq_table {
 
 /* fabric independent task management response values */
 enum tcm_tmrsp_table {
-	TMR_FUNCTION_COMPLETE			= 0,
-	TMR_TASK_DOES_NOT_EXIST			= 1,
-	TMR_LUN_DOES_NOT_EXIST			= 2,
-	TMR_TASK_STILL_ALLEGIANT		= 3,
-	TMR_TASK_FAILOVER_NOT_SUPPORTED		= 4,
-	TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED	= 5,
-	TMR_FUNCTION_AUTHORIZATION_FAILED	= 6,
-	TMR_FUNCTION_REJECTED			= 255,
+	TMR_FUNCTION_COMPLETE			= 1,
+	TMR_TASK_DOES_NOT_EXIST			= 2,
+	TMR_LUN_DOES_NOT_EXIST			= 3,
+	TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED	= 4,
+	TMR_FUNCTION_REJECTED			= 5,
 };
 
 /*
--
1.7.10.4
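The reasoning behind point b) of the renumbering can be illustrated with a small standalone sketch. All names here (enum tmr_rsp, enum wire_rsp, struct tmr_req, convert(), convert_unset()) are hypothetical stand-ins rather than the target core's types. The point is only that once the success code is nonzero, a zero-initialized request that never had a response set lands in the default branch instead of being reported as complete.

```c
#include <assert.h>
#include <string.h>

/* Response codes numbered from 1, so that 0 means "never set". */
enum tmr_rsp {
	RSP_COMPLETE = 1,
	RSP_TASK_DOES_NOT_EXIST = 2,
	RSP_REJECTED = 5,
};

/* Hypothetical fabric-side codes the response is translated into. */
enum wire_rsp { WIRE_COMPLETE, WIRE_NO_TASK, WIRE_REJECTED };

struct tmr_req { int response; };

/* Translate the generic code; anything unknown is treated as rejected. */
static enum wire_rsp convert(const struct tmr_req *req)
{
	switch (req->response) {
	case RSP_COMPLETE:
		return WIRE_COMPLETE;
	case RSP_TASK_DOES_NOT_EXIST:
		return WIRE_NO_TASK;
	case RSP_REJECTED:
	default:
		return WIRE_REJECTED;
	}
}

/* A request that was zeroed at allocation and never given a response. */
static enum wire_rsp convert_unset(void)
{
	struct tmr_req req;

	memset(&req, 0, sizeof(req));
	return convert(&req);
}
```

Had RSP_COMPLETE been 0, convert_unset() would have returned WIRE_COMPLETE and the missing assignment would have gone unnoticed.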
[PATCH 1/3] target: remove iscsit_find_cmd_from_itt_or_dump()
Code is a copy of iscsit_find_cmd_from_itt(). Afaics this is debug code
from at least two years ago. Either the bug in question has been long
fixed or this debug code doesn't help fixing it. Whichever way you look
at it, we should remove the debug code.

Signed-off-by: Joern Engel
---
 drivers/target/iscsi/iscsi_target.c      |  3 +--
 drivers/target/iscsi/iscsi_target_util.c | 24
 drivers/target/iscsi/iscsi_target_util.h |  2 --
 3 files changed, 1 insertion(+), 28 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 7ea246a..49346b3 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1213,8 +1213,7 @@ static int iscsit_handle_data_out(struct iscsi_conn *conn, unsigned char *buf)
 				buf, conn);
 	}
 
-	cmd = iscsit_find_cmd_from_itt_or_dump(conn, hdr->itt,
-			payload_length);
+	cmd = iscsit_find_cmd_from_itt(conn, hdr->itt);
 	if (!cmd)
 		return 0;
 
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 7ce3505..e59dec0 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -369,30 +369,6 @@ struct iscsi_cmd *iscsit_find_cmd_from_itt(
 	return NULL;
 }
 
-struct iscsi_cmd *iscsit_find_cmd_from_itt_or_dump(
-	struct iscsi_conn *conn,
-	itt_t init_task_tag,
-	u32 length)
-{
-	struct iscsi_cmd *cmd;
-
-	spin_lock_bh(&conn->cmd_lock);
-	list_for_each_entry(cmd, &conn->conn_cmd_list, i_conn_node) {
-		if (cmd->init_task_tag == init_task_tag) {
-			spin_unlock_bh(&conn->cmd_lock);
-			return cmd;
-		}
-	}
-	spin_unlock_bh(&conn->cmd_lock);
-
-	pr_err("Unable to locate ITT: 0x%08x on CID: %hu,"
-			" dumping payload\n", init_task_tag, conn->cid);
-	if (length)
-		iscsit_dump_data_payload(conn, length, 1);
-
-	return NULL;
-}
-
 struct iscsi_cmd *iscsit_find_cmd_from_ttt(
 	struct iscsi_conn *conn,
 	u32 targ_xfer_tag)
diff --git a/drivers/target/iscsi/iscsi_target_util.h b/drivers/target/iscsi/iscsi_target_util.h
index 894d0f8..9614cb9 100644
--- a/drivers/target/iscsi/iscsi_target_util.h
+++ b/drivers/target/iscsi/iscsi_target_util.h
@@ -15,8 +15,6 @@ extern struct iscsi_r2t *iscsit_get_holder_for_r2tsn(struct iscsi_cmd *, u32);
 int iscsit_sequence_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, __be32 cmdsn);
 extern int iscsit_check_unsolicited_dataout(struct iscsi_cmd *, unsigned char *);
 extern struct iscsi_cmd *iscsit_find_cmd_from_itt(struct iscsi_conn *, itt_t);
-extern struct iscsi_cmd *iscsit_find_cmd_from_itt_or_dump(struct iscsi_conn *,
-			itt_t, u32);
 extern struct iscsi_cmd *iscsit_find_cmd_from_ttt(struct iscsi_conn *, u32);
 extern int iscsit_find_cmd_for_recovery(struct iscsi_session *, struct iscsi_cmd **,
 			struct iscsi_conn_recovery **, itt_t);
--
1.7.10.4
[PATCH 0/3] target: three cleanup patches
These things caught my eye while debugging something else. Nothing too
exciting.

Joern Engel (3):
  target: remove iscsit_find_cmd_from_itt_or_dump()
  target: remove unused codes from enum tcm_tmrsp_table
  target: make queue_tm_rsp() return void

 drivers/infiniband/ulp/srpt/ib_srpt.c        | 14 +-
 drivers/scsi/qla2xxx/tcm_qla2xxx.c           |  4 +---
 drivers/target/iscsi/iscsi_target.c          |  5 +
 drivers/target/iscsi/iscsi_target_configfs.c |  3 +--
 drivers/target/iscsi/iscsi_target_util.c     | 24
 drivers/target/iscsi/iscsi_target_util.h     |  2 --
 drivers/target/loopback/tcm_loop.c           |  3 +--
 drivers/target/sbp/sbp_target.c              |  3 +--
 drivers/target/tcm_fc/tcm_fc.h               |  2 +-
 drivers/target/tcm_fc/tfc_cmd.c              |  8 ++--
 drivers/usb/gadget/tcm_usb_gadget.c          |  3 +--
 drivers/vhost/tcm_vhost.c                    |  3 +--
 include/target/target_core_base.h            | 13 +
 include/target/target_core_fabric.h          |  2 +-
 14 files changed, 21 insertions(+), 68 deletions(-)

--
1.7.10.4
[PATCH 3/3] target: make queue_tm_rsp() return void
The return value wasn't checked by any of the callers. Assuming this is correct behaviour, we can simplify some code by not bothering to generate it. Signed-off-by: Joern Engel --- drivers/infiniband/ulp/srpt/ib_srpt.c| 14 +- drivers/scsi/qla2xxx/tcm_qla2xxx.c |4 +--- drivers/target/iscsi/iscsi_target_configfs.c |3 +-- drivers/target/loopback/tcm_loop.c |3 +-- drivers/target/sbp/sbp_target.c |3 +-- drivers/target/tcm_fc/tcm_fc.h |2 +- drivers/target/tcm_fc/tfc_cmd.c |5 ++--- drivers/usb/gadget/tcm_usb_gadget.c |3 +-- drivers/vhost/tcm_vhost.c|3 +-- include/target/target_core_fabric.h |2 +- 10 files changed, 15 insertions(+), 27 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index c09d41b..2c61a28 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2987,7 +2987,7 @@ static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status) * Callback function called by the TCM core. Must not block since it can be * invoked on the context of the IB completion handler. 
*/ -static int srpt_queue_response(struct se_cmd *cmd) +static void srpt_queue_response(struct se_cmd *cmd) { struct srpt_rdma_ch *ch; struct srpt_send_ioctx *ioctx; @@ -2998,8 +2998,6 @@ static int srpt_queue_response(struct se_cmd *cmd) int resp_len; u8 srp_tm_status; - ret = 0; - ioctx = container_of(cmd, struct srpt_send_ioctx, cmd); ch = ioctx->ch; BUG_ON(!ch); @@ -3025,7 +3023,7 @@ static int srpt_queue_response(struct se_cmd *cmd) || WARN_ON_ONCE(state == SRPT_STATE_CMD_RSP_SENT))) { atomic_inc(&ch->req_lim_delta); srpt_abort_cmd(ioctx); - goto out; + return; } dir = ioctx->cmd.data_direction; @@ -3037,7 +3035,7 @@ static int srpt_queue_response(struct se_cmd *cmd) if (ret) { printk(KERN_ERR "xfer_data failed for tag %llu\n", ioctx->tag); - goto out; + return; } } @@ -3058,9 +3056,6 @@ static int srpt_queue_response(struct se_cmd *cmd) srpt_set_cmd_state(ioctx, SRPT_STATE_DONE); target_put_sess_cmd(ioctx->ch->sess, &ioctx->cmd); } - -out: - return ret; } static int srpt_queue_status(struct se_cmd *cmd) @@ -3073,7 +3068,8 @@ static int srpt_queue_status(struct se_cmd *cmd) (SCF_TRANSPORT_TASK_SENSE | SCF_EMULATED_TASK_SENSE)) WARN_ON(cmd->scsi_status != SAM_STAT_CHECK_CONDITION); ioctx->queue_status_only = true; - return srpt_queue_response(cmd); + srpt_queue_response(cmd); + return 0; } static void srpt_refresh_port_work(struct work_struct *work) diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index 2313f06..9b75ec9 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -699,7 +699,7 @@ static int tcm_qla2xxx_queue_status(struct se_cmd *se_cmd) return qlt_xmit_response(cmd, xmit_type, se_cmd->scsi_status); } -static int tcm_qla2xxx_queue_tm_rsp(struct se_cmd *se_cmd) +static void tcm_qla2xxx_queue_tm_rsp(struct se_cmd *se_cmd) { struct se_tmr_req *se_tmr = se_cmd->se_tmr_req; struct qla_tgt_mgmt_cmd *mcmd = container_of(se_cmd, @@ -731,8 +731,6 @@ static int 
tcm_qla2xxx_queue_tm_rsp(struct se_cmd *se_cmd) * CTIO response packet. */ qlt_xmit_tm_rsp(mcmd); - - return 0; } /* Local pointer to allocated TCM configfs fabric module */ diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c index 6bd5d01..0cfa96e 100644 --- a/drivers/target/iscsi/iscsi_target_configfs.c +++ b/drivers/target/iscsi/iscsi_target_configfs.c @@ -1582,13 +1582,12 @@ static int lio_queue_status(struct se_cmd *se_cmd) return 0; } -static int lio_queue_tm_rsp(struct se_cmd *se_cmd) +static void lio_queue_tm_rsp(struct se_cmd *se_cmd) { struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd); cmd->i_state = ISTATE_SEND_TASKMGTRSP; iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd->i_state); - return 0; } static char *lio_tpg_get_endpoint_wwn(struct se_portal_group *se_tpg) diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c index 2d444b1..673e4fa 100644 --- a/drivers/target/loopback/tcm_loop.c +++ b/drivers/target/loopback/tcm_loop.c @@ -787,7 +787,7 @@ static int tcm_loop_queue_status(struct se_cmd *se_cmd) return 0; } -sta
[PATCH 1/3] target: removed unused transport_state flag
Signed-off-by: Joern Engel
---
 include/target/target_core_base.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index c4af592..068ec0f 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -463,7 +463,6 @@ struct se_cmd {
 #define CMD_T_ABORTED		(1 << 0)
 #define CMD_T_ACTIVE		(1 << 1)
 #define CMD_T_COMPLETE		(1 << 2)
-#define CMD_T_QUEUED		(1 << 3)
 #define CMD_T_SENT		(1 << 4)
 #define CMD_T_STOP		(1 << 5)
 #define CMD_T_FAILED		(1 << 6)
--
1.7.10.4
[PATCH 0/3] target: Fix two races leading to use-after-free
In our testing we've encountered use-after-free bugs, usually in the
shape of double list_del, at a rate of 2-10 per week. Patches 2 and 3
fix two races that can both lead to use-after-free and after applying
both of those patches, we have been bug-free for some weeks now.

Patch 1 is an unrelated trivial cleanup. I just happened to spot it
while I was in the area.

Joern Engel (3):
  target: removed unused transport_state flag
  target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5
  target: simplify target_wait_for_sess_cmds()

 drivers/infiniband/ulp/srpt/ib_srpt.c  |  2 +-
 drivers/scsi/qla2xxx/tcm_qla2xxx.c     |  2 +-
 drivers/target/target_core_transport.c | 73 +---
 include/linux/kref.h                   | 33 +++
 include/target/target_core_base.h      |  3 --
 include/target/target_core_fabric.h    |  2 +-
 6 files changed, 57 insertions(+), 58 deletions(-)

--
1.7.10.4
[PATCH 3/3] target: simplify target_wait_for_sess_cmds()
The second parameter was always 0, leading to effectively dead code. It called list_del() and se_cmd->se_tfo->release_cmd(), and had to set a flag to prevent target_release_cmd_kref() from doing the same. But most of all, it iterated the list without taking se_sess->sess_cmd_lock, leading to races against ABORT and LUN_RESET. Since the whole point of the function is to wait for the list to drain, and potentially print a bit of debug information in case that never happens, I've replaced the wait_for_completion() with 100ms sleep. The only callpath that would get delayed by this is rmmod, afaics, so I didn't want the overhead of a waitqueue. Signed-off-by: Joern Engel --- drivers/infiniband/ulp/srpt/ib_srpt.c |2 +- drivers/scsi/qla2xxx/tcm_qla2xxx.c |2 +- drivers/target/target_core_transport.c | 64 +--- include/target/target_core_base.h |2 - include/target/target_core_fabric.h|2 +- 5 files changed, 20 insertions(+), 52 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index c09d41b..c318f7c 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2328,7 +2328,7 @@ static void srpt_release_channel_work(struct work_struct *w) se_sess = ch->sess; BUG_ON(!se_sess); - target_wait_for_sess_cmds(se_sess, 0); + target_wait_for_sess_cmds(se_sess); transport_deregister_session_configfs(se_sess); transport_deregister_session(se_sess); diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index d182c96..7a3870f 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -1370,7 +1370,7 @@ static void tcm_qla2xxx_free_session(struct qla_tgt_sess *sess) dump_stack(); return; } - target_wait_for_sess_cmds(se_sess, 0); + target_wait_for_sess_cmds(se_sess); transport_deregister_session_configfs(sess->se_sess); transport_deregister_session(sess->se_sess); diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c index 0d46276..5b6dbf9 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1043,7 +1043,6 @@ void transport_init_se_cmd( init_completion(&cmd->transport_lun_fe_stop_comp); init_completion(&cmd->transport_lun_stop_comp); init_completion(&cmd->t_transport_stop_comp); - init_completion(&cmd->cmd_wait_comp); init_completion(&cmd->task_stop_comp); spin_lock_init(&cmd->t_state_lock); cmd->transport_state = CMD_T_DEV_ACTIVE; @@ -2219,11 +2218,6 @@ static void target_release_cmd_kref(struct kref *kref) se_cmd->se_tfo->release_cmd(se_cmd); return; } - if (se_sess->sess_tearing_down && se_cmd->cmd_wait_set) { - spin_unlock(&se_sess->sess_cmd_lock); - complete(&se_cmd->cmd_wait_comp); - return; - } list_del(&se_cmd->se_cmd_list); spin_unlock(&se_sess->sess_cmd_lock); @@ -2241,68 +2235,44 @@ int target_put_sess_cmd(struct se_session *se_sess, struct se_cmd *se_cmd) } EXPORT_SYMBOL(target_put_sess_cmd); -/* target_sess_cmd_list_set_waiting - Flag all commands in - * sess_cmd_list to complete cmd_wait_comp. Set +/* target_sess_cmd_list_set_waiting - Set * sess_tearing_down so no more commands are queued. 
* @se_sess: session to flag */ void target_sess_cmd_list_set_waiting(struct se_session *se_sess) { - struct se_cmd *se_cmd; unsigned long flags; spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); - WARN_ON(se_sess->sess_tearing_down); se_sess->sess_tearing_down = 1; - - list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list) - se_cmd->cmd_wait_set = 1; - spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); } EXPORT_SYMBOL(target_sess_cmd_list_set_waiting); /* target_wait_for_sess_cmds - Wait for outstanding descriptors * @se_sess:session to wait for active I/O - * @wait_for_tasks:Make extra transport_wait_for_tasks call */ -void target_wait_for_sess_cmds( - struct se_session *se_sess, - int wait_for_tasks) +void target_wait_for_sess_cmds(struct se_session *se_sess) { - struct se_cmd *se_cmd, *tmp_cmd; - bool rc = false; - - list_for_each_entry_safe(se_cmd, tmp_cmd, - &se_sess->sess_cmd_list, se_cmd_list) { - list_del(&se_cmd->se_cmd_list); - - pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:" - &quo
[PATCH 2/3] target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5
It is possible for one thread to take se_sess->sess_cmd_lock in
core_tmr_abort_task() before taking a reference count on
se_cmd->cmd_kref, while another thread in target_put_sess_cmd() drops
se_cmd->cmd_kref before taking se_sess->sess_cmd_lock.

This introduces kref_put_spinlock_irqsave() and uses it in
target_put_sess_cmd() to close the race window.

Signed-off-by: Joern Engel
---
 drivers/target/target_core_transport.c | 11 +--
 include/linux/kref.h                   | 33
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 3243ea7..0d46276 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2213,21 +2213,19 @@ static void target_release_cmd_kref(struct kref *kref)
 {
 	struct se_cmd *se_cmd = container_of(kref, struct se_cmd, cmd_kref);
 	struct se_session *se_sess = se_cmd->se_sess;
-	unsigned long flags;
 
-	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
 	if (list_empty(&se_cmd->se_cmd_list)) {
-		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+		spin_unlock(&se_sess->sess_cmd_lock);
 		se_cmd->se_tfo->release_cmd(se_cmd);
 		return;
 	}
 	if (se_sess->sess_tearing_down && se_cmd->cmd_wait_set) {
-		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+		spin_unlock(&se_sess->sess_cmd_lock);
 		complete(&se_cmd->cmd_wait_comp);
 		return;
 	}
 	list_del(&se_cmd->se_cmd_list);
-	spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+	spin_unlock(&se_sess->sess_cmd_lock);
 
 	se_cmd->se_tfo->release_cmd(se_cmd);
 }
@@ -2238,7 +2236,8 @@ static void target_release_cmd_kref(struct kref *kref)
  */
 int target_put_sess_cmd(struct se_session *se_sess, struct se_cmd *se_cmd)
 {
-	return kref_put(&se_cmd->cmd_kref, target_release_cmd_kref);
+	return kref_put_spinlock_irqsave(&se_cmd->cmd_kref, target_release_cmd_kref,
+			&se_sess->sess_cmd_lock);
 }
 EXPORT_SYMBOL(target_put_sess_cmd);
 
diff --git a/include/linux/kref.h b/include/linux/kref.h
index 4972e6e..7419c02 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 struct kref {
 	atomic_t refcount;
@@ -95,6 +96,38 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
 	return kref_sub(kref, 1, release);
 }
 
+/**
+ * kref_put_spinlock_irqsave - decrement refcount for object.
+ * @kref: object.
+ * @release: pointer to the function that will clean up the object when the
+ *	     last reference to the object is released.
+ *	     This pointer is required, and it is not acceptable to pass kfree
+ *	     in as this function.
+ * @lock: lock to take in release case
+ *
+ * Behaves identical to kref_put with one exception. If the reference count
+ * drops to zero, the lock will be taken atomically wrt dropping the reference
+ * count. The release function has to call spin_unlock() without _irqrestore.
+ */
+static inline int kref_put_spinlock_irqsave(struct kref *kref,
+		void (*release)(struct kref *kref),
+		spinlock_t *lock)
+{
+	unsigned long flags;
+
+	WARN_ON(release == NULL);
+	if (atomic_add_unless(&kref->refcount, -1, 1))
+		return 0;
+	spin_lock_irqsave(lock, flags);
+	if (atomic_dec_and_test(&kref->refcount)) {
+		release(kref);
+		local_irq_restore(flags);
+		return 1;
+	}
+	spin_unlock_irqrestore(lock, flags);
+	return 0;
+}
+
 static inline int kref_put_mutex(struct kref *kref,
 				 void (*release)(struct kref *kref),
 				 struct mutex *lock)
--
1.7.10.4
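The idea behind kref_put_spinlock_irqsave() (decrement without the lock unless this might be the last reference, otherwise take the lock first and only then drop the count) can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: struct spinlock, struct ref, ref_dec_unless_last(), ref_put_locked() and the demo_* names are all hypothetical, and interrupt state is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Tiny spinlock built on atomic_flag; stands in for spinlock_t. */
struct spinlock { atomic_flag locked; };

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		; /* busy-wait until the holder clears the flag */
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct ref { atomic_int count; };

/* Decrement unless the count is exactly 1; true if it was decremented. */
static bool ref_dec_unless_last(struct ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 1) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old - 1))
			return true;
	}
	return false;
}

/*
 * Drop one reference. A drop that may be the final one happens with the
 * lock held, so anyone who looks the object up under the same lock
 * cannot see it with a zero count. release() is called with the lock
 * held and must unlock it. Returns 1 if release() was called.
 */
static int ref_put_locked(struct ref *r, void (*release)(struct ref *),
			  struct spinlock *lock)
{
	if (ref_dec_unless_last(r))
		return 0;
	spin_lock(lock);
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	spin_unlock(lock);
	return 0;
}

/* Demo state: a lock, a counter, and a release callback that unlocks. */
static struct spinlock demo_lock = { ATOMIC_FLAG_INIT };
static int demo_released;

static void demo_release(struct ref *r)
{
	(void)r;
	demo_released++;
	spin_unlock(&demo_lock);
}
```

With a count of 2, the first ref_put_locked() takes the lockless path and the second takes the lock and calls demo_release(), mirroring the two branches of the helper in the patch.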
[PATCH 10/14] blockconsole: Fix undefined MAX_RT_PRIO
From: Takashi Iwai

Signed-off-by: Takashi Iwai
---
 drivers/block/blockconsole.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index e88b8ee..c22272f 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #define BLOCKCONSOLE_MAGIC "\nLinux blockconsole version 1.1\n"
--
1.7.10.4
[PATCH] mpt2sas/mpt3sas: prevent double free on error path
I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a use-after-free just before that). Both _scsih_probe_boot_devices() and _scsih_sas_device_add() put the sas_device onto a list, thereby giving up control. Next they call mpt2sas_transport_port_add() and will list_del and free the object on errors. If some other function already did the list_del and free, it will happen again. This patch adds reference counting to prevent the double free. One reference count goes to the caller of mpt2sas_transport_port_add(), the second to the list. Whoever removes the object from the list gets to drop one reference count. _scsih_probe_boot_devices() and _scsih_sas_device_add() get a second reference count to ensure the object is not freed while they are still accessing it. To prevent the double list_del(), I changed the code to list_del_init() and added a list_empty() check before that. Since the list_empty/list_del_init is always called under a lock, this should be safe. I hate the complexity this patch adds, but see no alternative to it. mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_transport.c:708/mpt2sas_transport_port_add()! 
general protection fault: [#1] SMP CPU 9 Pid: 3097, comm: kworker/u:11 Tainted: GW O 3.6.10+ #31392.trunk /0JP31P RIP: 0010:[] [] _scsih_sas_device_remove+0x54/0x90 [mpt2sas] RSP: 0018:881fed4d7ab0 EFLAGS: 00010046 RAX: dead00200200 RBX: 881ff6a5cd88 RCX: 10e8 RDX: 881ff7dab800 RSI: 881ff7daba00 RDI: dead00100100 RBP: 881fed4d7ad0 R08: dead00200200 R09: 880fff802200 R10: a0317980 R11: R12: 881ff7daba00 R13: 0286 R14: 500605ba006c9d09 R15: 881ff7daba00 FS: () GS:88203fc8() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 7f8ac89ec458 CR3: 001ff4c5c000 CR4: 000407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process kworker/u:11 (pid: 3097, threadinfo 881fed4d6000, task 881402f3d9c0) Stack: 0401 881ff6a5c6b0 0401 0016 881fed4d7bb0 a030f93e 881ff6a5cd88 0012000e0f08 006c9d090002000b 00180009500605ba 04010016 Call Trace: [] _scsih_add_device.clone.32+0x2fe/0x420 [mpt2sas] [] _scsih_sas_topology_change_event.clone.38+0x285/0x620 [mpt2sas] [] ? load_balance+0x100/0x7a0 [] ? _scsih_sas_topology_change_event.clone.38+0x620/0x620 [mpt2sas] [] _firmware_event_work+0x30a/0xfc0 [mpt2sas] [] ? __switch_to+0x14c/0x410 [] ? finish_task_switch+0x4b/0xf0 [] ? _scsih_sas_topology_change_event.clone.38+0x620/0x620 [mpt2sas] [] process_one_work+0x140/0x500 [] worker_thread+0x194/0x510 [] ? finish_task_switch+0x4b/0xf0 [] ? manage_workers+0x320/0x320 [] kthread+0x9e/0xb0 [] kernel_thread_helper+0x4/0x10 [] ? retint_restore_args+0x13/0x13 [] ? kthread_freezable_should_stop+0x70/0x70 [] ? 
gs_change+0x13/0x13 Signed-off-by: Joern Engel --- drivers/scsi/mpt2sas/mpt2sas_base.h |1 + drivers/scsi/mpt2sas/mpt2sas_scsih.c | 57 -- drivers/scsi/mpt3sas/mpt3sas_base.h |1 + drivers/scsi/mpt3sas/mpt3sas_scsih.c | 57 -- 4 files changed, 98 insertions(+), 18 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h index 543d8d6..ceb7d41 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.h +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h @@ -367,6 +367,7 @@ struct _sas_device { u16 slot; u8 phy; u8 responding; + struct kref kref; }; /** diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index c6bdc92..217660c 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -570,6 +570,19 @@ _scsih_sas_device_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle) return NULL; } +static void free_sas_device(struct kref *kref) +{ + struct _sas_device *sas_device = container_of(kref, struct _sas_device, + kref); + + kfree(sas_device); +} + +static void put_sas_device(struct _sas_device *sas_device) +{ + kref_put(&sas_device->kref, free_sas_device); +} + /** * _scsih_sas_device_remove - remove sas_device from list. * @ioc: per adapter object @@ -583,14 +596,19 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc, struct _sas_device *sas_device) { unsigned long flags; + int was_on_list = 0; if (!sas_device) return; spin_lock_irqsave(&ioc->sas_device_lock, flags); - list_del(&sas_device->list); - kfree(sas_device); + if (!list_empty(&sas_device->list)) { + lis
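The ownership rule described in the changelog (one reference for the caller, one for the list, and whoever unlinks the object drops the list's reference) can be sketched in userspace as follows. This is an illustration under assumptions, not the driver code: it is single-threaded, so the sas_device_lock that protects the list in the patch is omitted, a plain int replaces struct kref, and struct dev, dev_add(), dev_remove() and dev_put() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly-linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An entry that points at itself is on no list. */
static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_head(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and re-initialise, so a later list_is_empty() check is valid. */
static void list_del_reinit(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

struct dev {
	struct list_head list;
	int refcount;	/* plain int: this sketch is single-threaded */
};

static int frees;	/* counts actual frees, for illustration */

static struct dev *dev_alloc(void)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	list_init(&d->list);
	d->refcount = 1;	/* the caller's reference */
	return d;
}

static void dev_put(struct dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		free(d);
		frees++;
	}
}

static void dev_add(struct dev *d, struct list_head *head)
{
	d->refcount++;		/* the list's reference */
	list_add_head(&d->list, head);
}

/* Safe to call more than once: only the first caller drops the list's ref. */
static void dev_remove(struct dev *d)
{
	if (list_is_empty(&d->list))
		return;
	list_del_reinit(&d->list);
	dev_put(d);
}
```

With this arrangement a second removal attempt on an error path becomes a no-op instead of a second list_del() and kfree().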
[PATCH 00/14] Add blockconsole version 1.1 (try 3)
Blockconsole is a console driver very roughly similar to netconsole.
Instead of sending messages out via UDP, they are written to a block
device.  Typically a USB stick is chosen, although in principle any
block device will do.

In most cases blockconsole is useful where netconsole is not, i.e.
single machines without network access or without an accessible
netconsole capture server.  When using both blockconsole and
netconsole, I have found netconsole to sometimes create a mess under
high message load (sysrq-t, etc.) while blockconsole does not.

Most importantly, a number of bugs were identified and fixed that
would have been unexplained machine reboots without blockconsole.

More highlights:
* reasonably small and self-contained code,
* some 100+ machine years of runtime,
* nice tutorial with a 30-sec guide for the impatient.

Special thanks to Borislav Petkov for many improvements and kicking my
behind to provide a proper git tree and resend patches.

Git tree is on kernel.org and I intend to keep it stable, as people
seem to be using it already.  It has been in -next since Mar 7.

git://git.kernel.org/pub/scm/linux/kernel/git/joern/bcon2.git

Joern Engel (10):
  do_mounts: constify name_to_dev_t parameter
  add blockconsole version 1.1
  printk: add CON_ALLDATA console flag
  netconsole: use CON_ALLDATA
  blockconsole: use CON_ALLDATA
  bcon: add a release work struct
  bcon: check for hdparm in bcon_tail
  bcon: remove version 1.0 support
  bcon: Fix wrap-around behaviour
  netconsole: s/syslogd/cancd/ in documentation

Takashi Iwai (4):
  blockconsole: Allow to pass a device file path to bcon_tail
  blockconsole: Fix undefined MAX_RT_PRIO
  blockconsole: Rename device_lock with bc_device_lock
  blockconsole: Mark a local work struct static

 Documentation/block/blockconsole.txt            |   94
 Documentation/block/blockconsole/bcon_tail      |   82 +++
 Documentation/block/blockconsole/mkblockconsole |   29 ++
 Documentation/networking/netconsole.txt         |   16 +-
 block/partitions/Makefile                       |    1 +
 block/partitions/blockconsole.c                 |   22 +
 block/partitions/check.c                        |    3 +
 block/partitions/check.h                        |    3 +
 drivers/block/Kconfig                           |    6 +
 drivers/block/Makefile                          |    1 +
 drivers/block/blockconsole.c                    |  618 +++
 drivers/net/netconsole.c                        |    2 +-
 include/linux/blockconsole.h                    |    7 +
 include/linux/console.h                         |    1 +
 include/linux/mount.h                           |    2 +-
 init/do_mounts.c                                |    2 +-
 kernel/printk.c                                 |    5 +-
 17 files changed, 885 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/block/blockconsole.txt
 create mode 100755 Documentation/block/blockconsole/bcon_tail
 create mode 100755 Documentation/block/blockconsole/mkblockconsole
 create mode 100644 block/partitions/blockconsole.c
 create mode 100644 drivers/block/blockconsole.c
 create mode 100644 include/linux/blockconsole.h

-- 
1.7.10.4
[PATCH 09/14] bcon: remove version 1.0 support
Very few machines ever ran with 1.0 format and by now I doubt whether a
single one still does.  No need to carry that code along.

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |   19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index b4730f8..e88b8ee 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -18,7 +18,6 @@
 #include
 #include
 
-#define BLOCKCONSOLE_MAGIC_OLD "\nLinux blockconsole version 1.0\n"
 #define BLOCKCONSOLE_MAGIC "\nLinux blockconsole version 1.1\n"
 #define BCON_UUID_OFS (32)
 #define BCON_ROUND_OFS (41)
@@ -234,18 +233,6 @@ static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
 	}
 }
 
-static int bcon_convert_old_format(struct blockconsole *bc)
-{
-	bc->uuid = get_random_int();
-	bc->round = 0;
-	bc->console_bytes = bc->write_bytes = 0;
-	bcon_advance_console_bytes(bc, 0); /* To skip the header */
-	bcon_advance_write_bytes(bc, 0); /* To wrap around, if necessary */
-	bcon_erase_segment(bc);
-	pr_info("converted %s from old format\n", bc->devname);
-	return 0;
-}
-
 static int bcon_find_end_of_log(struct blockconsole *bc)
 {
 	u64 start = 0, end = bc->max_bytes, middle;
@@ -258,8 +245,8 @@ static int bcon_find_end_of_log(struct blockconsole *bc)
 		return err;
 	/* Second sanity check, out of sheer paranoia */
 	version = bcon_magic_present(sec0);
-	if (version == 10)
-		return bcon_convert_old_format(bc);
+	if (!version)
+		return -EINVAL;
 
 	bc->uuid = simple_strtoull(sec0 + BCON_UUID_OFS, NULL, 16);
 	bc->round = simple_strtoull(sec0 + BCON_ROUND_OFS, NULL, 16);
@@ -618,8 +605,6 @@ int bcon_magic_present(const void *data)
 {
 	size_t len = strlen(BLOCKCONSOLE_MAGIC);
 
-	if (!memcmp(data, BLOCKCONSOLE_MAGIC_OLD, len))
-		return 10;
 	if (memcmp(data, BLOCKCONSOLE_MAGIC, len))
 		return 0;
 	if (!is_four_byte_hex(data + BCON_UUID_OFS))
-- 
1.7.10.4
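For readers who want to see what the remaining 1.1 header check amounts to, here is a userspace sketch that formats and validates a header as described in Documentation/block/blockconsole.txt: the magic string followed by three 32-bit numbers, each written as eight lowercase hex digits on its own line. The magic string and the offsets 32 and 41 come from the driver source above; the tile offset of 50, the total length of 59 and all function and struct names here are assumptions made for illustration, not the driver's own code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BCON_MAGIC	"\nLinux blockconsole version 1.1\n"
#define BCON_UUID_OFS	32	/* from the driver source */
#define BCON_ROUND_OFS	41	/* from the driver source */
#define BCON_TILE_OFS	50	/* assumption: same 9-byte stride */
#define BCON_HDR_LEN	59	/* assumption: 32 + 3 * 9 */

struct bcon_header {
	uint32_t uuid;	/* random id of this console device */
	uint32_t round;	/* wrap counter */
	uint32_t tile;	/* tile counter, one tile per megabyte */
};

/* Write the textual header into buf; returns 0 on success, -1 on error. */
static int bcon_format_header(char *buf, size_t size,
			      const struct bcon_header *h)
{
	int n = snprintf(buf, size,
			 "%s%08" PRIx32 "\n%08" PRIx32 "\n%08" PRIx32 "\n",
			 BCON_MAGIC, h->uuid, h->round, h->tile);

	return (n == BCON_HDR_LEN && (size_t)n < size) ? 0 : -1;
}

/* Eight lowercase hex digits followed by a newline. */
static int is_hex8_line(const char *p)
{
	for (int i = 0; i < 8; i++) {
		if (!strchr("0123456789abcdef", p[i]) || p[i] == '\0')
			return 0;
	}
	return p[8] == '\n';
}

/* Validate and decode a header; returns 0 on success, -1 if malformed. */
static int bcon_parse_header(const char *buf, size_t len,
			     struct bcon_header *h)
{
	if (len < BCON_HDR_LEN)
		return -1;
	if (memcmp(buf, BCON_MAGIC, strlen(BCON_MAGIC)))
		return -1;
	if (!is_hex8_line(buf + BCON_UUID_OFS) ||
	    !is_hex8_line(buf + BCON_ROUND_OFS) ||
	    !is_hex8_line(buf + BCON_TILE_OFS))
		return -1;
	/* strtoul stops at the newline that is_hex8_line() verified. */
	h->uuid = (uint32_t)strtoul(buf + BCON_UUID_OFS, NULL, 16);
	h->round = (uint32_t)strtoul(buf + BCON_ROUND_OFS, NULL, 16);
	h->tile = (uint32_t)strtoul(buf + BCON_TILE_OFS, NULL, 16);
	return 0;
}
```

Because the magic string is exactly 32 bytes long, the first number starts at offset 32, which matches BCON_UUID_OFS in the driver.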
[PATCH] mpt2sas: prevent double free on error path
I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a use-after-free just before that). Both _scsih_probe_boot_devices() and _scsih_sas_device_add() put the sas_device onto a list, thereby giving up control. Next they call mpt2sas_transport_port_add() and will list_del and free the object on errors. If some other function already did the list_del and free, it will happen again. This patch adds reference counting to prevent the double free. One reference count goes to the caller of mpt2sas_transport_port_add(), the second to the list. Whoever removes the object from the list gets to drop one reference count. _scsih_probe_boot_devices() and _scsih_sas_device_add() get a second reference count to ensure the object is not freed while they are still accessing it. To prevent the double list_del(), I changed the code to list_del_init() and added a list_empty() check before that. Since the list_empty/list_del_init is always called under a lock, this should be safe. I hate the complexity this patch adds, but see no alternative to it. mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_transport.c:708/mpt2sas_transport_port_add()! 
general protection fault: [#1] SMP CPU 9 Pid: 3097, comm: kworker/u:11 Tainted: GW O 3.6.10+ #31392.trunk /0JP31P RIP: 0010:[] [] _scsih_sas_device_remove+0x54/0x90 [mpt2sas] RSP: 0018:881fed4d7ab0 EFLAGS: 00010046 RAX: dead00200200 RBX: 881ff6a5cd88 RCX: 10e8 RDX: 881ff7dab800 RSI: 881ff7daba00 RDI: dead00100100 RBP: 881fed4d7ad0 R08: dead00200200 R09: 880fff802200 R10: a0317980 R11: R12: 881ff7daba00 R13: 0286 R14: 500605ba006c9d09 R15: 881ff7daba00 FS: () GS:88203fc8() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 7f8ac89ec458 CR3: 001ff4c5c000 CR4: 000407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process kworker/u:11 (pid: 3097, threadinfo 881fed4d6000, task 881402f3d9c0) Stack: 0401 881ff6a5c6b0 0401 0016 881fed4d7bb0 a030f93e 881ff6a5cd88 0012000e0f08 006c9d090002000b 00180009500605ba 04010016 Call Trace: [] _scsih_add_device.clone.32+0x2fe/0x420 [mpt2sas] [] _scsih_sas_topology_change_event.clone.38+0x285/0x620 [mpt2sas] [] ? load_balance+0x100/0x7a0 [] ? _scsih_sas_topology_change_event.clone.38+0x620/0x620 [mpt2sas] [] _firmware_event_work+0x30a/0xfc0 [mpt2sas] [] ? __switch_to+0x14c/0x410 [] ? finish_task_switch+0x4b/0xf0 [] ? _scsih_sas_topology_change_event.clone.38+0x620/0x620 [mpt2sas] [] process_one_work+0x140/0x500 [] worker_thread+0x194/0x510 [] ? finish_task_switch+0x4b/0xf0 [] ? manage_workers+0x320/0x320 [] kthread+0x9e/0xb0 [] kernel_thread_helper+0x4/0x10 [] ? retint_restore_args+0x13/0x13 [] ? kthread_freezable_should_stop+0x70/0x70 [] ? 
gs_change+0x13/0x13 Signed-off-by: Joern Engel --- drivers/scsi/mpt2sas/mpt2sas_base.h |1 + drivers/scsi/mpt2sas/mpt2sas_scsih.c | 55 -- 2 files changed, 47 insertions(+), 9 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h index 543d8d6..ceb7d41 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.h +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h @@ -367,6 +367,7 @@ struct _sas_device { u16 slot; u8 phy; u8 responding; + struct kref kref; }; /** diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index c6bdc92..43b3a98 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -570,6 +570,18 @@ _scsih_sas_device_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle) return NULL; } +static void free_sas_device(struct kref *kref) +{ + struct _sas_device *sas_device = container_of(kref, struct _sas_device, + kref); + kfree(sas_device); +} + +static void put_sas_device(struct _sas_device *sas_device) +{ + kref_put(&sas_device->kref, free_sas_device); +} + /** * _scsih_sas_device_remove - remove sas_device from list. * @ioc: per adapter object @@ -583,14 +595,19 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc, struct _sas_device *sas_device) { unsigned long flags; + int was_on_list = 0; if (!sas_device) return; spin_lock_irqsave(&ioc->sas_device_lock, flags); - list_del(&sas_device->list); - kfree(sas_device); + if (!list_empty(&sas_device->list)) { + list_del_init(&sas_device->list); + was_on_list = 1; + } spin_unlock_irqrestore(&ioc->sas_de
[PATCH 01/14] do_mounts: constify name_to_dev_t parameter
Signed-off-by: Joern Engel
---
 include/linux/mount.h |    2 +-
 init/do_mounts.c      |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..6b5fa77 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -74,6 +74,6 @@ extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);
 
-extern dev_t name_to_dev_t(char *name);
+extern dev_t name_to_dev_t(const char *name);
 
 #endif /* _LINUX_MOUNT_H */
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 1d1b634..da96f85 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -202,7 +202,7 @@ done:
  * bangs.
  */
 
-dev_t name_to_dev_t(char *name)
+dev_t name_to_dev_t(const char *name)
 {
 	char s[32];
 	char *p;
-- 
1.7.10.4
[PATCH 11/14] blockconsole: Rename device_lock with bc_device_lock
From: Takashi Iwai

Avoid the name conflict with device_lock() defined in linux/device.h.

Signed-off-by: Takashi Iwai
Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index c22272f..40cc96e 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -546,17 +546,17 @@ static void bcon_create_fuzzy(const char *name)
 	}
 }
 
-static DEFINE_SPINLOCK(device_lock);
+static DEFINE_SPINLOCK(bcon_device_lock);
 static char scanned_devices[80];
 
 static void bcon_do_add(struct work_struct *work)
 {
 	char local_devices[80], *name, *remainder = local_devices;
 
-	spin_lock(&device_lock);
+	spin_lock(&bcon_device_lock);
 	memcpy(local_devices, scanned_devices, sizeof(local_devices));
 	memset(scanned_devices, 0, sizeof(scanned_devices));
-	spin_unlock(&device_lock);
+	spin_unlock(&bcon_device_lock);
 
 	while (remainder && remainder[0]) {
 		name = strsep(&remainder, ",");
@@ -573,11 +573,11 @@ void bcon_add(const char *name)
 	 * to go pick it up asap.  Once it is picked up, the buffer is empty
 	 * again, so hopefully it will suffice for all sane users.
 	 */
-	spin_lock(&device_lock);
+	spin_lock(&bcon_device_lock);
 	if (scanned_devices[0])
 		strncat(scanned_devices, ",", sizeof(scanned_devices));
 	strncat(scanned_devices, name, sizeof(scanned_devices));
-	spin_unlock(&device_lock);
+	spin_unlock(&bcon_device_lock);
 	schedule_work(&bcon_add_work);
 }
 
-- 
1.7.10.4
[PATCH 08/14] blockconsole: Allow to pass a device file path to bcon_tail
From: Takashi Iwai

... instead of always looking through all devices.

Minor tweak: Moved the "CANDIDATES=" line below Takashi's new code.

Signed-off-by: Takashi Iwai
Signed-off-by: Joern Engel
---
 Documentation/block/blockconsole/bcon_tail |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
index eb3524b..70926c6 100755
--- a/Documentation/block/blockconsole/bcon_tail
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -58,6 +58,21 @@ end_of_log() {
 # HEADER contains a newline, so the funny quoting is necessary
 HEADER='
 Linux blockconsole version 1.1'
+
+DEV="$1"
+if [ -n "$DEV" ]; then
+	if [ ! -b "$DEV" ]; then
+		echo "bcon_tail: No block device file $DEV"
+		exit 1
+	fi
+	if [ "`head -c32 $DEV`" != "$HEADER" ]; then
+		echo "bcon_tail: Invalid device file $DEV"
+		exit 1
+	fi
+	end_of_log $DEV
+	exit 0
+fi
+
 CANDIDATES=`lsscsi |sed 's|.*/dev|/dev|'`
 
 for DEV in $CANDIDATES; do
-- 
1.7.10.4
[PATCH 02/14] add blockconsole version 1.1
Console driver similar to netconsole, except it writes to a block device. Can be useful in a setup where netconsole, for whatever reasons, is impractical. Changes since version 1.0: - Header format overhaul, addressing several annoyances when actually using blockconsole for production. - Steve Hodgson added a panic notifier. - Added improvements and cleanups from Borislav Petkov. Signed-off-by: Steve Hodgson Signed-off-by: Borislav Petkov Signed-off-by: Joern Engel --- Documentation/block/blockconsole.txt| 94 Documentation/block/blockconsole/bcon_tail | 62 +++ Documentation/block/blockconsole/mkblockconsole | 29 ++ block/partitions/Makefile |1 + block/partitions/blockconsole.c | 22 + block/partitions/check.c|3 + block/partitions/check.h|3 + drivers/block/Kconfig |6 + drivers/block/Makefile |1 + drivers/block/blockconsole.c| 621 +++ include/linux/blockconsole.h|7 + 11 files changed, 849 insertions(+) create mode 100644 Documentation/block/blockconsole.txt create mode 100755 Documentation/block/blockconsole/bcon_tail create mode 100755 Documentation/block/blockconsole/mkblockconsole create mode 100644 block/partitions/blockconsole.c create mode 100644 drivers/block/blockconsole.c create mode 100644 include/linux/blockconsole.h diff --git a/Documentation/block/blockconsole.txt b/Documentation/block/blockconsole.txt new file mode 100644 index 000..2b45516 --- /dev/null +++ b/Documentation/block/blockconsole.txt @@ -0,0 +1,94 @@ +started by Jörn Engel 2012.03.17 + +Blockconsole for the impatient +== + +1. Find an unused USB stick and prepare it for blockconsole by writing + the blockconsole signature to it: + $ ./mkblockconsole /dev/ + +2. USB stick is ready for use, replug it so that the kernel can start + logging to it. + +3. After you've done logging, read out the logs from it like this: + $ ./bcon_tail + + This creates a file called /var/log/bcon. which contains the + last 16M of the logs. 
Open it with a sane editor like vim which + can display zeroed gaps as a single line and start staring at the + logs. + For the really impatient, use: + $ vi `./bcon_tail` + +Introduction: += + +This module logs kernel printk messages to block devices, e.g. usb +sticks. It allows after-the-fact debugging when the main +disk/filesystem fails and serial consoles and netconsole are +impractical. + +It can currently only be used built-in. Blockconsole hooks into the +partition scanning code and will bring up configured block devices as +soon as possible. While this doesn't allow capture of early kernel +panics, it does capture most of the boot process. + +Block device configuration: +== + +Blockconsole has no configuration parameter. In order to use a block +device for logging, the blockconsole header has to be written to the +device in question. Logging to partitions is not supported. + +The example program mkblockconsole can be used to generate such a +header on a device. + +Header format: +== + +A legal header looks like this: + +Linux blockconsole version 1.1 +818cf322 + + + +It consists of a newline, the "Linux blockconsole version 1.1" string +plus three numbers on separate lines each. Numbers are all 32bit, +represented as 8-byte hex strings, with letters in lowercase. The +first number is a uuid for this particular console device. Just pick +a random number when generating the device. The second number is a +wrap counter and unlikely to ever increment. The third is a tile +counter, with a tile being one megabyte in size. + +Miscellaneous notes: + + +Blockconsole will write a new header for every tile or once every +megabyte. The header starts with a newline in order to ensure the +"Linux blockconsole...' string always ends up at the beginning of a +line if you read the blockconsole in a text editor. 
+ +The blockconsole header is constructed such that opening the log +device in a text editor, ignoring memory constraints due to large +devices, should just work and be reasonably non-confusing to readers. +However, the example program bcon_tail can be used to copy the last 16 +tiles of the log device to /var/log/bcon., which should be much +easier to handle. + +The wrap counter is used by blockconsole to determine where to +continue logging after a reboot. New logs will be written to the +first tile that wasn't written to by the last instance of +blockconsole. Similarly bcon_tail is doing a binary search to find +the end of the log. + +Writing to the log device is strictly circular. This should give +optimal performance and reliability on cheap devices, l
[PATCH 03/14] printk: add CON_ALLDATA console flag
For consoles like netconsole and blockconsole the loglevel filtering
really doesn't make any sense.  If a line gets printed at all, please
send it down to that console, no questions asked.

For vga_con, it is a completely different matter, as the user sitting
in front of his console could get spammed by messages while trying to
login or similar.  So ignore_loglevel doesn't work as a
one-size-fits-all approach.  Add a per-console flag instead so that
netconsole and blockconsole can opt-in.

Signed-off-by: Joern Engel
---
 include/linux/console.h |    1 +
 kernel/printk.c         |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index dedb082..eed92ad 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -116,6 +116,7 @@ static inline int con_debug_leave(void)
 #define CON_BOOT	(8)
 #define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
 #define CON_BRL		(32) /* Used for a braille device */
+#define CON_ALLDATA	(64) /* per-console ignore_loglevel */
 
 struct console {
 	char	name[16];
diff --git a/kernel/printk.c b/kernel/printk.c
index 267ce78..5221c59 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1261,8 +1261,6 @@ static void call_console_drivers(int level, const char *text, size_t len)
 
 	trace_console(text, 0, len, len);
 
-	if (level >= console_loglevel && !ignore_loglevel)
-		return;
 	if (!console_drivers)
 		return;
 
@@ -1276,6 +1274,9 @@ static void call_console_drivers(int level, const char *text, size_t len)
 		if (!cpu_online(smp_processor_id()) &&
 		    !(con->flags & CON_ANYTIME))
 			continue;
+		if (level >= console_loglevel && !ignore_loglevel &&
+		    !(con->flags & CON_ALLDATA))
+			continue;
 		con->write(con, text, len);
 	}
 }
-- 
1.7.10.4
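The effect of moving the loglevel test inside the per-console loop can be shown with a small standalone predicate. This is a sketch for illustration only; console_wants() is a hypothetical helper that mirrors the new condition in call_console_drivers(), and CON_ALLDATA uses the value given in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define CON_ALLDATA	(64)	/* per-console ignore_loglevel, as in the patch */

/*
 * Returns true if a message of the given level should be written to a
 * console with the given flags.  Mirrors the per-console check added to
 * call_console_drivers(); the helper itself is hypothetical.
 */
static bool console_wants(int level, int console_loglevel,
			  bool ignore_loglevel, unsigned int flags)
{
	if (level >= console_loglevel && !ignore_loglevel &&
	    !(flags & CON_ALLDATA))
		return false;	/* filtered for this console only */
	return true;
}
```

With console_loglevel set to 4, a level-7 debug message is dropped for a console without the flag but still reaches one that sets CON_ALLDATA, while a level-3 message reaches both.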
[PATCH 04/14] netconsole: use CON_ALLDATA
Netconsole should really see every message ever printed.  The
alternative is to try debugging with information like this:
[166135.633974] Stack:
[166135.634016] Call Trace:
[166135.634029]
[166135.634156]
[166135.634177] Code: 00 00 55 48 89 e5 0f 1f 44 00 00 ff 15 31 49 80 00 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
[166135.634384] 48 8b 14 25 98 24 01 00 48 8d 14 92 48 8d 04 bd 00 00 00 00

Signed-off-by: Joern Engel
---
 drivers/net/netconsole.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 6989ebe..77783fe 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -718,7 +718,7 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 
 static struct console netconsole = {
 	.name	= "netcon",
-	.flags	= CON_ENABLED,
+	.flags	= CON_ENABLED | CON_ALLDATA,
 	.write	= write_msg,
 };
 
-- 
1.7.10.4
[PATCH 05/14] blockconsole: use CON_ALLDATA
Blockconsole should really see every message ever printed.  The
alternative is to try debugging with information like this:
[166135.633974] Stack:
[166135.634016] Call Trace:
[166135.634029]
[166135.634156]
[166135.634177] Code: 00 00 55 48 89 e5 0f 1f 44 00 00 ff 15 31 49 80 00 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
[166135.634384] 48 8b 14 25 98 24 01 00 48 8d 14 92 48 8d 04 bd 00 00 00 00

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 7f8ac5b..32f6c62 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -481,7 +481,7 @@ static int bcon_create(const char *devname)
 	strlcpy(bc->devname, devname, sizeof(bc->devname));
 	spin_lock_init(&bc->end_io_lock);
 	strcpy(bc->console.name, "bcon");
-	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED;
+	bc->console.flags = CON_PRINTBUFFER | CON_ENABLED | CON_ALLDATA;
 	bc->console.write = bcon_write;
 	bc->bdev = blkdev_get_by_path(devname, mode, NULL);
 #ifndef MODULE
-- 
1.7.10.4
[PATCH 06/14] bcon: add a release work struct
The final bcon_put() can be called from atomic context, by way of
bio_endio().  In that case we would sleep in invalidate_mapping_pages(),
with the usual unhappy results.  In nearly a year of production use, I
have only seen a matching backtrace once.

There was a second known issue that could be reproduced by "yes h >
/proc/sysrq-trigger" and concurrently pulling and replugging the
blockconsole device.  It took me somewhere around 30 pulls and sore
thumbs to reproduce and I never found the time to get to the bottom of
it.  Quite likely the two issues are identical.

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 32f6c62..b4730f8 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -65,6 +65,7 @@ struct blockconsole {
 	struct block_device *bdev;
 	struct console console;
 	struct work_struct unregister_work;
+	struct work_struct release_work;
 	struct task_struct *writeback_thread;
 	struct notifier_block panic_block;
 };
@@ -74,9 +75,10 @@ static void bcon_get(struct blockconsole *bc)
 	kref_get(&bc->kref);
 }
 
-static void bcon_release(struct kref *kref)
+static void __bcon_release(struct work_struct *work)
 {
-	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+	struct blockconsole *bc = container_of(work, struct blockconsole,
+			release_work);
 
 	__free_pages(bc->zero_page, 0);
 	__free_pages(bc->pages, 8);
@@ -85,6 +87,14 @@ static void bcon_release(struct kref *kref)
 	kfree(bc);
 }
 
+static void bcon_release(struct kref *kref)
+{
+	struct blockconsole *bc = container_of(kref, struct blockconsole, kref);
+
+	/* bcon_release can be called from atomic context */
+	schedule_work(&bc->release_work);
+}
+
 static void bcon_put(struct blockconsole *bc)
 {
 	kref_put(&bc->kref, bcon_release);
@@ -512,6 +522,7 @@ static int bcon_create(const char *devname)
 	if (IS_ERR(bc->writeback_thread))
 		goto out2;
 	INIT_WORK(&bc->unregister_work, bcon_unregister);
+	INIT_WORK(&bc->release_work, __bcon_release);
 	register_console(&bc->console);
 	bc->panic_block.notifier_call = blockconsole_panic;
 	bc->panic_block.priority = INT_MAX;
-- 
1.7.10.4
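The idea of the patch, that the final put only queues the teardown and a worker performs it later from a context that may sleep, can be sketched in userspace. Everything below is an assumption made for illustration: the single-threaded pending list stands in for the kernel workqueue, and struct work, queue_work_sketch(), run_pending_work() and struct logdev are hypothetical names rather than kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy deferred-work item; a stand-in for struct work_struct (assumption). */
struct work {
	struct work *next;
	void (*fn)(struct work *);
};

static struct work *pending;	/* toy work queue, single-threaded */

/* Cheap and non-blocking: safe to call where sleeping is not allowed. */
static void queue_work_sketch(struct work *w)
{
	w->next = pending;
	pending = w;
}

/* Runs queued work later, where blocking is fine.  Returns items run. */
static int run_pending_work(void)
{
	int n = 0;

	while (pending) {
		struct work *w = pending;

		pending = w->next;
		w->fn(w);
		n++;
	}
	return n;
}

struct logdev {
	int refcount;
	struct work release_work;
	char *buffer;
};

static int freed;	/* counts completed teardowns, for illustration */

/* The real teardown; only ever called from run_pending_work(). */
static void logdev_release_work(struct work *w)
{
	struct logdev *d = (struct logdev *)
		((char *)w - offsetof(struct logdev, release_work));

	free(d->buffer);
	free(d);
	freed++;
}

static struct logdev *logdev_create(void)
{
	struct logdev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->buffer = malloc(4096);
	if (!d->buffer) {
		free(d);
		return NULL;
	}
	d->refcount = 1;
	d->release_work.fn = logdev_release_work;
	return d;
}

/* The final put does not free anything itself; it only queues the work. */
static void logdev_put(struct logdev *d)
{
	if (--d->refcount == 0)
		queue_work_sketch(&d->release_work);
}
```

The object stays valid between the final put and the worker run, which is what lets the queued work item live inside the object it is about to free.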
[PATCH 07/14] bcon: check for hdparm in bcon_tail
From: Borislav Petkov

Signed-off-by: Joern Engel
---
 Documentation/block/blockconsole/bcon_tail |    5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/block/blockconsole/bcon_tail b/Documentation/block/blockconsole/bcon_tail
index b4bd660..eb3524b 100755
--- a/Documentation/block/blockconsole/bcon_tail
+++ b/Documentation/block/blockconsole/bcon_tail
@@ -14,6 +14,11 @@ if [ -z "$(which lsscsi)" ]; then
 	exit 1
 fi
 
+if [ -z "$(which hdparm)" ]; then
+	echo "You need to install the hdparm package on your distro."
+	exit 1
+fi
+
 end_of_log() {
 	DEV=$1
 	UUID=`head -c40 $DEV|tail -c8`
-- 
1.7.10.4
[PATCH 13/14] bcon: Fix wrap-around behaviour
This seems to have broken around the introduction of format 1.1.  When
wrapping around, we should increment the wrap counter before writing it
out, not after.

Signed-off-by: Joern Engel
---
 drivers/block/blockconsole.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 01ddbc6..65f8ace 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -229,8 +229,8 @@ static void bcon_advance_write_bytes(struct blockconsole *bc, int bytes)
 	bc->write_bytes += bytes;
 	if (bc->write_bytes >= bc->max_bytes) {
 		bc->write_bytes = 0;
-		bcon_init_first_page(bc);
 		bc->round++;
+		bcon_init_first_page(bc);
 	}
 }
 
-- 
1.7.10.4
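Why the order of the two statements matters can be seen in a small model. The sketch below is an assumption-laden illustration, not the driver: struct ring is a hypothetical stand-in for struct blockconsole, and ring_write_header() merely records which round value would have been written to the first page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the circular log state. */
struct ring {
	uint64_t write_bytes;	/* current write position */
	uint64_t max_bytes;	/* size of the device */
	uint32_t round;		/* wrap counter */
	uint32_t header_round;	/* round value last written to the header */
};

/* Stand-in for writing the header of the first page. */
static void ring_write_header(struct ring *r)
{
	r->header_round = r->round;
}

/* Advance the write position; on wrap-around, bump the round first. */
static void ring_advance(struct ring *r, uint64_t bytes)
{
	r->write_bytes += bytes;
	if (r->write_bytes >= r->max_bytes) {
		r->write_bytes = 0;
		r->round++;		/* increment before ... */
		ring_write_header(r);	/* ... the header is written out */
	}
}
```

With the two statements in the old order, the header written after a wrap would still carry the previous round number.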
[PATCH 14/14] netconsole: s/syslogd/cancd/ in documentation
Using syslogd to capture netconsole is known to be broken, see for
example https://bugzilla.redhat.com/show_bug.cgi?id=432160 or any of
the many other bug reports.  We should not advertise it, much less as a
first choice.  The fact that syslogd tends to initially work makes it
worse, as that creates false hope.

Cancd is a syslog-for-netconsole of sorts and in my experience works
better than any alternative for non-trivial setups, i.e. more than a
single machine sending netconsole traffic.  Since my hacked-up version
of cancd is no longer compatible with Oracle's original, I linked to
both.

Signed-off-by: Joern Engel
---
 Documentation/networking/netconsole.txt |   16
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt
index 2e9e0ae2..c59d2bf 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -54,9 +54,7 @@ address.
 The remote host has several options to receive the kernel messages,
 for example:
 
-1) syslogd
-
-2) netcat
+1) netcat
 
    On distributions using a BSD-based netcat version (e.g. Fedora,
    openSUSE and Ubuntu) the listening port must be specified without
@@ -65,10 +63,20 @@ for example:
    'nc -u -l -p ' / 'nc -u -l ' or
    'netcat -u -l -p ' / 'netcat -u -l '
 
-3) socat
+2) socat
 
    'socat udp-recv: -'
 
+3) cancd
+
+   A daemon written specifically for netconsole that is good at capturing
+   output from many machines.  Using netcat for several machines either
+   interleaves output from all machines or requires the use of per-machine
+   ports.
+
+   https://git.kernel.org/cgit/linux/kernel/git/joern/cancd.git/
+   https://oss.oracle.com/projects/cancd/
+
 Dynamic reconfiguration:
-- 
1.7.10.4
[PATCH 12/14] blockconsole: Mark a local work struct static
From: Takashi Iwai

Signed-off-by: Takashi Iwai
---
 drivers/block/blockconsole.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/blockconsole.c b/drivers/block/blockconsole.c
index 40cc96e..01ddbc6 100644
--- a/drivers/block/blockconsole.c
+++ b/drivers/block/blockconsole.c
@@ -564,7 +564,7 @@ static void bcon_do_add(struct work_struct *work)
 	}
 }
 
-DECLARE_WORK(bcon_add_work, bcon_do_add);
+static DECLARE_WORK(bcon_add_work, bcon_do_add);
 
 void bcon_add(const char *name)
 {
-- 
1.7.10.4