Re: How to wake_up the wait_queue of a socket?
On Mon, 14 Jan 2013 17:50:03 +0800, horseriver said:

> When a datagram has arrived, how do I wake_up the wait_queue of that socket?

Please clarify your question - I'm not sure which of the following you mean:

1) How does the kernel wake up the waiting process when a datagram arrives?
2) My kernel is failing to wake up the process, how do I fix it?
3) The kernel is waking the process up, but with high latency and I want to speed it up.
4) I'm trying to wake up a process for some reason when a datagram arrives (in which case, you're probably doing something wrong and we need to discuss what you're trying to achieve).

Let us know in more detail what you wanted to know.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: Best way to configure Linux kernel for a machine
On Wed, 16 Jan 2013 17:47:08 +0530, Shraddha Kamat said:

> I normally do the kernel configuration on my machine like this -
>
> * copy the distro configuration file to the kernel dir
> * make menuconfig (answer Y's/N's/M's). Normally keep return key pressed for default answers
> * then do the actual kernel compilation
>
> Now, I know that this is not a clean way to do the kernel compilation (although it has worked for me the thousands of times that I have compiled and successfully booted the kernel - without any issues whatsoever!). But this time, I am bent upon coming up with a configuration specifically targeted to my machine. What is the best way to do this?

Take your distro kernel, boot it up. Make sure to insert any USB storage, webcams, etc, at least long enough for udev to recognize them and load their driver modules. Then cd to your kernel source tree and 'make localmodconfig'. That will generate a stripped-down config that only builds those modules that are currently listed in 'lsmod' (which on my laptop is on the order of 1/3 the size of the full Fedora 'allmodconfig').

Which is why it's important you get all the modules probed - if you don't plug in that USB storage, the module won't be loaded, so it won't be in lsmod, and won't be included in your new kernel - at which point you'll use some bad language as you try to debug why it doesn't work. :)

Also, see the other reply that points at Greg KH's talk.

> Also, while creating an initrd image
>
> # mkinitrd /boot/initramfs.img 3.8.0-rc3+ -f
> ERROR: modinfo: could not find module ipt_MASQUERADE
> ERROR: modinfo: could not find module iptable_nat
> ERROR: modinfo: could not find module nf_nat
> ERROR: modinfo: could not find module snd_hda_codec_intelhdmi
> ERROR: modinfo: could not find module joydev
>
> I got the above errors - I know how to resolve these errors, but want to understand why mkinitrd should complain in the first place ??

Because if the module was for your keyboard or hard drive or video, and you got an unbootable kernel as a result, you *really* want to know at mkinitrd time, not at boot time... :)
Re: no error thrown with exit(0) in the child process of vfork()
On Fri, 18 Jan 2013 19:59:38 +0530, Niroj Pokhrel said:

> I have been trying to create a process using vfork(). And both of the child and the parent process execute it in the same address space. So, if I execute exit(0) in the child process, it should throw some error right.

Why do you think it should throw an error?

> Since the execution is happening in child process first and if I release all the resources by using exit(0) in the child process then parent should be deprived of the resources and should throw some errors right ??

No, because those resources that were shared across a fork() or vfork() were in general *multiple references* to the same resource.

As an example - imagine a flagpole. You grab it with your hand, you're now holding it. You invite your friend to come over and grab it with his hand - now he's holding it too. But either one of you can let go of the flagpole - and the other one is still holding the flagpole until *they* let go. And the order you let go doesn't matter in this case - which is important because your example code has a race condition.

Note that there are other cases where the order people let go *does* matter. This is when you start having to worry about locking order and things like that.

> In the following code, however the process ran fine even though I have exit(0) in the child process
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main()
> {
>     int val, i = 0;
>     val = vfork();
>     if (val == 0) {
>         printf("\nI am a child process.\n");

Note that printf() gets interesting due to stdio buffering. You probably want to call setvbuf() and guarantee line-buffering of the output if you're playing these sorts of games - the buffering can totally mask a real race condition or other bug.

>         printf(" %d ", i++);
>         exit(0);
>     } else {

/* race condition here - may want wait() or waitpid() to synchronize? */

>         printf("\nI am a parent process.\n");
>         printf(" %d ", i);
>     }
>     return 0;
> }
>
> // The program is running fine.
> But as I have read it should throw some error right ?? I don't know what I am missing. Please point out the point I'm missing. Thanking you in advance.

You're also missing the fact that after the vfork(), there's no real guarantee of which will run first - which means that the parent can race and output the 'printf("%d", i)' *before* the child process gets a chance to do the i++.

(Aside - for a while, there was a patch in place that ensured that the child would run first, on the theory that the child would often do something short that the parent was waiting on, so scheduling parent-first would just result in the parent running, blocking to wait, and we end up running the child anyhow before the parent could continue. It broke an *amazing* amount of stuff in userspace because often the child would exit() before the parent was ready to deal with the child process's termination. Usual failure mode was the parent would set a SIGCHLD handler, and wait for the signal which never happened because the SIGCHLD actually fired *before* the handler was set up).

(And on non-cache-coherent systems, it's even possible that the i++ happens on a different CPU first, and the CPU running the parent process never becomes aware of it. See 'Documentation/memory-barriers.txt' in the Linux source for more info on how this works for data inside the kernel. This example is out in userspace, so other techniques are required instead to do cross-CPU synchronization.)
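The wait()/waitpid() suggestion in the comment above can be sketched roughly as follows. This is a minimal illustration, not the original poster's program: it uses fork() rather than vfork() (the Linux vfork(2) man page says the parent is suspended until the child calls _exit() or execve(), so the ordering question bites hardest with plain fork()), and the helper name run_child_and_wait is made up for the example.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations; must precede the #includes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then block in
 * waitpid() until that child has terminated. Returns the child's exit
 * status, or -1 on error. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        /* Child: _exit() skips stdio flushing and atexit handlers, so
         * buffered output copied from the parent is not written twice. */
        _exit(code);
    }

    /* Parent: waitpid() is the synchronization point. Nothing after this
     * call can run before the child has finished. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_wait(0) from main() and printing in the parent only after it returns gives a fixed child-then-parent ordering, which is what the racy version was implicitly assuming.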
Re: bitops or mutex
On Mon, 21 Jan 2013 19:16:47 +0530, Prashant Shah said:

> There is a bitmap that needs to be locked across many threads for test / set bit operations. Which one is faster - bitops or mutex ?
>
> 1. Bitops : set_bit(5, (long unsigned *)tmp);
> 2. Mutex : mutex_lock(m); *tmp = (*tmp) | (1 << 5); mutex_unlock(m);

Do you care about faster as in less latency, or fewer total cycles consumed? The two can be quite different.

One uses a mutex, the other a spinlock and irq save/restore. "Faster" will depend on the architecture (irqsave is more expensive on some archs than others) and how heavily the lock is contended. If the answer *really* matters, you'd better go ahead and instrument the code and actually time it and do the statistical analysis.

Also, double-check that you don't require *additional* locking. It's pretty rare that the *entire* critical section is exactly one bit-set operation long.
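For readers who want to experiment with the "one atomic bit operation" idea outside the kernel, here is a small user-space sketch using C11 atomics. It is an analog of the concept only, not the kernel's set_bit() implementation, and the function name atomic_test_and_set_bit is chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit `nr` in *word and report whether it was already set.
 * User-space C11 analog of an atomic bitop; `nr` must be smaller than the
 * number of bits in unsigned long. */
bool atomic_test_and_set_bit(atomic_ulong *word, unsigned nr)
{
    unsigned long mask = 1UL << nr;
    /* atomic_fetch_or returns the value the word held before the OR */
    unsigned long old = atomic_fetch_or(word, mask);
    return (old & mask) != 0;
}
```

Whether this beats a lock-protected read-modify-write on a given machine is exactly the kind of thing that has to be measured, and it only helps if the single bit operation really is the whole critical section.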
Re: Can jiffies freeze?
On Tue, 22 Jan 2013 10:29:05 -0800, sandeep kumar said:

> I am seeing this problem very early in start_kernel --> mm_init --> free_highpages; at that time nothing is up and the kernel is running in a single thread.

If you build a kernel with printk timestamps, you'll see that they all come out like this:

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.8.0-rc3-next-20130117-dirty (val...@turing-police.cc.vt.edu) (gcc version 4.7.2 20121109 (Red Hat 4.7.2-9) (GCC) ) #49 SMP PREEMPT Thu Jan 17 13:25:28 EST 2013
[0.00] Command line: ro root=/dev/mapper/vg_blackice-root log_buf_len=2M vga=893 loglevel=4 threadirqs intel_iommu=off LANG=en_US.UTF-8
[0.00] KERNEL supported cpus:
[0.00] Intel GenuineIntel
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009bbff] usable
[0.00] BIOS-e820: [mem 0x0009bc00-0x0009] reserved

(100 or so more lines with same timestamp)
(now we finish memory init)

[0.00] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[0.00] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[0.00] __ex_table already sorted, skipping sort
[0.00] xsave: enabled xstate_bv 0x3, cntxt size 0x240
[0.00] Memory: 4015936k/4718592k available (6266k kernel code, 536744k absent, 165912k reserved, 7260k data, 576k init)

(more lines skipped)

[0.00] memory used by lock dependency info: 5855 kB
[0.00] per task-struct memory footprint: 1920 bytes
[0.00] hpet clockevent registered
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2527.012 MHz processor
[0.001004] Calibrating delay loop (skipped), value calculated using timer frequency.. 5054.02 BogoMIPS (lpj=2527012)
[0.001009] pid_max: default: 32768 minimum: 301
[0.001100] Security Framework initialized

It may simply be that your code is running before the clock is started by the kernel.
Re: Can jiffies freeze?
On Tue, 22 Jan 2013 11:32:19 -0800, sandeep kumar said:

> as you rightly mentioned, cat /proc/kmsg is showing the time stamps, according to that it is 0ms only. But when you see the same with UART there is 2sec delay in showing the next log. i caught this while i m observing the UART logs with Terminaliranicca.

Oh, I could believe there's 2 seconds of time used up there that doesn't show in kernel timestamps because the timers aren't started yet.

> Since i m early in the mm_init, i cant use watchdog to detect it, hrtimers i cant use..i am really thinking how to analyse this delay..

Time for some lateral thinking.. :)

Can you give us some specs on the hardware (in particular, the CPU type/speed and how much RAM is installed)? 2 seconds on a 2GHz CPU is about 4 billion cycles.

Also, are you adding any code into the mm_init path? If so, what exactly are you doing?

I wonder how early the kernel tracing and profiling stuff is enabled. It may be possible to boot a kernel that has function-call tracing enabled, which would not have timing info, but if you see a function that's being called 500K times that should only be called a dozen times, that's probably your problem :) (You'd probably want it with 'init=/bin/bash' and dump the stuff, as running to multiuser will almost certainly roll the buffers and lose the info).
Re: Can jiffies freeze?
On Wed, 23 Jan 2013 14:05:25 +0800, bill4carson said:

> Hmmm, all the boot messages are routed into a buffer first before being printed to the console; there is no delay there, possibly the tick timers are not set up yet. But when it does get printed to the console, this process could be interrupted by other actions as well, that's where you see a 2sec delay.

Unlikely, unless Sandeep is running an actual serial console at a very low speed (which *can* cause fun on large NUMA machines that spew lots of messages).

I'm pretty convinced that Sandeep is actually seeing a 2 second delay somewhere near mm_init that isn't reflected in the timestamps because mm_init runs before the clocks are set up. Of course, it may not be mm_init *itself* that's causing the delay - all we *really* know is it's somewhere between a printk in mm_init and the previous printk - there may be something *else* in between that's the actual time sink.

Sandeep - I admit not having tried it, but can you see if booting with 'initcall_debug' narrows down where your problem is? If the initcall stuff is running early enough (I'm not sure when it starts relative to mm_init), you'll get a message from each initcall as it is entered and exited. With any luck, that will help narrow down exactly where your problem is.
Re: Intercepting a system call
On Fri, 25 Jan 2013 18:58:29 +0530, Paul Davies C said:

> [1] is the module I wrote for intercepting the system call fork().

Totally skipping over the details of actually doing it - it's usually considered a Bad Idea to hook a system call, and 98% of the time there's a much better way to achieve whatever goal you're trying to accomplish by hooking the syscall.

In other words, why are you trying to do that in the first place?
Re: locking spinlocks during copy_to_user, copy_from_user
On Fri, 25 Jan 2013 09:58:42 -0300, Pablo Pessolani said:

> My question is: Is there any known consequence if I enable preemption before copy_to_user/copy_from_user (keeping the spinlock locked) and then disable preemption again after the copy?

Well, at that point, you potentially have a spinlock locked during operations that can be preempted, which you noted is not recommended.

The generic problem is that while you're holding the spinlock, you can get hit with a preempt, which ends up rescheduling or other fun stuff, and the preempting thread ends up calling into the same code - at which point you'll possibly deadlock because the second thread is now blocked on the spinlock that the first thread holds...

You're much better off either restructuring your code so you don't do anything that can preempt, or fixing your locking in other ways so the problem can't arise.
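One common shape for the "restructure your code" option is to copy the shared data into a local snapshot while holding the lock, drop the lock, and only then do the slow copy. The sketch below is a user-space analog under stated assumptions: the spinlock is a toy built on a C11 atomic_flag, memcpy() stands in for copy_to_user(), and struct stats and the stats_* names are invented for the example.

```c
#include <stdatomic.h>
#include <string.h>

struct stats {
    unsigned long packets;
    unsigned long bytes;
};

static atomic_flag stats_lock = ATOMIC_FLAG_INIT;   /* toy spinlock */
static struct stats shared_stats;                   /* protected by stats_lock */

static void stats_lock_acquire(void)
{
    /* spin until the flag was previously clear */
    while (atomic_flag_test_and_set_explicit(&stats_lock, memory_order_acquire))
        ;
}

static void stats_lock_release(void)
{
    atomic_flag_clear_explicit(&stats_lock, memory_order_release);
}

/* Short critical section: only touches memory, never blocks. */
void stats_add(unsigned long bytes)
{
    stats_lock_acquire();
    shared_stats.packets++;
    shared_stats.bytes += bytes;
    stats_lock_release();
}

/* Snapshot under the lock, then do the slow copy with the lock dropped.
 * In a driver, the memcpy() below is where copy_to_user() would go. */
void stats_read(struct stats *out)
{
    struct stats snap;

    stats_lock_acquire();
    snap = shared_stats;
    stats_lock_release();

    memcpy(out, &snap, sizeof(snap));
}
```

The trade-off is an extra copy and a snapshot that may be slightly stale by the time the caller sees it, which is usually acceptable for counters and status structures.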
Re: GRUB question
On Mon, 28 Jan 2013 06:10:36 +0800, horseriver said:
On Mon, Jan 28, 2013 at 12:05:35PM +0530, Mandeep Sandhu wrote:
On Mon, Jan 28, 2013 at 2:07 AM, horseriver horseriv...@gmail.com wrote:

hi:) Is /boot/initrd.img a root filesystem? what is the filetype of it?

Yes, it's a rootfs with minimal stuff needed for booting a workable system. why does this matter. doing 'file /boot/initrd.img' on my system shows it's a gzip compressed file.

Can I put initrd.img on a floppy to boot the system ?

I think you can. Provided you have the floppy driver compiled into your kernel.

And assuming the initrd fits on a floppy (which is actually unlikely - even without any kernel modules on it, the initrd to get LVM launched comes in at around 8M. A default Fedora initramfs is closer to 20M. Good luck fitting that on a floppy :)

Of course, an initrd on floppy is kind of silly, because you still need to find someplace else to fit the actual kernel - which hasn't fit on a floppy for quite some time.

Thanks! Does this /boot/initrd.img file come out when building the kernel ? how to build it?

Your system should have either 'mkinitrd' or 'dracut' to build the initrd image. Some older systems will have 'mkinitramfs'.
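What 'file' reports for an initrd image comes down to the first few bytes of the file. A rough sketch of that check for the two formats mentioned in these threads, gzip (magic bytes 0x1f 0x8b) and "newc" cpio (ASCII magic 070701 or 070702), could look like this. The function name guess_image_type is made up, and the real 'file' utility knows far more formats than this.

```c
#include <stddef.h>
#include <string.h>

/* Guess the container format of an initrd/initramfs image from its first
 * bytes. Only gzip and "newc" cpio are recognized in this sketch. */
const char *guess_image_type(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return "gzip";
    if (len >= 6 &&
        (memcmp(buf, "070701", 6) == 0 || memcmp(buf, "070702", 6) == 0))
        return "cpio (newc)";
    return "unknown";
}
```

To use it on a real image, read the first six bytes of /boot/initrd.img with fread() and pass them in; a gzip result means the cpio archive (if any) is one decompression step further in.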
Re: thread concurrent file operation
On Tue, 29 Jan 2013 16:56:02 +0100, Tobias Boege said:

> Look some lines above: struct fd f = fdget(fd);

That creates a reference, not a lock. It basically assures that the system doesn't reap and reclaim that fd out from under the code. (In other words, it's managing lifetime, not concurrency).
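The lifetime-versus-concurrency distinction can be illustrated with a toy reference count. This is not how fdget() is implemented; struct object and the object_* helpers are hypothetical names for a user-space sketch using C11 atomics.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct object {
    atomic_int refs;    /* how many holders keep this object alive */
    int payload;        /* NOT protected by the refcount */
};

/* Allocate an object with one reference held by the caller. */
struct object *object_new(int payload)
{
    struct object *o = malloc(sizeof(*o));
    if (o == NULL)
        return NULL;
    atomic_init(&o->refs, 1);
    o->payload = payload;
    return o;
}

/* Take an additional reference: the object cannot be freed while we hold it. */
void object_get(struct object *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/* Drop a reference. Returns 1 if this was the last one and the object was
 * freed, 0 otherwise. */
int object_put(struct object *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

Two threads that each hold a reference can still write payload at the same time; the count only guarantees the memory stays valid, so any ordering between those writes needs a separate lock.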
Re: thread concurrent file operation
On Tue, 29 Jan 2013 18:25:19 +0100, Karaoui mohamed lamine said:

> This function is supposed to return the file reference, does it do the locking?

Refcounting only, no locking provided by fdget.

> It seems that I can't find the lock instruction (with all those rcu instructions, I am a little lost), can you guide me through?

Because it isn't there. Concurrent writes can happen - that's why lockf() exists, so that multiple programs that want to scribble on the same file can do their locking.
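A minimal sketch of cooperating writers using lockf() might look like the following. The helper name locked_append is invented for the example, and note that the lock is advisory: it only coordinates programs that also call lockf() (or a compatible fcntl() lock) on the same file.

```c
#define _XOPEN_SOURCE 700   /* ask for lockf(); must precede the #includes */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append `msg` to the file at `path` while holding an advisory lock on the
 * whole file. Returns the number of bytes written, or -1 on error. */
long locked_append(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* The file offset is 0 right after open(); a length of 0 means
     * "from the current offset to the end of the file and beyond". */
    if (lockf(fd, F_LOCK, 0) != 0) {
        close(fd);
        return -1;
    }

    ssize_t n = write(fd, msg, strlen(msg));

    /* write() moved the offset, so rewind before unlocking the same range.
     * (close() would also drop the lock.) */
    lseek(fd, 0, SEEK_SET);
    lockf(fd, F_ULOCK, 0);
    close(fd);
    return (long)n;
}
```

F_LOCK blocks until the range is free; F_TLOCK is the non-blocking variant for callers that would rather fail than wait.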
Re: Kernel Config for Chromium Browser?
On Thu, 31 Jan 2013 16:15:45 +0100, Martin Kepplinger said:

> I stripped down my .config for my kernel-compilation a bit, but thought that I really just removed unnecessary stuff. But really, the consequence was that the Chromium Browser didn't load _any_ page. Not even locally and no chrome:// page. It started, but just stayed at a white page.
>
> I didn't change the system whatsoever. I know it was the kernel. Does anyone by chance know what parameter caused that behaviour? What does chromium do differently than firefox?

There's too many possibilities to count, actually. It's probably possible to debug it and figure out *which* thing you missed, but this is probably a lot faster and more accurate:

1) Boot your distro kernel, which is probably an 'allmodconfig' and will end up loading a whole pile of modules.
2) Insert all your USB memory sticks, webcams, disk drives, and other peripherals, at least long enough for udev to see them and load their respective device drivers.
3) At this point, 'lsmod' should list pretty much every module you actually use during normal use.
4) cd to wherever you have your kernel source tree, and 'make localmodconfig'. This will take the output of 'lsmod' and customize the kernel config for you.
5) Then proceed to make/make install/reboot and enjoy. :)

Note that in step 4, it *is* possible to miss a kernel module that you may need in the future (that's why I said to insert all the peripherals, so their modules get included). It will usually show up as something like "You add a new rule/option to iptables and it doesn't work" or similar. At that point, you just have to go enable that missing option.

(It's possible to strip down a distro kernel a *lot* - comparing the current Fedora Rawhide kernel with the one I have booted now:

[~] grep '=[ym]' /boot/config-3.8.0-0.rc5.git1.1.fc19.x86_64 | wc -l
3741
[~] grep '=y' /boot/config-3.8.0-0.rc5.git1.1.fc19.x86_64 | wc -l
1490
[~] grep '=m' /boot/config-3.8.0-0.rc5.git1.1.fc19.x86_64 | wc -l
2251
[~] grep '=[ym]' /boot/config-3.8.0-rc3-next-20130117 | wc -l
1209
[~] grep '=y' /boot/config-3.8.0-rc3-next-20130117 | wc -l
924
[~] grep '=m' /boot/config-3.8.0-rc3-next-20130117 | wc -l
285

And I could get that 1209 down to well under 900 - there's a few parts of the kernel (iptables, crypto, and some filesystems) that I mostly build just to give them build/test coverage. Yes, it builds 3 times faster than the Fedora kernel. ;)
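The grep counts above can be reproduced programmatically if you want to compare configs in a tool. Here is a small sketch that counts built-in (=y) and modular (=m) options in .config-style text. The names config_counts and count_config are made up, and unlike a bare grep '=y' it only counts lines that end in =y or =m and skips comment lines, so totals may differ slightly from the numbers quoted.

```c
#include <stddef.h>
#include <string.h>

struct config_counts {
    int builtin;    /* lines ending in "=y" */
    int modules;    /* lines ending in "=m" */
};

/* Count "CONFIG_FOO=y" and "CONFIG_FOO=m" lines in .config-style text. */
struct config_counts count_config(const char *text)
{
    struct config_counts c = { 0, 0 };
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Skip comments such as "# CONFIG_FOO is not set". */
        if (len >= 2 && line[0] != '#' && line[len - 2] == '=') {
            if (line[len - 1] == 'y')
                c.builtin++;
            else if (line[len - 1] == 'm')
                c.modules++;
        }

        line += len;
        if (*line == '\n')
            line++;         /* step over the newline */
    }
    return c;
}
```

Feeding it the contents of two config files and comparing the two structs gives the same kind of before/after picture as the grep output.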
Re: Android Kernel Compilation
On Thu, 31 Jan 2013 18:24:01 +0100, Matthias Brugger said:

> 2013/1/30 Rahul Gandhi rahul.rahulg...@gmail.com:
>> I am trying to compile Kernel for my Android device. I am using the NDK Toolchain (arm-linux-androideabi-4.4.3). When I use the defconfig, the kernel compiles without any errors but when I flash it onto my device, it either gets stuck on the HTC logo or continuously reboots. If I pull the config.gz from my device, it gives errors at the time of compilation. What could have possibly gone wrong?
>
> first of all, check the kernel logs. that will give you a clue where to start digging.

If it hangs on the HTC logo or reboots, his kernel isn't living long enough for userspace to retrieve the dmesg buffer.

First thing I'd try is a combo of the 3 kernel parameters 'earlyprintk', 'ignore_loglevel', 'initcall_debug' and either serial console or netconsole. Though it's quite possible that he's dying before even that infrastructure can give a hint, in which case it gets a lot trickier (and will probably require some help from the hardware platform in the form of either a JTAG interface or enough infrastructure to use kgdb or similar tool...)
Re: kernel driver vs userspace program
On Thu, 31 Jan 2013 13:38:07 -0500, Simon said:

> Hi guys, I'm building an electrical device which will be controlled by computer. It will have an embedded microcontroller and will use USB to communicate with the PC. I believe this calls automatically for a device driver, correct? And for using the machine from the PC, interacting with it, that calls for a userspace program, correct? I mean, doing things differently, such as all in userspace or all in-kernel, would be bad form, right?

My *first* reaction would be do it almost all in userspace and use libusb to talk to the device from userspace. Unless there's weird wonkiness or quirks that have to be handled by a kernel module.

> My question is in case the machine is used in an industrial context where there is really only one usage and one kind of interaction that follows a pre-determined procedure (therefore totally automated).

There's no reason that an embedded system can't fire up a /sbin/init that isn't a standard 'init' but is a program to do the process control needed - in fact, most no-MMU and many embedded systems do that.

> This could give extremely high priority of execution, I guess.

First, see if you're able to meet the timing constraints from a regular userspace before worrying about going the RT and/or kernel route. A lot of embedded controllers are amazingly fast and may not need any extra assistance to meet the timing requirements.

> Similar to a factory robot controlled by a computer. Would it make more sense to have everything in kernel space, while the userspace (if any) would only serve the purpose of reporting?

Especially in the embedded world, there really isn't one right answer. You'll have to do some trial-and-error to see what balance of userspace versus kernel is the proper fit for your application. But in general, you want to try to keep it in userspace (where things are more protected in case of a stray pointer, etc) if at all possible.
Re: open image file
On Tue, 05 Feb 2013 04:59:42 +0800, horseriver said:

> hi: It is not a cpio archive, so that command can not work. its file system type is tmpfs.

Umm. No. It's not tmpfs. tmpfs is a specific ram/swap based filesystem - basically, take enough 4K pages for the size= parameter and do it in memory. Major user-visible difference from the older 'ramfs' is that tmpfs pages can move to swap space, and ramfs pages are nailed down in RAM.

> mount -t tmpfs /dev/loop0 /mnt

This never actually looks at /dev/loop0 *at all*. You could even say this:

mount -t tmpfs none /mnt

and it would work just fine. Try leaving the '-t tmpfs' off entirely and let the mount command figure out what type it is, and see if that works any better for you.
Re: Process exit codes
On Tue, 05 Feb 2013 14:07:37 +0100, Grzegorz Dwornicki said:

> I guess that there may be a better API, that's why this thread was created in the first place. My project goal is to make process checkpoints like cryopid had. This is for my thesis and will be GPL for everyone after my graduation. I am researching this subject at this point.

In that case, you want to go look at the checkpoint/restart patches that are already in the kernel, and in process. Hint - it's a *lot* harder to do this right than a thesis project (unless you want to only do a very restricted subset, like no open files, no TCP connections, etc).

In fact, a lot of the 'namespace' stuff was added to help support C/R. For instance, the PID namespace is there to deal with the fact that if you checkpoint a process with PID 23974, you need to be able to guarantee that it gets 23974 on restart (as otherwise you hit problems with getpid() and kill() not referring to the process you thought they did). Of course, this majorly sucks if that PID is already in use. The solution there is to spawn a new, empty PID namespace to guarantee that number is available...

https://ckpt.wiki.kernel.org/index.php/Main_Page is a good place to start.
Re: pr_info not printing message in /var/log/messages
On Wed, 06 Feb 2013 04:43:20 +0800, Jimmy Pan said:

> in fact, i've been always wondering what is the relationship between dmesg and /var/log/messages. they diverge a lot...

What ends up in /var/log/messages is some subset (possibly 100%, possibly 0%) of what's in dmesg. Where your syslog daemon routes stuff is a local config issue - if your syslogd supports it, there's no reason not to dump the iptables messages into /var/log/firewall and the rest of it in /var/log/kernel, or any other policy that makes sense for the sysadmin.
Re: When does the /dev/sda1 node comes into being ?
On Wed, 06 Feb 2013 01:26:44 +0800, horseriver said:

> During booting period, every device will have a node in the /dev/ folder. what is the detail of this procedure?

'man udev'. Although the details are a tad murkier for kernels after 2.6.32 that include CONFIG_DEVTMPFS in the config.

Also, note that not all systems will have a /dev/sda1 - that assumes a partition table on a particular type of disk handled by a specific device driver. If that disk has no recognizable partition table, it will just have a /dev/sda entry. If the disk is driven by a different driver, you'll see /dev/hda entries instead. And if your boot storage device is an SD card or something, you may have /dev/mmcblk0 or other entries.

And this:

[~] ls -l /dev/sdre1
brw-rw 1 root disk 133, 385 2012-11-27 05:17 /dev/sdre1

is how I pay the rent. :) (Bonus points if you can figure out why my system reports that. :)
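For anyone chasing the bonus points, one way to reason about a name like sdre is to look at how SCSI disk names and device numbers relate to a running disk index. The sketch below is my reading of the sd driver's scheme and should be treated as an assumption rather than a reference: names go sda..sdz, then sdaa, sdab, and so on; the majors are 8, 65-71 and 128-135; each disk gets 16 minors (whole disk plus 15 partitions); and minor bits above the low 8 extend the index. The function names are invented for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* Format the name of SCSI disk number `index` (0 -> "sda", 25 -> "sdz",
 * 26 -> "sdaa", ...) into buf. */
void sd_disk_name(unsigned index, char *buf, size_t buflen)
{
    char rev[8];            /* letters, least significant first */
    size_t n = 0;
    long i = (long)index;

    do {
        rev[n++] = (char)('a' + i % 26);
        i = i / 26 - 1;     /* bijective base 26: there is no "zero" letter */
    } while (i >= 0 && n < sizeof(rev));

    size_t pos = (size_t)snprintf(buf, buflen, "sd");
    while (n > 0 && pos + 1 < buflen)
        buf[pos++] = rev[--n];
    if (pos < buflen)
        buf[pos] = '\0';
}

/* Recover the disk index from a (major, minor) pair, assuming the layout
 * described above. Stores the partition number (0 = whole disk) in *part.
 * Returns -1 if `major` is not one of the assumed sd majors. */
long sd_index_from_dev(unsigned major, unsigned minor, unsigned *part)
{
    unsigned slot;

    if (major == 8)
        slot = 0;
    else if (major >= 65 && major <= 71)
        slot = major - 64;
    else if (major >= 128 && major <= 135)
        slot = major - 120;
    else
        return -1;

    *part = minor & 0xf;
    return (long)((slot << 4) | ((minor >> 4) & 0xf) | (minor & 0xfff00));
}
```

Under these assumptions, major 133 and minor 385 decode to partition 1 of disk index 472, and index 472 formats as "sdre", which suggests a machine with several hundred SCSI disks (or paths to them) attached.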
Re: hard disk dirver
On Wed, 06 Feb 2013 02:53:11 +0800, horseriver said:

> At booting time, the bootloader loads the kernel from hard disk to memory. During this period, does it need the hd driver's support?

Think for a bit - at that point, the hd driver hasn't been loaded yet, so it *can't* need the hd driver's support.

So instead, the bootloader has a very dumb simplistic driver that's stripped down (for instance, no queued command support, one I/O in flight at a time, very little error handling, read operations only, etc etc) that's just enough to load the kernel and initrd. (There's often also a very stupid filesystem driver, just enough to read files. So for instance 'grub' can find the files for the kernel and initrd. Some bootloaders are too stupid for even that, and you have to run a special program to tell the boot loader where all the blocks of the file are (I'm looking at you, LILO :))

Once the bootloader gets the kernel and initrd loaded, *then* the kernel can initialize the production driver with all the bells and whistles needed.
Re: hard disk dirver
On Wed, 06 Feb 2013 05:37:41 +0800, horseriver said:

> After grub loads the kernel and initrd, it gets around to mounting the root filesystem, but fails because it cannot find the root device, from which the kernel and initrd were loaded. ls /dev/ ; there is no disk device node. Why?

Any number of possible reasons, anything from an improperly configured kernel, to a misbuilt initrd, to a root= parameter that points someplace broken, to
Re: hard disk dirver
On Wed, 06 Feb 2013 12:30:37 +0800, horseriver said:

> root = ? You mean the assignment at the grub command line ?

For instance, the grub entry for the kernel I'm running right now:

title 3.8.0-rc6-next-20130206
kernel /vmlinuz-3.8.0-rc6-next-20130206 ro root=/dev/mapper/vg_blackice-root log_buf_len=2M vga=893 loglevel=4 threadirqs intel_iommu=off LANG=en_US.UTF-8
initrd /initramfs-3.8.0-rc6-next-20130206.img

(Strictly speaking, it's not the grub command line, it's the kernel command line that is passed to the kernel by grub or lilo or grub2 or syslinux or whatever boot loader happens to float your boat).
Re: hard disk dirver
On Wed, 06 Feb 2013 13:21:17 +0800, horseriver said:

> At booting stage, the kernel needs to detect the hard disk before mounting it, does this work need PCI support?

That depends. Is the controller for the hard drive a PCI-based controller? On most x86-based boxes, it is (and I'm not sure it's even possible to build an x86 kernel that doesn't have PCI as a =y in the config). However, very old units may still have ISA based disk controllers, and other archs may have other I/O buses.

> At loading stage, the boot loader needs to move binaries from the hard disk partition to RAM, does this work need PCI support?

Same as above.
Re: Creating scheduler
On Wed, 06 Feb 2013 23:19:26 +0530, jeshkumar...@gmail.com said:

> Can anyone suggest a good tutorial to create our own scheduler ?

Doing an I/O scheduler is pretty trivial, and there's a number of examples in-tree already to look at.

If you mean a CPU scheduler, the major reason why there's no tutorial is because writing a non-toy scheduler is *hard*, and by and large anybody who's a good enough kernel hacker to write a working scheduler doesn't need a tutorial.

Why is it hard? Lots of reasons. Even on a single-core, single-thread CPU, it's hard to do a good job of picking the next task to run, mostly because tasks are so damned good at changing behavior. You decide that it would be good to run an I/O bound task, so you pick a task that went into an I/O wait its last 12 times on the CPU - at which point the task turns around and goes CPU bound crunching all the data it read in the last 12 times. :)

You also have interactions with thermal issues and frequency governors (usually, cranking to highest frequency and doing race-to-idle and then dropping to lowest freq results in the lowest total energy use, but especially for high-density applications, there may be an upper limit on watts per second that you can cool, resulting in trade-offs being needed).

Then there's cache affinity issues, balancing load across cores on multi-socket systems, etc etc etc...
Re: hard disk driver
On Wed, 06 Feb 2013 13:20:13 -0500, Greg Freemyer said:

> Most new MB's have a SATA controller directly on the MB connected directly to either the North or South bridge (I don't know which). I don't think any PCI support is needed to talk to the boot disk.

Yes, but said SATA controller and north/southbridge are usually emulating a PCI device:

% lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 03)

So no actual PCI slots involved there, but that PCI bridge is going to require PCI support to program all the BARs and other stuff to talk to that 82801.
Re: Creating scheduler
On Wed, 06 Feb 2013 20:40:47 +0100, Jonathan Neuschäfer said:

> I'm sorry to ask, but don't you rather mean watts than watts per second?

There may indeed be a second-order time component involved - for instance, a cooling system that can handle 10 watts continuously, 20 watts for up to 30 seconds, or 40 watts for 10 seconds max. And of course, 40 watts steady for 10 seconds is different from averaging 40 watts but bouncing between 30 and 50 watts for 10 seconds etc etc..

And of course, there's usually a per-system limit, and per-chip limits, and your power/cooling budget constraints may force you to go for a higher value on one to make the budget for the other (burn an extra 0.5 watts in chip A in order to get Chip B under 0.87 watts type stuff).
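To make the "time component" concrete, here is a minimal sketch of such a cooling envelope. It assumes the three tiers quoted in the message (10 W continuous, 20 W for 30 s, 40 W for 10 s) and only models a steady load; the names ENVELOPE, max_duration_s, energy_j and fits are made up for this illustration.

```python
import math

# Hypothetical cooling envelope, using the figures from the message:
# (power in watts, longest time in seconds the cooler can sustain it).
# Tiers must be listed in ascending order of power.
ENVELOPE = [(10.0, math.inf), (20.0, 30.0), (40.0, 10.0)]


def max_duration_s(power_w):
    """Longest time the envelope tolerates a steady draw of power_w watts."""
    for limit_w, seconds in ENVELOPE:
        if power_w <= limit_w:
            return seconds
    return 0.0  # above every tier: not sustainable at all


def energy_j(power_w, duration_s):
    """Energy dissipated by a steady load: joules = watts * seconds."""
    return power_w * duration_s


def fits(power_w, duration_s):
    """True if a steady load of power_w for duration_s stays inside the envelope."""
    return duration_s <= max_duration_s(power_w)
```

Note that fits(20, 30) holds while fits(40, 15) does not, even though both dissipate 600 joules: the limit depends on how fast the energy arrives, not only on how much of it there is.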
Re: hd controller
On Thu, 07 Feb 2013 16:19:33 +0800, horseriver said:

> hi:) I am curious about how the hd controller works. When the user is reading/writing the hd, it is implemented by sending commands to the hd controller's special port. Then, how does the controller know a new command has been received? In this procedure, what work does the hd driver do?

You may wish to get a copy of 'Linux Device Drivers, 3rd Edition' and read it before posting lots of questions here. A free version is available online, and last I checked it was the very first hit if you google for 'Linux device drivers'.
Re: pr_info not printing message in /var/log/messages
On Thu, 07 Feb 2013 23:20:27 +0530, anish kumar said:

> Other interesting standard logs managed by syslog are /var/log/auth.log, /var/log/mail.log.

Other interesting *common* logs, as shipped pre-configured by some distros. They are hardly a standard (unless the definitions of these managed to sneak into POSIX or the LSB or similar while I wasn't looking).
Re: hd controller
On Fri, 08 Feb 2013 07:48:39 +0800, Peter Teoh said:

> So the drivers just literally concatenate these commands into a string and send it over to the device.

The reason that good disk drivers are hard to write is because it isn't *just* literally concatenating the commands - the driver also has to do memory management (make sure that everybody's data ends up in the right buffers), command queue management, elevator management (if there's multiple I/O requests pending from userspace, what order do we issue them in?), error recovery, power management, and a ton of other stuff...
Re: MAX limit of file descriptor
On Sat, 09 Feb 2013 13:10:47 +0800, horseriver said:

> In one process, what is the max number of open file descriptors? Can it be set to infinite? In network programming, what is essential for the maximum number of connections dealt with per second?

In general, you'll find that number of file descriptors isn't what ends up killing you for high-performance network programming. What usually gets you are things like syn floods (either intentional ddos or getting slashdotted), because each time you do an accept() on an incoming connection you end up using userspace resources to handle the connection. So the *real* question becomes how many times per second is your box able to fork() off an httpd, do all the processing required, and close the connection?

A secondary gotcha is that dying TCP connections end up stuck in FIN-WAIT and FIN-WAIT-2. And if you're trying to drive multiple 10G interfaces at line speed, it gets even more fun. Fortunately, for my application (high performance disk servers) the connections are mostly persistent, so it's only a problem of getting disks to move data that fast. :)
Re: MAX limit of file descriptor
On Mon, 11 Feb 2013 06:07:38 +0800, horseriver said:

> Actually, my question comes from network performance. I want to know the maximum number of TCP connections per second that can be dealt with by my server.

That will be *highly* dependent on what your server code does with each connection. A "hello world" reply and close socket will, of course, go lots faster than something that has to go contact an enterprise-scale database, do 3 SQL joins, and format the results.

> How can I do the test and calculate the connection number? Is it possible that my server can deal with 10k TCP connections per second?

10K/sec peaks can be achieved even on a laptop, assuming a dummy do-nothing service. Keeping that sustained for a real application will depend on the service time needed - if you have 20 CPUs in the box, and spread the load across all 20, you have to average under 2ms to service each request, which will be a killer if you have to go to disk at all for a request. At that point, the guys at Foundry will be more than happy to sell you a load-balancer so you can have a stack of 10 20-CPU servers each of which only handles 1K/sec and thus has a 20ms time budget.

> What is the relationship between this and throughput rate?

Lots of tiny connections will totally suck at aggregate throughput, if for no other reason than TCP slow-start never gets a chance to really open the transmit window up. But in general, there is always a trade-off between transaction rate and throughput.

> Is there a document that tells the best optimization of this?

"Best" is defined by what your application actually needs. The best settings for my NFS server will be totally different than what the HTTP server 12 racks over needs...
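The 2ms and 20ms figures above fall out of simple arithmetic. A minimal sketch, assuming the load is spread perfectly evenly across the CPUs and each CPU handles one request at a time (service_budget_ms is a name made up for this example):

```python
def service_budget_ms(requests_per_sec, cpus):
    """Average per-request service time (in ms) that keeps `cpus` cores from falling behind."""
    per_cpu_rate = requests_per_sec / cpus  # requests each core must finish per second
    return 1000.0 / per_cpu_rate


# One 20-CPU box taking the full 10K/sec load:
single_box = service_budget_ms(10_000, 20)

# Ten such boxes behind a load balancer, 1K/sec each:
balanced = service_budget_ms(1_000, 20)
```

Any queueing, lock contention or uneven distribution only shrinks the real budget, so these numbers are an upper bound rather than a target.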
Re: start address of the code segment of the program on x86-64
On Thu, 14 Feb 2013 15:33:48 +0200, Kevin Wilson said:

> Hi, 0x08048000 address is the start address of the code segment of a program on x86-32.

More likely, it was the start address of *one particular run* of the program. In most kernel configurations, there's something called Address Space Layout Randomization (ASLR) that makes the code land at different places each time, to make it harder to write exploits because you can't hardcode addresses.

> What is the start address of the code segment of the program on x86-64? Is there a place in the kernel code where I can add a printk on an x86_64 machine to view the code segment start? How can it be done?

cat /proc/self/smaps and ponder for a while. Try it twice and compare and see if you can see what ASLR does.

You may also want to think about *why* you want to know where the code segment starts. If you know what this address is, what do you plan to use it for? (In other words, there's probably a different, easier way to do whatever it is you're trying to accomplish here)...
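For anyone who would rather compare runs programmatically than eyeball the output, here is a rough sketch that pulls the executable mappings out of maps-style text. It assumes the usual /proc/<pid>/maps line layout (address range, permissions, offset, device, inode, optional path); parse_maps_line and executable_mappings are names invented for this example, not existing tools.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, path)."""
    # Fields: address-range perms offset dev inode [path]
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], path


def executable_mappings(maps_text):
    """Return (start, path) for every mapping that has the execute bit set."""
    result = []
    for line in maps_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, _end, perms, path = parse_maps_line(line)
        if "x" in perms:
            result.append((start, path))
    return result
```

On a Linux box, feeding it open("/proc/self/maps").read() from two separate runs and comparing the start addresses should show which mappings move between runs and which do not.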
Re: using prefetch
On Fri, 15 Feb 2013 12:16:02 +0200, Kevin Wilson said:

> Is the prefetch operation synchronous? I mean, after calling it, are we guaranteed that the variable is indeed in the cache?

No, the whole *point* is that it's asynchronous. You issue the prefetch several lines of code before you need it to be in cache, so that you can get several lines of hopefully not data-dependent code to run while the cache line fetch happens, rather than take a stall when you reference the variable. The prefetch may in fact not complete in time, but at worst you end up just stalling for a cache miss the same as you would have otherwise.

> According to this logic, anywhere that we want to call skb_shinfo(skb) we better do a prefetch before.

No, because most references to skb will be cache-hot because you're in the middle of the IP stack, which touches the skb struct all over the place, and therefore it's probably in L2 already.

> In fact, if we prefetch any variable that we want to use then we end up with performance boost.

Nope. Not as true as you might think. If you play around with the 'perf' command you'll find out that on modern processors you'll see a 98% or so hit rate on the L2 cache - so 98% of the time you'll *waste* a cycle issuing the opcode needlessly.

If you look carefully at some of the other structs in the net/ subtree, you'll see where they've put variables together so that once you reference one field of the struct, all/most of the needed stuff gets sucked in on the same cache line. That's probably more productive than trying to add prefetch calls all over the place.

> So - any hints, what are the guidelines for using prefetch()?

Only use it if you have good reason to believe that you *will* need that variable (in other words, it's not in the unlikely half of an if statement or something) *and* there's a good chance that the variable/memory is cache-cold.
Re: process 0 (swapper)
On Sat, 16 Feb 2013 18:48:52 +0200, Kevin Wilson said:

> ~0U is not 0 but -1;

-ENOCAFFEINE. You'd think that after having done kernel-level C programming since the days of SunOS 3.1.5 and BSD 4.2 I'd know better. ;)
Re: Tracing SIGKILL, is that possible?
On Mon, 18 Feb 2013 15:46:58 -0300, Daniel. said:

> Is there a way to track signals, especially SIGKILL? I would like to know if some process dies because it reached some resource limit, because of an OOM error or something likewise..

Depends on where you want the tracking to go. But your first thing to try would probably be:

    echo 1 > /proc/sys/kernel/print-fatal-signals

which controls this code in kernel/signal.c:

    static void print_fatal_signal(int signr)
    {
            struct pt_regs *regs = signal_pt_regs();
            printk("%s/%d: potentially unexpected fatal signal %d.\n",
                    current->comm, task_pid_nr(current), signr);

Bahh. That's missing a KERN_INFO. Patch submitted.
Re: Linux Kernel
On Tue, 19 Feb 2013 10:50:26 +0530, kapil agrawal said:

> How does the linux kernel run in the system after spawning the init and mounting the root FS? Does it run as some background process?

No. You probably want to get some basic knowledge about operating systems in general. http://en.wikipedia.org/wiki/Operating_system as you appear to be confused regarding the basic concepts of an operating system's kernel.

> How does it serve the system calls etc.?

There's about 5 different answers to that, depending on how in-depth you want the details, but I suspect that none of them will make any sense to you until you get a better grasp on the basics.
Re: Linux Kernel
On Tue, 19 Feb 2013 12:01:55 +0530, kapil agrawal said:

> Do you mean process with PID 0 is the one which runs in the background and serves the request from userland and goes to cpu_idle() if nothing to run?

No. Large parts of the kernel run in kernel mode, but using the 'struct task' of the related userspace process (in particular, most system calls work this way). Other large chunks borrow the 'struct task' and run under it just so there's *a* process running. And parts aren't in process context at all, but interrupt context (so they aren't running as process code at all).
Re: SIGKILL and a sleeping kernel module
On Tue, 19 Feb 2013 10:37:28 +0200, Kevin Wilson said:

> Hi all, I am trying to send a SIGKILL to a kernel module which is sleeping. I added a printk after the sleep command. Sending a SIGKILL (by kill -9 SIGKILL pidOfKernelThread) does **not** yield the message from printk("calling do_exit\n"); which is immediately after the msleep() command, as I expected.

Others have mentioned the various types of sleeping in the kernel, but overlooked a minor detail. If a task is in the kernel in a non-interruptible state, signals are queued and delivered once that status is cleared (which often doesn't happen until a syscall is about to return to userspace).

The reason this detail is important for would-be kernel hackers: If one kernel thread manages to BUG() or oops() or otherwise die or wedge up while holding a lock, other processes can end up blocking while waiting for the lock. The problem is that the other processes are usually in non-interruptible state when they try to take the lock. The end result is that you end up with processes that are blocked in the kernel, and you can't kill -9 them - you're basically stuck with them until you reboot. This is why your system will often limp along and slowly become more and more wedged up after a BUG().

Also - the fact that /bin/ps shows a D or S does *not* in fact mean the process is in a sleep state inside the kernel. That's *usually* the case, but it's quite possible for the code to be actively executing and burning lots of CPU (often because it's stuck in a loop that's failing to make forward progress). The result there is that ps shows a D/S but your CPU starts getting *very* warm.
Re: cpu_relax(), rep: nop, and PAUSE
On Wed, 20 Feb 2013 01:58:17 +0700, Mulyadi Santosa said:
> On Tue, Feb 19, 2013 at 7:20 PM, David Shwatrz dshwa...@gmail.com wrote:
> > Hi, kernel newbies, We have: #define cpu_relax() asm volatile("rep; nop") in arch/x86/boot/boot.h. Why don't we use the PAUSE assembler instruction here?
>
> Just guessing, maybe rep+nop could do better power saving because processor is considered as idle.

The 'rep; nop' is actually a placeholder - for some CPUs, a different opcode gets filled in during boot time. See arch/x86/kernel/alternative.c and arch/x86/include/asm/alternative.h for the gory details.
Re: unsubscribe
On Thu, 21 Feb 2013 15:57:46 +0530, Sandeep Sonawane said:

> Please remove my email id sandeep.sonaw...@gmail.com from this DL.

If your mail software supported RFC2369 mail headers, you would have seen the following on every posting to the list:

List-id: Learn about the Linux kernel kernelnewbies.kernelnewbies.org
List-unsubscribe: http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies, mailto:kernelnewbies-requ...@kernelnewbies.org?subject=unsubscribe
List-archive: http://lists.kernelnewbies.org/pipermail/kernelnewbies
List-post: mailto:kernelnewbies@kernelnewbies.org
List-help: mailto:kernelnewbies-requ...@kernelnewbies.org?subject=help
List-subscribe: http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Nice clickable links.
Re: Sending an IP packet
On Fri, 22 Feb 2013 14:36:17 +0200, Adel Qodmani said:

> My question is quite simple, I have an sk_buff that I want to transmit, the sk_buff is an ICMP message and so far, I've built the headers and set up everything.

Others have given some details on how. A better question is why. Sending an ICMP message without the rest of the IP stack's knowledge is usually a bad idea, because it can cause the remote end's concept of network state to become desynchronized with the local concept. As a quick example, consider a spurious 'host/port unreachable' sent to the remote end - many IP stacks will use that info to abort a TCP 3-packet handshake. However, the rest of *your* end thinks the connection is still trying to establish.

So what are you trying to accomplish by sending a forged ICMP packet from within the kernel? There may be better ways to approach it (for example, if you're trying to say "this port is closed", a better way is to use iptables with a '-j REJECT --reject-with ', which will (a) do all the heavy lifting of sending the ICMP for you and (b) also prevent the packet from making it to the rest of the local IP stack)...
Re: Sending an IP packet
On Fri, 22 Feb 2013 17:15:35 +0200, you said:

> I am trying to implement a new protocol that we've designed which works on top of the IP layer, so I am using ICMP messages to carry control information for the protocol. Why use ICMP? It seemed natural since our protocol is a Network-layer protocol and ICMP is a control messages protocol.

In that case, you *really* want to go look at how TCP and SCTP and other protocols handle ICMP integration. You want an API that integrates your ICMP handling with the rest of the protocol stack, because otherwise you'll end up with an unmaintainable mess. Also, it will be about 436 times easier to extend your protocol to work correctly over IPv6. :)

Go look at net/ipv4/udp.c, functions __udp4_lib_err() and __udp4_lib_rcv(), particularly the latter's use of icmp_send(). You'll want to extend icmp_send() to handle your additional control information.
Re: atomic operations
On Sun, 24 Feb 2013 11:50:14 +0100, richard -rw- weinberger said:
> On Sun, Feb 24, 2013 at 10:42 AM, Shraddha Kamat sh200...@gmail.com wrote:
> > what is the relation between atomic operations and memory alignment? I read from UTLK that an unaligned memory access is not atomic. Please explain, I am not able to get the relationship between memory alignment and atomicity of the operation.
>
> Not all CPUs support unaligned memory access, such an access may cause a fault which needs to be fixed by the kernel...

There's a more subtle issue - an unaligned access can be split across a cache line boundary, requiring 2 separate memory accesses to do the read or write. This can result in CPU A fetching the first half of the variable, CPU B updating both halves, and then A fetching the second half of the now updated variable.. This can bite you even on CPUs that support unaligned accesses.
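The interleaving described above can be replayed deterministically. This is only a toy simulation, not real concurrent code: it assumes a 32-bit variable split into two 16-bit halves that are read one at a time, with the low half taken as the "first half", and the helper names are invented for the example.

```python
def split16(value):
    """Split a 32-bit value into (low, high) 16-bit halves, as if it straddled two cache lines."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def join16(low, high):
    """Reassemble a 32-bit value from its two 16-bit halves."""
    return (high << 16) | low


def torn_read(old, new):
    """Replay: CPU A reads the low half, CPU B writes both halves, CPU A reads the high half."""
    memory = list(split16(old))   # [low, high], accessed as two separate pieces
    low_seen = memory[0]          # CPU A: first memory access
    memory[:] = split16(new)      # CPU B: updates the whole variable in between
    high_seen = memory[1]         # CPU A: second memory access
    return join16(low_seen, high_seen)
```

With a counter going from 0x0000FFFF to 0x00010000, the reader ends up with 0x0001FFFF: a value that was never stored by anybody, which is exactly why such an access cannot be called atomic.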
Re: barrier()
On Mon, 25 Feb 2013 12:26:06 +0530, Shraddha Kamat said:

> #define barrier() asm volatile("" ::: "memory")
> What exactly is volatile("" ::: "memory") doing here?

You probably should read Documentation/memory-barriers.txt in your kernel source tree, and let us know if you still have questions after that...
Re: general_protection result to die
On Tue, 26 Feb 2013 06:23:34 +0800, horseriver said:

> does a general_protection trap necessarily result in die?

Think for a bit - what other actions can reasonably be taken? You hit a GPF, it's obvious that the variables you're working on have been corrupted, so automatically continuing is probably a Really Bad Idea. If there's a debugger involved (gdb/kgdb), you can hand it to the (presumed) person running the debugger and let *them* figure out what to do, but that's about the only other realistic option.
Re: How to measure the RAM read/write performance
On Tue, 26 Feb 2013 22:35:35 +0700, Mulyadi Santosa said:

> let's see what if you do read and write pattern, in certain order so that it will be invalidated by the L1/L2/L3 cache every time? AFAIK, one thing for sure, reading data sequentially and re-reading them will end up reading cache in the 2nd operation and so on. I think the most certain way to do it is to read (or write) data bigger than total L1/L2/L3 cache.

Or you could just download a copy of memtest86+ and run that - I think that provides some timing info in addition to actually testing your memory.
Re: Kernel freeze when writing e1000 driver
On Tue, 26 Feb 2013 11:19:18 -0500, Phani Vadrevu said:

> I am writing a network driver for the e1000 card. While doing the receive part, I saw that the kernel freezes whenever it reaches the netif_rx(skb) call. I was able to reproduce the same error when using a bare bones driver where I hard coded the skb data.

There's a known-working driver in the kernel source tree for this device already. Start by looking at what data it's placed in the skb when it calls that routine, and how it differs from what you filled in.

For bonus points - lose the 'unsigned char t[]' array and replace it with a bunch of explicit 'skb->foo = bar' statements. In particular, that assures that you haven't missed a 0x15 27 bytes into the array, or failed to allow alignment padding bytes.
Re: How to measure the RAM read/write performance
On Wed, 27 Feb 2013 15:38:00 +0530, sandeep kumar said:

> In development phase of the board, we are trying to measure RAM performance gain while changing type of the RAM. The standard benchmark tools are giving us the Cache performance only. So we want to try some method to measure RAM performance.

The fact that you can't measure the effect of RAM speed because the L1/2/3 cache masks the effect should tell you something :)

If you are seeing a 98% hit rate or so, RAM speed will indeed not matter much. If you're seeing a poor cache hit ratio, you're most likely to get better performance not by changing the RAM, but changing the application to improve its cache usage.

And of course, if the application's design is one that is resistant to improved cache hit ratios, it is important you measure RAM performance *with that application running*, not a benchmark. This is because if your application is managing to thrash the cache, the resulting RAM access patterns will be *highly* sensitive to actual program behavior, and any corner cases in the hardware may or may not be hit by the benchmark the same way the application does.
Re: [ARM_LINUX] ioremap() allowing to map system memory...
On Fri, 01 Mar 2013 16:48:12 +0530, sandeep kumar said:

> Don't you think it should throw panic() while calling the ioremap() itself? Because this sounds like a serious violation...

As you noted, it does give you a warning. That's a kernel design philosophy - to reserve the panic() and BUG() calls for cases where it is *known* that proceeding further is unsafe or impossible.

So the kernel does a panic() if it can't start /sbin/init at system boot-up - because without that, further progress is impossible. But once the system is up, we don't panic if PID 1 goes away - because it's possible that the user has an open window, and can su and at least do an orderly shutdown.

Similarly, if a device driver gets confused, the driver code may do a BUG_ON() and end up locking up that device because to do anything else may scramble the disk further. But we don't panic() because that will basically wedge the system - and the user loses any chance at dumping the dmesg buffer for debugging or other attempts at an orderly shutdown (in particular, panic() won't sync the filesystems). So even though a BUG() often kills a thread while it holds an important lock, which often leads to the system eventually deadlocking one process at a time, it's still a net win if it doesn't panic but lets the user at least try to run sync.

And even BUG_ON() is frowned upon if further progress in a degraded mode is possible (for instance, a networking error that totally locks up one TCP connection, but other connections are still working) - at that point, warn() is the correct thing to do.

As in this case - it *is* a serious violation, but the kernel (a) can at least possibly keep going and (b) it's at least possible that the user can recover from it. There's a *very* good chance that if the kernel just does a warn(), the user will say "*facepalm* Stupid typo in the address", fix the typo, and re-try with the correct address.

So that's the design philosophy of why it gives you a warning rather than a panic.
Re: [filesystem] struct of m_inode
On Sat, 02 Mar 2013 10:28:26 +0800, lx said:

> if (block >= 7+512+512*512) because the i_zone[9]. But the question is why the i_zone[7] can represent 512, and i_zone[8] can represent 512*512?

Single, double, and triple indirect blocks... http://en.wikipedia.org/wiki/Inode_pointer_structure
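A small sketch of how the 7 + 512 + 512*512 limit breaks down, assuming the layout the question implies: i_zone[0..6] are direct pointers, i_zone[7] points at one block holding 512 block numbers, and i_zone[8] points at a block of 512 such indirect blocks. The function name locate and the tuple layout are made up for this illustration.

```python
DIRECT = 7        # i_zone[0..6] point straight at data blocks
PER_BLOCK = 512   # block numbers that fit in one indirect block


def locate(block):
    """Say which i_zone slot (and which indirect indexes) cover a file block number."""
    if block < 0 or block >= DIRECT + PER_BLOCK + PER_BLOCK * PER_BLOCK:
        raise ValueError("block number out of range")
    if block < DIRECT:
        return ("direct", block)
    block -= DIRECT
    if block < PER_BLOCK:
        # i_zone[7] names one block holding 512 data-block numbers
        return ("single", 7, block)
    block -= PER_BLOCK
    # i_zone[8] names a block of 512 indirect blocks, each holding 512 numbers
    return ("double", 8, block // PER_BLOCK, block % PER_BLOCK)
```

So i_zone[7] "represents 512" because one extra lookup reaches 512 data blocks, and i_zone[8] "represents 512*512" because two chained lookups reach 512 times as many.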
Re: Why vmlinux.bin are changed from raw image to elf for x86 ?
On Sat, 02 Mar 2013 16:36:43 +0800, Jacky said:

> The -O binary is removed. And I don't find any changelog.

A quick course on researching kernel development history...

Step 1: 'git blame arch/x86/boot/compressed/Makefile'

That gives us the line:

099e1377 (Ian Campbell 2008-02-13 20:54:58 +0000 42) OBJCOPYFLAGS_vmlinux.bin := -R .comment -S

(Fortunately, this is the commit we wanted - figuring out how to get git to trace through the history if a subsequent commit had touched this line is left as an exercise for the reader :)

Step 2: 'git log 099e1377' gives us this:

commit 099e1377269a47ed30a00ee131001988e5bcaa9c
Author: Ian Campbell i...@hellion.org.uk
Date: Wed Feb 13 20:54:58 2008 +0000

    x86: use ELF format in compressed images.

    Signed-off-by: Ian Campbell i...@hellion.org.uk
    Cc: Ian Campbell i...@hellion.org.uk
    Cc: Jeremy Fitzhardinge jer...@goop.org
    Cc: virtualizat...@lists.linux-foundation.org
    Cc: H. Peter Anvin h...@zytor.com
    Cc: Jeremy Fitzhardinge jer...@goop.org
    Cc: virtualizat...@lists.linux-foundation.org
    Signed-off-by: Ingo Molnar mi...@elte.hu
    Signed-off-by: Thomas Gleixner t...@linutronix.de

The one-liner summary matches exactly with what we're interested in, so it's quite likely the commit we care about.

Step 3: That's a pretty damned sparse Changelog. Fortunately, that's enough to feed to Google, and in about 25 seconds, I find this message:

http://www.gossamer-threads.com/lists/linux/kernel/902407

[PATCHv3 1/3] x86: use ELF format in compressed images.

This allows other boot loaders such as the Xen domain builder the opportunity to extract the ELF file.

So there's the complete patch, including the things it touched besides the Makefile, plus the reason for doing it.

Have a nice day.. ;)
Re: how to trace tcp protocol stack ?
On Sun, 03 Mar 2013 12:13:51 +0800, ishare said:

> Is there a method to look up the call stack of the tcp protocol solution?

ftrace and related functionality. Note that there is a difference between "look up the call stack" and "trace the flow of execution". Consider the following code:

    int a() { print("a"); }
    int b() { print("b"); }
    int c() { a(); b(); }
    int d() { c(); b(); }

If you print the call stack in a(), you'll get a c d. If you trace the flow, you get d c a b b (plus some returns scattered in between). The difference is subtle, but often important. If you're trying to figure out how it works, you probably want to trace the flow. If you're trying to figure out how the code *got* to function foobar(), you're looking at a stack trace.

Also, being familiar with the RFCs that define TCP is helpful. In particular, the Linux TCP stack will make close to zero sense unless you're familiar with the state machine defined in RFC793.
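The a/b/c/d example is easy to try in userspace. The sketch below mirrors it in Python purely as an illustration (it has nothing to do with ftrace itself): every function logs its own entry to build the flow trace, and a() additionally captures the call stack with the standard inspect module. The wrapper name run is made up for the example.

```python
import inspect


def run():
    """Run d() and return (flow trace, call stack as seen from inside a())."""
    ours = {"a", "b", "c", "d"}
    trace, stack_in_a = [], []

    def a():
        trace.append("a")
        # inspect.stack() lists frames innermost first; keep only our four functions.
        stack_in_a.extend(f.function for f in inspect.stack(0) if f.function in ours)

    def b():
        trace.append("b")

    def c():
        trace.append("c")
        a()
        b()

    def d():
        trace.append("d")
        c()
        b()

    d()
    return trace, stack_in_a
```

The stack captured in a() never mentions b at all, while the flow trace shows it twice: the stack only tells you how you got here, not what else ran along the way.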
Re: Module compilation error
On Tue, 05 Mar 2013 09:07:51 +0700, Mulyadi Santosa said:
> On Tue, Mar 5, 2013 at 7:48 AM, Pietro Paolini pulsarpie...@aol.com wrote:
> > echo >&2 Run 'make oldconfig && make prepare' on kernel src
>
> try the suggested above step. IIRC, those commands will do things like preparing the necessary object files, headers and so on, so it is ready for you to be used on your kernel programming.

In addition, note that 'make oldconfig' followed by 'make prepare' will only do the right thing and result in a usable module if the source tree matches your running kernel. Doing 'make prepare' on a 3.7.2 source tree and then building a module against it will result in a module that loads in a 3.7.2 kernel with the same .config - but a different .config and/or release will have anything from a module that simply won't load to one that blows up the system for mysterious reasons.

It's *highly* recommended that you first learn how to build, install, and boot a self-compiled kernel (and remember to keep your distro kernel around), and then once you got that down, *then* start building external modules against it.
Re: pthread_lock
On Tue, 05 Mar 2013 11:02:45 +0530, Mandeep Sandhu said:

> next schedule. I think the waiting threads (processes) will be moved from the wait queue to the run queue from where they will be scheduled to run.

For bonus points, read source code and/or comments and figure out what Linux does to prevent the 'thundering herd' problem (consider 100 threads all waiting on the same mutex - if you blindly wake all 100 up, you'll schedule them all, the first will find the mutex available and then re-take it, and then the next 99 will get run only to find it contended and go back to sleep). So figure out what Linux does in that case. :)
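To get a feel for how much work the herd wastes, here is a back-of-the-envelope model. It only counts wakeups under two idealized policies and makes no claim about what the kernel actually does (that part stays an exercise); both function names are invented for the example.

```python
def wakeups_wake_all(waiters):
    """Every unlock wakes all remaining sleepers; one wins, the rest go back to sleep."""
    total = 0
    while waiters > 0:
        total += waiters  # the whole herd gets scheduled
        waiters -= 1      # exactly one of them takes the mutex
    return total


def wakeups_wake_one(waiters):
    """Every unlock wakes exactly one sleeper, which then takes the mutex."""
    return waiters
```

For the 100 waiters in the example, blind wake-all costs 5050 wakeups in this model against 100 for wake-one, so the waste grows roughly with the square of the number of waiters.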
Re: Query on skb buffer
On Wed, 06 Mar 2013 10:39:13 -0800, Kumar amit mehta said:
> Now, if alloc_skb(4096, GFP_KERNEL) is the routine that gets called to allocate the kernel buffer, then how does the kernel manage such prospective memory allocation failures, and how does the kernel manage large packet requests from the application?

Did you actually look at the source for use of alloc_skb() and how it handles error returns? (Hint - the kernel doesn't do the same thing at every use of alloc_skb(), because an allocation failure needs to be handled differently depending on where it happens. At some places, just bailing out and dropping the packet on the floor without any notification to anybody is appropriate. At other places, we need to propagate an error condition to the caller).

Typical pattern (from net/core/sock.c):

	/*
	 * Allocate a skb from the socket's send buffer.
	 */
	struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
				     gfp_t priority)
	{
		if (force || atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
			struct sk_buff *skb = alloc_skb(size, priority);
			if (skb) {
				skb_set_owner_w(skb, sk);
				return skb;
			}
		}
		return NULL;
	}
	EXPORT_SYMBOL(sock_wmalloc);

and then the caller does something like this (net/ipv4/ip_output.c, in function __ip_append_data()):

	} else {
		skb = NULL;
		if (atomic_read(&sk->sk_wmem_alloc) <=
		    2 * sk->sk_sndbuf)
			skb = sock_wmalloc(sk,
					   alloclen + hh_len + 15, 1,
					   sk->sk_allocation);
		if (unlikely(skb == NULL))
			err = -ENOBUFS;
		else
			/* only the initial fragment is time stamped */
			cork->tx_flags = 0;
	}
	if (skb == NULL)
		goto error;
Re: Several unrelated beginner questions.
On Wed, 06 Mar 2013 18:19:09 -0500, Konstantin Kowalski said:
> 1.) Currently, I am reading 2 books about Linux kernel: Linux Device Drivers (3rd edition) and Linux Kernel Development (3rd edition). I like both books and I am learning a lot from them. I heard that both of these books are outdated, but so far all the information in these books seems valid and applicable. Are there better books you would recommend?

They're both still mostly applicable. The concepts listed are still valid - certain things need to be locked at certain times, things have lifetimes, and so on. The outdated parts are mostly places where the API has changed slightly - for instance, where api_foo(struct bar *a, struct baz *b) is now api_quux(struct bar *a, struct baz *b, int blat). So you can't cut-n-paste the code and expect it to still work.

> 2.) In Linux Device Drivers, it states that module_exit(function) is discarded if module is built directly into kernel or if kernel is compiled with option to disallow loadable modules. But what if the module still has to do something during shutdown? Releasing memory is unimportant since it does not persist over reboot, but what if the module has to write something to a disk file, or do some other action?

If your module has allocated 128M for a graphics buffer, you'll think releasing memory is important. :)

Strictly speaking, a module *should* have already been quiesced and taken care of business before module_exit() is called - there shouldn't be much of anything left to do at that point. (Hint - this is exactly the same question as "why is an empty ->release() function considered a Bad Thing" - it's because release() and similar are supposed to do the clean-up before the module exits)

> 3.) What's the deal with different kernel versions? I heard back in the 2.x days, even kernels were stable and odd versions were experimental, but with 2.6 it changed. So with 3.x kernels, are all of them experimental in the beginning and stable in the end? Also, with 3.x new versions seem to be released more often than in 2.1-2.5 days. Did the release cycle get smaller or is it just my imagination? Also, what does rc number mean?

The 3.x series is exactly the same policy as 2.6 was - Linus just decided that 2.6.42 was too much and reset the counter, and he's been holding to pretty close to every three months for releases for all that time. And 2.1 got up to 2.1.142 or something insane like that in fewer years than it took 2.6 to get to .42, so it isn't like releases are more frequent these days :)

> 4.) Currently, I am running linux-next, and it works great. Am I correct

Lucky you. I manage to break at least 2-3 things in linux-next per release cycle. ;)

> to assume that linux-next is supposed to have newest, shiniest and most unstable features? `uname -a` says that I am still running 3.8-next, but there is already 3.9 out. So which version is more experimental and least stable? Which one is the newest?

Do another pull of the linux-next tree, it will say you're on 3.9-rc1-next now. And even when it said 3.8-next, that was already 3.8 plus all the patches queued for 3.9. Now that Linus's tree is at 3.9-rc1 (closing the merge window for major additions for 3.9), people will be dumping 3.10 material into the linux-next tree.

> 5.) How exactly does make/.config work? When I run `make oldconfig`, does it use everything from the previous .config and only ask how to configure new features?

Yes, that's what *should* happen.

> And when I run `make` does it re-use old object files if nothing was changed in the specific file, or does it re-compile everything from scratch?

Try it and see. :)

Note that sometimes, an apparently innocuous config change can result in the rebuild of lots of files. This is because some commonly used .h file has a #ifdef CONFIG_FOO in it - and when you change FOO, then everybody that includes that .h (even indirectly) ends up rebuilding.

But in general, if you touch only 1 or 2 .c files and no widely used .h files, you'll just have to rebuild those .c's if they're modules. If they're kernel builtins, there's another 10 or 12 things that have to happen, but it's still a lot faster than a full rebuild.
Re: zap_low_mappings
On Thu, 07 Mar 2013 10:33:18 +0800, ishare said:
> kernel halts because the page mapping has been modified by zap_low_mappings
> why we should do zap_low_mappings in init procedure ? this will disorder the page mapping.

You might want to get yourself an up to date kernel, as the code you're asking about was removed almost 2 1/2 years ago.

zap_low_mappings was removed in October 2010 by this commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/mm/init_32.c?id=b40827fa7268fda8a62490728a61c2856f33830b

    x86-32, mm: Add an initial page table for core bootstrapping

    This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs - we have initial_page_table for that.

    Signed-off-by: Borislav Petkov b...@alien8.de
    LKML-Reference:20101020070526.ga9...@liondog.tnic
    Signed-off-by: H. Peter Anvin h...@linux.intel.com
Re: zap_low_mappings
On Thu, 07 Mar 2013 11:43:43 +0800, ishare said:
> set_pgd(swapper_pg_dir+i, __pgd(0));
>
> If I have not defined CONFIG_X86_PAE, then the low mem will all be invalidated.

And what makes you think that call invalidates *all* the page mappings?
Re: Disabling interrupts and masking interrupts
On Thu, 07 Mar 2013 17:17:19 +0200, Kevin Wilson said:
> Does this mean that once you are disabling interrupts, these interrupts are lost ? even later, when we will enable interrupts, the interrupts from the past that should have been created (but interrupts were disabled at that time interval) are in fact lost?

Level-triggered interrupts will go off once interrupts are re-enabled, assuming that the device has kept the level set and not given up and timed out. Edge-triggered interrupts are gone. That's part of why most hardware doesn't use edge triggers - it's just too hard to guarantee proper device driver operation.

Also, in common usage, "disabled" interrupts means that you're not listening to *any* interrupts, while "masked" means we're not listening to *this* interrupt source, even if we *are* accepting interrupts from other sources.

The difference is that sometimes the CPU is doing stuff where it would potentially be screwed if *any* interrupt happened, so we disable them. Other times we're busy inside a device driver, and we're in a critical section for that device - but it's safe for other devices to interrupt. So to improve latency we mask off just the one interrupt, not all of them.
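The disabled-versus-masked distinction can be sketched as a tiny userspace model. This is only an illustration under simplifying assumptions: struct irq_model and its helpers are invented names, there are at most 32 sources, and a pending bit is assumed to stay set until serviced (so lost edges are not modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy interrupt controller: one bit per interrupt source. */
struct irq_model {
	bool     cpu_enabled;	/* false = interrupts "disabled" (all sources) */
	uint32_t masked;	/* bit set = this one source is "masked" */
	uint32_t pending;	/* bit set = source is asserting its line */
};

/* Mark interrupt source 'irq' as pending. */
static void raise_irq(struct irq_model *m, unsigned int irq)
{
	assert(irq < 32);
	m->pending |= UINT32_C(1) << irq;
}

/* Which sources would actually be delivered to the CPU right now? */
static uint32_t deliverable(const struct irq_model *m)
{
	if (!m->cpu_enabled)
		return 0;			/* disabled: nothing gets through */
	return m->pending & ~m->masked;		/* masked: only that source is held back */
}
```

Masking source 3 while sources 3 and 5 are pending still lets 5 through; clearing cpu_enabled holds back both until it is set again.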
Re: Disabling interrupts and masking interrupts
On Thu, 07 Mar 2013 09:28:58 -0800, Dave Hylands said:
> In my experience, edge-triggered interrupts are always latched by the HW when they arrive. If another edge comes along between the initial edge and the time that the interrupt is cleared, then this second edge is lost. The fact that an interrupt is pending will still be retained though, and as soon as interrupts are enabled, then the interrupt handler will fire.

Actually, what you're describing there is hardware that converts edge triggered to level triggered precisely because edge triggered stuff sucks otherwise. ;)

> > Also, in common usage, "disabled" interrupts means that you're not listening to *any* interrupts, while "masked" means we're not listening to *this* interrupt source, even if we *are* accepting interrupts from other sources.
>
> Normally disabling interrupts is just another form of masking, it just happens to mask all of the interrupts rather than one particular one. Even when you disable interrupts, you typically still have access to the unmasked interrupt state.

Yes, but it's still useful to distinguish between the two cases. Also, on many hardware architectures, the actual code to 'disable all' and 'disable one' is very different (on x86, 'cli' does all very fast, ignoring exactly one takes some more doing).
Re: Userspace interception of locally valid memory location
On Sun, 10 Mar 2013 19:27:52 +0530, harish badrinath said:
> Is it possible to intercept (both read and write) a locally valid address of a process and replace it with our own values (it is for a transparent distributed shared memory project).

Go look at how gdb traces variables.
Re: Userspace interception of locally valid memory location
On Sun, 10 Mar 2013 19:27:52 +0530, harish badrinath said:
> Hello,
> Is it possible to intercept (both read and write) a locally valid address of a process and replace it with our own values (it is for a transparent distributed shared memory project).

(Damn, hit send too soon)

Go look at how gdb traces variables. Note that method pretty much only works for writes to a variable, and has some performance implications.

Tracing reads is more difficult, and will probably end up being dependent on exactly how good the hardware debugging support is - the S/390 architecture has had the Program Event Recording feature since the 70s, and recent x86 chipsets have had similar features - details such as how many tracepoints you can have active, how much memory each one can cover, and whether you can intercept an event before it completes will be dependent on the arch and CPU - what's true for an old Pentium 4 won't be true for an i7, and ARM is a whole different beast.
Re: 64bit MMIO access
On Sun, 10 Mar 2013 20:35:37 +0100, Jagath Weerasinghe said:
> readq and writeq do the job.

Please double-check how those are implemented on your architecture. I seem to remember that on some systems, readq and writeq may not be atomic and may become two bus cycles. And some hardware cares about that. It was a discussion on linux-kernel maybe a year or so ago...
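To make the "two bus cycles" concern concrete, here is a hedged userspace sketch of what a non-atomic 64-bit read amounts to. The function name read64_lo_hi() is made up, an ordinary array stands in for a mapped register block, and it assumes the low 32 bits live at the lower address; a real driver would use the kernel's MMIO accessors on an __iomem pointer instead.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit register as two 32-bit accesses, low half first.
 * 'base' stands in for a mapped register block in this sketch. */
static uint64_t read64_lo_hi(const volatile uint32_t *base)
{
	uint64_t lo, hi;

	assert(base != 0);
	lo = base[0];	/* first access */
	hi = base[1];	/* second access: the device may have updated the
			 * register between the two reads */
	return lo | (hi << 32);
}
```

If the device updates the register between the two accesses, the combined value can mix an old low half with a new high half, which is why some hardware cares whether the access is a single 64-bit cycle.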
Re: 64bit MMIO access
On Sun, 10 Mar 2013 20:35:37 +0100, Jagath Weerasinghe said:
> Hi,
> readq and writeq do the job.

(hit send too soon)

Also, the read/write [bwlq] functions refer to the width of the *data*, not the address.
Re: User space memory
On Tue, 12 Mar 2013 18:38:05 +0530, Prabhu nath said:
> On Sun, Mar 10, 2013 at 11:30 PM, Christoph Seitz c.se...@tu-bs.de wrote:
> > I use a char device for reading and writing from/to a pcie dma card. Especially the read function gives me some headache. The user allocates some memory with posix_memalign and calls the read function on the device, so that the device knows where to write to. My driver now uses get_user_pages() to pin the user pages. The memory has never been written or read by the user, so it's not yet in the RAM, right? And get_user_pages returns a valid number of pages, but for every page the same struct (respectively the same pointer). Is there any way to ensure that the user pages are in the RAM and get_user_pages returns a valid page array?
>
> If you know the RAM physical address range you can figure out by doing the following: page_to_pfn(page_ptr) << 12, where page_ptr is a struct page * returned by get_user_pages(). page_to_pfn() will return the pfn of the corresponding page frame and left shifting by 12 bits will give you page frame base address.

Unfortunately, that doesn't actually tell you what Christoph was worried about - is the page *currently* in RAM? For that, you need to check some bits in the pfn once you find it.

Also, note the following: It's not always 12, because not everything uses a 4K page - consider hugepage support, or Power and Itanium where the pages are bigger and often several different sizes are supported. There's an API for the current page size. Use it. :)

Also, there's an API for pinning pages so they *stay* in RAM so you can target them for I/O. Use that. ;)
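A small userspace sketch of the "don't hard-code 12" point. The helper names page_shift_for(), pfn_to_base() and system_page_size() are made up for illustration; in userspace the page size comes from sysconf(_SC_PAGESIZE), while kernel code would use the PAGE_SHIFT and PAGE_SIZE macros rather than anything like this.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Number of low address bits covered by one page of 'page_size' bytes.
 * page_size must be a power of two. */
static unsigned int page_shift_for(unsigned long page_size)
{
	unsigned int shift = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	while ((1UL << shift) < page_size)
		shift++;
	return shift;
}

/* Base address of page frame 'pfn' for the given page size. */
static uint64_t pfn_to_base(uint64_t pfn, unsigned long page_size)
{
	return pfn << page_shift_for(page_size);
}

/* Ask the running system for its base page size instead of assuming 4K. */
static unsigned long system_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	assert(sz > 0);
	return (unsigned long)sz;
}
```

With 4K pages, frame 3 starts at 0x3000; with 64K pages the same frame number starts at 0x30000, which is exactly the mistake a hard-coded shift of 12 would make.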
Re: User space memory
On Tue, 12 Mar 2013 15:03:53 +0100, Christoph Seitz said:
> I found out, if I use the force flag with get_user_pages, the pages get faulted, but there has to be a nicer way than using the force flag.

Why does there have to be a nicer way? Maybe you already got the nice way. (Hint - why does the force flag even exist? :)
Re: block mailing list?
On Fri, 15 Mar 2013 06:22:14 -0700, Raymond Jennings said:
> Is there a kernel list dedicated to discussion of block devices?

What's to discuss? There probably isn't enough ongoing traffic to support a separate mailing list (we've got too many of them as it is :)

MAINTAINERS says:

BLOCK LAYER
M:	Jens Axboe ax...@kernel.dk
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
S:	Maintained
F:	block/

Which basically boils down to "ask linux-kernel and/or kernelnewbies if you don't understand, and cc: Jens if you managed to break it" :)
Re: signals handling: kill() successful, but nothing delivered
On Mon, 18 Mar 2013 06:50:25 +0100, mic...@michaelblizek.twilightparadox.com said:
> Hi!
> On 06:52 Fri 08 Mar , mic...@michaelblizek.twilightparadox.com wrote:
> > ./a.out `ps a|grep wget|grep -v grep

To save the double grep, you can do something like this:

	ps a | grep '[w]get' | ...

Figuring out why that works is left as an exercise for the reader...
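For anyone who wants to check their answer to that exercise, a small sketch using the POSIX regex functions. The helper name line_matches() is made up, and the sample ps lines used with it below are invented for illustration, not real output.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Does the POSIX basic regex 'pattern' match anywhere in 'line'? */
static bool line_matches(const char *pattern, const char *line)
{
	regex_t re;
	bool hit;

	assert(pattern != NULL && line != NULL);
	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return false;	/* treat an invalid pattern as "no match" */
	hit = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}
```

The pattern [w]get matches a line containing "wget", but not a line showing the grep command itself, because that command line contains the characters "[w]get" and never the plain string "wget".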
Re: programme header
On Tue, 19 Mar 2013 09:38:49 +0800, ishare said:
> I am linking my kernel by a link script. Its contents are as below:
> I think it will work, but ld reports "Not enough room for program headers". What is the reason? What should I do?

The first thing you do is ask yourself why you're using a link script of your own, when most architectures come with a working link script already.

The second thing you do is *read the script* - there's a big comment in there:

	/* This linker script is used both with -r and with -shared.
	   For the layouts to match, we need to skip more than enough
	   space for the dynamic symbol table et al. If this amount
	   is insufficient, ld -shared will barf. Just increase it here. */

Hope that helps.
Re: programme header
On Tue, 19 Mar 2013 12:44:36 +0800, ishare said:
> because I need to generate a .so for sysenter used

And that solves what problem for you, exactly? Consider that most architectures that use sysenter manage to do so without having to worry about a .so for it (or if they really do need one, they already create a .so). So what problem are you trying to solve by using a .so?
Re: Memory allocations in linux for processes
On Tue, 19 Mar 2013 20:41:55 +0530, Niroj Pokhrel said:
> #include <stdio.h>
> int main()
> {
>     while(1) { }
>     return 0;
> }
>
> I don't understand where mmap or malloc comes into play in this code.

Unless you linked it statically, a lot of stuff happens before you ever get to main() - namely, any shared library linking and mapping. Run strace on your binary and see how many system calls happen before you hit the infinite loop.
Re: kernel build error
On Wed, 20 Mar 2013 00:07:57 -0700, Kumar amit mehta said:
> I forgot that 'uname -m' will return me the kernel version and _not_ the CPU architecture. The CPU on my machine seems to be 64 bit (/proc/cpuinfo|grep flags shows 'lm'). So my understanding is that I've a 32 bit kernel running on a 64 bit machine.

Or more correctly, you have a kernel actually running in 32-bit mode on a machine that is 64-bit capable.
Re: BFQ: simple elevator
On Thu, 21 Mar 2013 02:24:23 +0700, Mulyadi Santosa said:
> pardon me for any possible silliness, but what happens if there are incoming I/O operations at very nearby sectors (or perhaps at the same sector)? I suppose the elevator will prioritize them first over the rest? (i.e. starving will happen...)

And this, my friends, is why elevators aren't as easy to do as the average undergrad might hope - it's a lot harder to balance fairness and throughput across all the corner cases than you might think. It gets really fun when you have (for example) a 'find' command moving the heads all over the disk while another process is trying to do large amounts of streaming I/O. And then you'll get some idiot process that insists on doing the occasional fsync() or syncfs() call.

Yes, it's almost always *all* corner cases; it's very rare (unless you're an embedded system like a Tivo) that all your I/O is one flavor that is easily handled by a simple elevator.
Re: BFQ: simple elevator
On Wed, 20 Mar 2013 14:41:31 -0700, Raymond Jennings said:
> Suppose you have requests at sectors 1, 4, 5, and 6
>
> You dispatch sectors 1, 4, and 5, leaving the head parked at 5 and the direction as ascending.
>
> But suddenly, just before you get a chance to dispatch for sector 6, sector 4 gets busy again.
>
> I'm not proposing going back to sector 4. It's behind us and (as you indicated) we could starve sector 6 indefinitely.
>
> So instead, because sector 4 is on the wrong side of our present head position, it is ignored and we keep marching forward, and then we hit sector 6 and dispatch it.
>
> Once we hit sector 6 and dispatch it, we do a u-turn and start descending. That's when we pick up sector 4 again.

The problem is that not all seeks are created equal.

Consider the requests are at 1, 4, 5, and 199343245. If as we're servicing 5, another request for 4 comes in, we may well be *much* better off doing a short seek to 4 and then one long seek to the boonies, rather than 2 long seeks.

My laptop has a 160G Western Digital drive in it (WD1600BJKT). The minimum track-to-track seek time is 2ms, the average time is 12ms, and the maximum is probably on the order of 36ms. So by replacing 2 max-length seeks with a track-to-track seek and 1 max-length, you can almost halve the delay waiting for seeks (38ms versus 72ms). (And even better if the target block is logically before the current one, but still on the same track, so you only take a rotational latency hit and no seek hit.)

(The maximum is not given in the spec sheets, but is almost always 3 times the average - for a discussion of the math behind that, and a lot of other issues, see:

http://pages.cs.wisc.edu/~remzi/OSFEP/file-disks.pdf )

And of course, this interacts in very mysterious ways with the firmware on the drive, which can do its own re-ordering of I/O requests and/or manage the use of the disk's onboard read/write cache - this is why command queueing is useful for throughput, because if the disk has the option of re-ordering 32 requests, it can do more than if it only has 1 or 2 requests in the queue.

Of course, very deep command queues have their own issues - most notably that at some point you need to use barriers or something to ensure that the metadata writes aren't being re-ordered into a pattern that could cause corruption if the disk lost its mind before completing all the writes...

> In my case I'm just concerned with raw total system throughput.

See the above discussion.
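The 38ms-versus-72ms comparison is just a sum over the seeks in each plan. A trivial sketch, using the 2ms track-to-track figure and the estimated 36ms maximum from the message (total_seek_ms() is a made-up helper name, and rotational latency is ignored):

```c
#include <assert.h>
#include <stddef.h>

/* Total time spent seeking, given the cost in ms of each seek in a plan. */
static unsigned int total_seek_ms(const unsigned int *seeks_ms, size_t n)
{
	unsigned int total = 0;

	assert(seeks_ms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += seeks_ms[i];
	return total;
}
```

A plan of { 2, 36 } (short hop back to sector 4, then one long seek) totals 38ms, while { 36, 36 } (out to the far request and all the way back) totals 72ms.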
Re: Linux elevators (Re: BFQ: simple elevator)
On Wed, 20 Mar 2013 16:05:09 -0700, Arlie Stephens said:
> The ongoing thread reminds me of a simple question I've had since I first read about linux' multiple I/O schedulers. Why is the choice of I/O scheduler global to the whole kernel, rather than per-device or similar?

They aren't global to the kernel. On my laptop:

# find /sys/devices/pci* -name 'scheduler' | xargs grep .
/sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler:noop deadline [cfq]
/sys/devices/pci:00/:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sr0/queue/scheduler:noop deadline [cfq]
# echo noop > /sys/devices/pci:00/:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sr0/queue/scheduler
# find /sys/devices/pci* -name 'scheduler' | xargs grep .
/sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler:noop deadline [cfq]
/sys/devices/pci:00/:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sr0/queue/scheduler:[noop] deadline cfq

I just changed the scheduler for the CD-ROM.
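In the output above, the scheduler currently in use for each device is the one in square brackets. A small sketch of pulling that name out of such a line programmatically; active_scheduler() is a hypothetical helper written for this example, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bracketed (active) scheduler name out of a line such as
 * "noop deadline [cfq]". Returns 0 on success, -1 if no well-formed
 * "[name]" is found or the name does not fit in 'out'. */
static int active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line != NULL && out != NULL);
	open = strchr(line, '[');
	if (open == NULL)
		return -1;
	close = strchr(open, ']');
	if (close == NULL)
		return -1;
	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

Fed "noop deadline [cfq]" it yields "cfq"; after the echo, the sr0 line "[noop] deadline cfq" yields "noop".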
Re: BFQ: simple elevator
On Wed, 20 Mar 2013 16:37:41 -0700, Raymond Jennings said:
> Hmm...Maybe a hybrid approach that allows a finite number of reverse seeks, or as I suspect deadline does a finite delay before abandoning the close stuff to march to the boonies.

Maybe. Maybe not. It's going to depend on the workload - look how many times we've had to tweak something as obvious as cache writeback to get it to behave for corner cases. You'll think you got the algorithm right, and then the next guy to test-drive it will do something only 5% different and end up cratering the disk. :)

Now of course, the flip side of "a disk's average seek time is between 5ms and 12ms depending how much you paid for it" is that there's no spinning disk on the planet that can do much more than 200 seeks per second (oh, and before you knee-jerk and say "SSD to the rescue", that's got its own issues). Right now, you should be thinking "so *that* is why xfs and ext4 do extents - so we can keep file I/O as sequential as possible with as few seeks as possible".

Other things you start doing if you want *real* throughput: you start looking at striped and parallel filesystems, self-defragmenting filesystems, multipath-capable disk controllers, and other stuff like that to spread the I/O across lots of disks fronted by lots of servers. Lots as in hundreds. As in imagine 2 racks, each with 10 4U shelves with 60 drives per shelf, with some beefy DDN or NetApp E-series heads in front, talking to a dozen or so servers in front of it with multiple 10GE and Infiniband links to client machines.

In other words, if you're *serious* about throughput, you're gonna need a lot more than just a better elevator.

(For the record, a big chunk of my day job is maintaining several petabytes of storage for HPC users, where moving data at 3 gigabytes/second is considered sluggish...)
Re: optimization in kernel compile
On Fri, 22 Mar 2013 13:41:25 +0800, ishare said:
> Is it needed or must to compile fs and driver with -O2 option when compiling kernel ?

It's not strictly mandatory to use -O2 (for a while, -Os was the default). There are a few places where, for correctness, you *cannot* use -O0. For instance, a few places where we use __builtin_return_address() inside an inline (-O0 won't inline, so __builtin_return_address() ends up returning a pointer into a function when we want the function's parent).

Since gdb and friends are able to deal with -O2 compiled code just fine, there's really no reason *not* to optimize the kernel.
Re: optimization in kernel compile
On Fri, 22 Mar 2013 22:32:40 +0800, ishare said:
> > are a few places that for correctness, you *cannot* use -O0. For instance, a few places where we use __builtin_return_address() inside an inline (-O0 won't inline so __builtin_return_address() ends up returning a pointer to a function when we want the function's parent).
>
> So it will cause an error ?

Yes, there are places where failing to optimize causes errors. Consider this code:

	static inline void *foo(void)
	{
		return __builtin_return_address(0);
	}

	int bar(void)
	{
		void *x = foo();
		/* use x */
		return 0;
	}

If you don't optimize, x ends up with a pointer into bar. If it gets inlined because you're optimizing, x ends up pointing to bar's caller. This breaks stuff like the function tracer.

> > Since gdb and friends are able to deal with -O2 compiled code just fine, there's really no reason *not* to optimize the kernel.
>
> the debug information will be stripped by -O2 ,for example ,you can not touch

No debug information is stripped by -O2. Debug information isn't emitted if you don't compile with -g. At one time, long ago (quite possibly literally before you were born for some of the younger readers on the list), gcc was unable to generate -g output if the optimizer was invoked. But that was last century (gcc 2.95 era).

> the value of some variables at stack , and debugging will not run line by line, instead , the source jump in unexpectable order .

I'm probably going to piss a bunch of people off by saying this, but: If your C skills aren't up to debugging code that's been compiled with -O2, maybe you shouldn't be poking around inside the kernel.
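The inlining effect described in that message can be tried in userspace. This sketch assumes GCC or Clang (both provide __builtin_return_address and the noinline/always_inline attributes); it forces the two behaviours with attributes instead of relying on -O0 versus -O2, and all four function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Never inlined: reports where *it* will return to, i.e. a spot
 * inside whichever function called it. */
static __attribute__((noinline)) void *ret_addr_outofline(void)
{
	return __builtin_return_address(0);
}

/* Always inlined: there is no separate frame, so this reports the
 * return address of the function it was inlined into. */
static inline __attribute__((always_inline)) void *ret_addr_inlined(void)
{
	return __builtin_return_address(0);
}

/* Models the unoptimized case: x points into bar_unoptimized() itself. */
static __attribute__((noinline)) void *bar_unoptimized(void)
{
	void *volatile x = ret_addr_outofline(); /* volatile: avoid a tail call */

	assert(x != NULL);
	return x;
}

/* Models the optimized case: x points into bar_optimized()'s caller. */
static __attribute__((noinline)) void *bar_optimized(void)
{
	void *volatile x = ret_addr_inlined();

	assert(x != NULL);
	return x;
}
```

Calling both from the same function gives two different addresses: one inside bar_unoptimized(), the other inside the caller, which is the difference that matters to something like the function tracer.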
Re: optimization in kernel compile
On Fri, 22 Mar 2013 10:52:56 -0400, valdis.kletni...@vt.edu said:
> No debug information is stripped by -O2. Debug information isn't emitted if you don't compile with -g. At one time, long ago (quite possibly literally before you were born for some of the younger readers on the list), gcc was unable to generate -g output if the optimizer was invoked. But that was last century (gcc 2.95 era).

GCC 4.8 was officially released today (since I sent the previous note). From the release notes:

    A new general optimization level, -Og, has been introduced. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. Overall experience for development should be better than the default optimization level -O0.

The current Linus tree does build with 4.8. I do *not* know if earlier releases build correctly (or how far back), nor if -Og is sufficient optimization to allow correct kernel functioning. But it's something to look at.
Re: BFQ: simple elevator
On Fri, 22 Mar 2013 13:53:45 -0700, Raymond Jennings said:
> The first heap would be synchronous requests such as reads and syncs that someone in userspace is blocking on.
>
> The second is background I/O like writeback and readahead.
>
> The same distinction that CFQ completely makes.

Again, this may or may not be a win, depending on the exact workload.

If you are about to block on a userspace read, it may make sense to go ahead and tack a readahead on the request for free - at 100MB/sec transfer and 10ms seeks, reading 1M costs the same as a seek. If you read 2M ahead and save 3 seeks later, you're winning. Of course, the *real* problem here is that how much readahead to actually do needs help from the VFS and filesystem levels - if there's only 600K more data before the end of the current file extent, doing more than 600K of read-ahead is a loss.

Meanwhile, over on the write side of the fence, unless a program is specifically using O_DIRECT, userspace writes will get dropped into the cache and become writeback requests later on. So the vast majority of writes will usually be writebacks rather than synchronous writes.

So in many cases, it's unclear how much performance CFQ gets from making the distinction (and I'm positive that given a sufficient supply of pizza and caffeine, I could cook up a realistic scenario where CFQ's behavior makes things worse)...

Did I mention this stuff is tricky? :)
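The readahead trade-off in that message is simple arithmetic, sketched here in integer microseconds. The helper names are invented for illustration, and "MB" is assumed to mean 10^6 bytes so that 100MB/sec and a 10ms seek line up exactly as in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to transfer 'bytes' at 'bytes_per_sec'. */
static int64_t transfer_us(int64_t bytes, int64_t bytes_per_sec)
{
	assert(bytes >= 0 && bytes_per_sec > 0);
	return bytes * 1000000 / bytes_per_sec;
}

/* Net microseconds saved by reading 'ahead_bytes' now, if that avoids
 * 'seeks_saved' later seeks costing 'seek_us' each. Negative = a loss. */
static int64_t readahead_gain_us(int64_t ahead_bytes, int64_t bytes_per_sec,
				 int64_t seeks_saved, int64_t seek_us)
{
	return seeks_saved * seek_us - transfer_us(ahead_bytes, bytes_per_sec);
}
```

At 100MB/sec, 1M takes 10ms to transfer, the same as one seek. Reading 2M ahead costs 20ms, so saving 3 seeks nets 10ms, while saving only 1 seek loses 10ms.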
Re: initramfs.cpio
On Sun, 24 Mar 2013 16:27:27 +0800, ishare said: Hi : I find that my initramfs_data.cpio generated by gcc does not contain init files ,which should be executed by terminal initialization. My initramfs_data.cpio only contains these : /dev /dev/console /root . where to search the init file? You want to use mkinitramfs or mkinitrd or dracut or whatever your distro calls it to create one. gcc has no *clue* what needs to go in the initramfs (for example, if your root file system is on a LVM partition on a LUKS-encrypted disk, the initramfs has to do a 'cryptsetup luksOpen' and then an 'lvm vgchange -ay' before the mount will succeed). Plus any modprobes that may or may not be needed, etc etc etc.
Re: cap on writeback?
On Mon, 25 Mar 2013 16:33:48 -0700, Raymond Jennings said: Just curious, is there a cap on how much data can be in writeback at the same time? I'm asking because I have over a gigabyte of data in dirty, but during flush, only about 60k or so is in writeback at any one time. Only a gigabyte? :) (I've got a box across the hall that has 2.6T of RAM, and yes, it's pretty sad when it decides it's time for writeback across an NFS or GPFS mount, even though it's a 10GE connection.) For the record, writeback is one of those things that's really hard to get right, because there are always corner cases. Probably why we seem to end up screwing around with it every 2-3 releases. :)
Re: cap on writeback?
On Mon, 25 Mar 2013 17:23:40 -0700, Raymond Jennings said: Is there some sort of mechanism that throttles the size of the writeback pool? There's a lot of tunables in /proc/sys/vm - everything from drop_caches to swappiness to vfs_cache_pressure. Note that they all interact in mystical and hard-to-understand ways. ;) it's somewhat related to my brainfuck queue, since I would like to stress test it digesting a huge pile of outbound data and seeing if it can make writeback less seeky. The biggest challenge here is that there's a bit of a layering violation to be resolved - when the VM is choosing what pages get written out first, it really has no clue where on disk the pages are going. Consider a 16M file that's fragged into 16 1M extents - they'll almost certainly hit the writeback queue in logical block order, not physical address order. The only really good choices here are either to allow the writeback queue to get deep enough that an elevator can do something useful (if you only have 2-3 IOs queued, you can do less than if you have 20-30 of them you can sort into some useful order), or to use filesystems that are less prone to fragmentation issues. Just for the record, most of my high-performance stuff runs best with the noop scheduler - when you're striping I/O across several hundred disks, the last thing you want is some single-minded disk scheduler re-arranging the I/Os and creating latency issues for your striping. Might want to think about why there's lots of man-hours spent doing new filesystems and stuff like zcache and kernel shared memory, but the only IO schedulers in tree are noop, deadline, and cfq :)
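The point about queue depth can be illustrated with a toy model: requests arrive in logical order, and an elevator can only cut head travel by re-sorting whatever happens to be queued. The struct, the function names and the "head travel" metric below are all invented for this sketch. It is a userspace illustration of the idea, not how any in-tree scheduler is written.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queued request: where it sits in the file (logical)
 * and where it lands on disk (sector). */
struct ex_io {
    unsigned long logical;
    unsigned long sector;
};

/* qsort comparator: ascending physical sector. */
static int ex_cmp_sector(const void *a, const void *b)
{
    const struct ex_io *x = a;
    const struct ex_io *y = b;

    return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Total distance the head moves servicing the queue in its current order. */
unsigned long ex_head_travel(const struct ex_io *q, size_t n)
{
    unsigned long total = 0;
    size_t i;

    for (i = 1; i < n; i++)
        total += q[i].sector > q[i - 1].sector
                     ? q[i].sector - q[i - 1].sector
                     : q[i - 1].sector - q[i].sector;
    return total;
}

/* What a very naive one-way elevator would do with the queued requests. */
void ex_sort_by_sector(struct ex_io *q, size_t n)
{
    qsort(q, n, sizeof(*q), ex_cmp_sector);
}
```

Four requests queued in logical order at sectors 900, 100, 500, 200 cost 1500 sectors of head travel; sorted, the same four cost 800. With only two or three requests in the queue there is much less to gain, which is the queue-depth argument made above.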
Re: initramfs_list
On Wed, 27 Mar 2013 20:38:54 +0800, ishare said: I am doing some tests on kernel 2.6.0 and encountering a problem with initramfs . I find my initramfs is generated without an initramfs_list file ,which describes the list of files that will be created in the initramfs file . such as /sbin/init /etc ... What the kernel tree creates by default is a very small stub, basically only what's needed to make sure that *some* sort of initramfs gets created so the kernel doesn't panic on a stray pointer trying to access an uninitialized file system. To create something that will actually boot your system, please see 'man mkinitramfs' or 'man mkinitrd' or 'man dracut' or similar for whatever tool your actual distro uses to build a functional initramfs.
Re: Creating mkfs for my custom filesystem
On Fri, 29 Mar 2013 15:44:49 +0530, Sankar P said:

I have decided on a simple layout for my filesystem where the first block will be the super block and will contain the version information etc. The second block will contain the list of inodes. Third block onwards will be data blocks. Each file can grow only up to a single block size. Third block will represent the first file, fourth block for the second file and so on. Directories will not be supported.

You *will* have to support at least the top-level directory, because you'll need at least directory entries for . and .. If your second block is list of inodes, then you have directories (and adding subdir support isn't *that* hard). You'll also want either a list or bitmap structure or some other way to determine if a given block is allocated - trying to write a new file without having a freelist to get blocks from is hard. Oh, and don't forget to add locking around the freelist operations and similar things - having two processes both grab block 27 for the file they just created can suck :)

oh okay. But how do I create the superblock ? What are the APIs available to do these block level operations from a user space application (my mkfs program ) ?

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    struct foobar_super {
            int version;
            int num_files;
            int free_blocks;
            char padding[512 - 3 * sizeof(int)];
    };

    struct foobar_super sb;
    int disk;

    memset(&sb, 0, sizeof(sb));
    sb.version = 1;
    sb.num_files = 0;
    sb.free_blocks = 999;   /* should probably set to actual size of partition/file */
    disk = open(diskorfilename, O_WRONLY);   /* testing on loop mounts is useful */
    lseek(disk, 0, SEEK_SET);
    write(disk, &sb, sizeof(sb));   /* congrats, you just wrote a superblock */

Yes, it's that simple :) You want to write some empty inodes, add a 'struct inode' variable, initialize it, lseek to where the inode goes and write it out. Just open, lseek, write, close. ;) And yes, those operations *do* work just fine on both files you then use with 'mount -o loop' and with /dev/sd* or /dev/mapper/* LVM.

You might want to look at the source for mkfs.vfat (part of dosfstools package) for additional details.
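The "write some empty inodes" step can be sketched the same way. Everything here is an assumption layered on the layout described above: the inode struct, its two fields, the foobar_* names and the 512-byte block size are invented for illustration, and a real on-disk format would want fixed-width types and a defined byte order. The allocator only shows the "is this slot taken" bookkeeping and has none of the locking the message warns about.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FOOBAR_BLOCK_SIZE  512
#define FOOBAR_INODE_BLOCK 1   /* block 0 holds the superblock */
#define FOOBAR_FIRST_DATA  2   /* file i lives in block 2 + i */

/* Hypothetical on-disk inode: an in-use flag and the file length. */
struct foobar_inode {
    int in_use;
    int size;   /* bytes used, at most FOOBAR_BLOCK_SIZE */
};

#define FOOBAR_NUM_INODES (FOOBAR_BLOCK_SIZE / sizeof(struct foobar_inode))

/* The whole inode list fits in exactly one block. */
struct foobar_inode_table {
    struct foobar_inode inode[FOOBAR_NUM_INODES];
};

/* Write an all-free inode table into block 1. Returns 0 or -1. */
int foobar_write_inode_table(int fd)
{
    struct foobar_inode_table t;

    memset(&t, 0, sizeof(t));
    if (lseek(fd, (off_t)FOOBAR_INODE_BLOCK * FOOBAR_BLOCK_SIZE, SEEK_SET) < 0)
        return -1;
    return write(fd, &t, sizeof(t)) == (ssize_t)sizeof(t) ? 0 : -1;
}

/* Read block 1 back into *t. Returns 0 or -1. */
int foobar_read_inode_table(int fd, struct foobar_inode_table *t)
{
    if (lseek(fd, (off_t)FOOBAR_INODE_BLOCK * FOOBAR_BLOCK_SIZE, SEEK_SET) < 0)
        return -1;
    return read(fd, t, sizeof(*t)) == (ssize_t)sizeof(*t) ? 0 : -1;
}

/* Claim the first free inode in the in-memory table.
 * Returns the data block number for the new file, or -1 if the table is full. */
int foobar_alloc_inode(struct foobar_inode_table *t)
{
    size_t i;

    for (i = 0; i < FOOBAR_NUM_INODES; i++) {
        if (!t->inode[i].in_use) {
            t->inode[i].in_use = 1;
            return FOOBAR_FIRST_DATA + (int)i;
        }
    }
    return -1;
}
```

Since each file owns exactly one data block in this layout, the in_use flag doubles as the free-block map. A mkfs would call foobar_write_inode_table() right after writing the superblock, on the same file descriptor.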
Re: Online migration of arbitrary filesystems, possible?
On Fri, 29 Mar 2013 17:09:14 -0300, Daniel Hilst said: The idea is, mount both filesystems together, and make write/read operations go on this way Read operations: 1. See if data is already on dest fs, 2. If it is, then read data and bring it back to caller (lets call this cold read) 3. If it is not, then read file from source fs, put it on page cache, and change the backstorage of that page.. 3.1 So when this page gets dirty or too old, it will be written to Any reason you can't just 'rsync /source-fs /dest-fs'?
Re: why not choose another way to define the _IOC_xxxMASK related to the ioctl
On Sat, 30 Mar 2013 18:01:32 +0800, RS said: Now I think this will spend more time than the kernel code when executed. Have you actually examined the generated code on several popular architectures to see what gcc actually does? (hint - many things can be constant-folded at compile time. So if the 3 values are #defined to constants, the expression (_IOC_NRSHIFT _IOC_NRBITS) - _IOC_NRSHIFT) will generate no actual shift or subtract instructions, merely another constant.)
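One way to convince yourself of the constant-folding point without reading assembly: if the compiler accepts an expression inside a static assertion, it has to evaluate it at compile time. The macros below mirror, as far as I recall, the generic definitions in include/uapi/asm-generic/ioctl.h; the EX_ prefix and the two helper functions are mine, to avoid clashing with the real headers.

```c
/* Assumed to match the generic (non-arch-specific) ioctl bit layout. */
#define EX_IOC_NRBITS    8
#define EX_IOC_TYPEBITS  8
#define EX_IOC_NRSHIFT   0
#define EX_IOC_TYPESHIFT (EX_IOC_NRSHIFT + EX_IOC_NRBITS)

#define EX_IOC_NRMASK    ((1 << EX_IOC_NRBITS) - 1)
#define EX_IOC_TYPEMASK  ((1 << EX_IOC_TYPEBITS) - 1)

/* A static assertion needs an integer constant expression, so these
 * compile only because the shift and subtract are folded by the
 * compiler rather than executed at run time. */
_Static_assert(EX_IOC_NRMASK == 0xff, "NR mask folds to a constant");
_Static_assert(EX_IOC_TYPESHIFT == 8, "TYPE shift folds to a constant");

/* Extract the command number field from an ioctl command word. */
unsigned int ex_ioc_nr(unsigned int cmd)
{
    return (cmd >> EX_IOC_NRSHIFT) & EX_IOC_NRMASK;
}

/* Extract the type ("magic") field from an ioctl command word. */
unsigned int ex_ioc_type(unsigned int cmd)
{
    return (cmd >> EX_IOC_TYPESHIFT) & EX_IOC_TYPEMASK;
}
```

Compiling this with 'gcc -O2 -S' and looking at ex_ioc_nr() is the real test suggested above. The only run-time work left should be operations on cmd itself, with the masks appearing as immediate constants.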
Re: what does it use two !!
On Mon, 01 Apr 2013 15:10:46 +0800, Ben Wu said: 1 I found some places use two !!, what does it mean? if(button->gpio != INVALID_GPIO) state = !!((gpio_get_value(button->gpio) ? 1 : 0) ^ button->active_low); else Gaah. That line of code fell out of the ugly tree and hit every branch on the way down. Use of !! *and* ? 1 : 0 in the same line of code to do the same thing. Ouch.
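For anyone else wondering about the idiom itself: '!x' is 1 when x is zero and 0 otherwise, so '!!x' maps zero to 0 and every nonzero value to 1. That is also all '? 1 : 0' does, which is why having both on one line is redundant. A small sketch follows; the function names are made up, and ex_button_state() is only my guess at a tidier equivalent of the quoted line, with active_low normalised as well.

```c
/* Collapse any integer to 0 or 1. */
int ex_to_bool(int v)
{
    return !!v;
}

/* Hypothetical rewrite of the quoted line: normalise the raw GPIO
 * reading and the active_low flag to 0/1, then XOR them. */
int ex_button_state(int raw_gpio_value, int active_low)
{
    return !!raw_gpio_value ^ !!active_low;
}
```

So ex_to_bool(42) and ex_to_bool(-7) are both 1, ex_to_bool(0) is 0, and a nonzero GPIO reading on an active-low button gives state 0.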
Re: Reading linux boot args
On Tue, 02 Apr 2013 12:12:09 +0900, manty kuma said: Is there any way I could read the reason for reboot? I want to read it so that I can get the reason that is stored. like 0xABADBABE is watchdog 0xCODEDEAD is panic. Etc.. Please suggest an alternative approach. See the 'pstore' persistent storage filesystem in fs/pstore. That lets you store more than just an int (you can even get it to stash your dmesg buffer). 'more fs/pstore/Kconfig' would be a good place to start.
Re: cgroup.procs versus tasks (cgroups)
On Tue, 02 Apr 2013 16:46:24 +0300, Kevin Wilson said: Hi, Thanks a lot Vlad. This explains it. - Does anybody know of a ps command (or a filter to ps command) which will display only multithreaded processes (list processes by TGID) ? (I know now about the option of displaying cgroup.procs , but is something parallel can be done with ps ? ) Have you tried 'ps -m' and friends? Though it doesn't do exactly what you wanted and *only* display multithreaded, you need to do some post-processing:

$ ps max
...
  928 ?        -      0:00 /sbin/auditd -n
    - -        Ssl    0:00 -
    - -        Ssl    0:00 -
  940 ?        -      0:00 /sbin/audispd
    - -        Ssl    0:00 -
    - -        Ssl    0:00 -
  951 ?        -      0:00 /usr/sbin/abrtd -d -s
    - -        Ss     0:00 -
  960 ?        -      0:00 /usr/bin/abrt-watch-log -F Backtrace /var/log/Xorg.0.log -- /usr/bin/abrt-dump-xorg -xD
    - -        Ss     0:00 -

If there are 2 or more '- -' lines after the process entry, it's multi-threaded. Note however that as far as the kernel is concerned, a single-threaded process is handled by the code as a multi-threaded one that happens to have only one thread at the moment. In other words, thinking that single and multi threaded is different in some mystical way will probably end up causing trouble for you...
Re: Online migration of arbitrary filesystems, possible?
On Mon, 01 Apr 2013 17:50:43 -0300, Daniel Hilst said: Any reason you can't just 'rsync /source-fs /dest-fs'? because I can't use dest-fs while rsynching Sure you can. You just have to remember to pay attention to race conditions - if you create foo/bar.dat on the dest and then rsync wants to copy over a foo/bar.dat from the source, things will go poorly. However, if you wanted to write to the dest while doing your sync, you'll have that issue no matter *what* method you use to do it. Read operations: 1. See if data is already on dest fs, 2. If it is, then read data and bring it back to caller (lets call this cold read) 3. If it is not, then read file from source fs, put it on page cache, and change the backstorage of that page.. 3.1 So when this page gets dirty or too old, it will be written to You may want to look for 'overlayfs' and 'unionfs', which may provide you the function you need. (Note there's several different patchsets calling themselves 'unionfs').
Re: Method to calculate user space thread size
On Wed, 03 Apr 2013 14:03:40 +0530, naveen yadav said: I have code written, and I cannot modify. I want to fix user stack size for all threads in glibc, 'man ulimit'?
Re: NMIs, Locks, linked lists, barriers, and rculists, oh my!
On Wed, 03 Apr 2013 19:08:45 -0700, Arlie Stephens said: - I've got a linked list, with entries occasionally added from normal contexts. - Entries are never deleted from the list. This is already busted - you *will* eventually OOM the system this way. This would be simple, except that the routines which READ the list may get called from panic(), or inside an oops, or from an NMI. It's important that they succeed in doing their jobs in that case, even if they managed to interrupt a list addition in progress. At that point, you need to be *really* specific on what the semantics of 'succeed' really are. In particular, do they merely need to succeed at reading a single specific entry, or the entire list? You should also be clear on why panic() and oops need to poke the list (hint - if you're in panic(), you're there because the kernel is already convinced things are too screwed up to continue, so why should you trust your list enough to walk across it during panic()?) 1) Use an rculist, and hope that it really does cover this situation. That's probably safe/sufficient, as long as the read semantics of an RCU-protected region are acceptable. In particular, check the behavior of when an update happens, compared to when it becomes visible to readers.
Re: simple question about the function memcmp in kernel
On Mon, 08 Apr 2013 08:57:01 +0800, Ben Wu said: int memcmp(const void *cs, const void *ct, size_t count) { I want to know why it use the temp pointer su1, su2? why it doesn't directly use the cs and ct pointer? This is a C 101 question, not a kernel question. But anyhow.. They're declared const, so the compiler will whine about ++'ing them.
Re: simple question about the function memcmp in kernel
On Mon, 08 Apr 2013 05:56:29 +0400, Max Filippov said: const is the object they point to, not the pointers themselves (that would be void * const cs). memcmp compares bytes at which cs and ct point, but these are void pointers, and the expression res = *cs - *ct is thus meaningless. Max is right, and I'm obviously under-caffeinated or something. :)
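Max's explanation in code form, for the archives. This is a sketch modeled on the shape of the function quoted in the question, with the name changed to ex_memcmp so it doesn't collide with libc. It is not copied from the kernel tree, so compare against lib/string.c before relying on details.

```c
#include <stddef.h>

/* Byte-wise compare in the style of the quoted kernel memcmp.
 * cs and ct are "pointer to const void": the pointers themselves could
 * be modified, but a void pointer can't be dereferenced, so the bytes
 * are read through unsigned char pointers instead. */
int ex_memcmp(const void *cs, const void *ct, size_t count)
{
    const unsigned char *su1 = cs;
    const unsigned char *su2 = ct;
    int res = 0;

    for (; count > 0; ++su1, ++su2, count--) {
        res = *su1 - *su2;   /* "*cs - *ct" would not compile */
        if (res != 0)
            break;
    }
    return res;
}
```

The temporaries are there for the type, not for the const. A declaration like 'const void * const cs' would be needed before incrementing cs became an error.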
Re: How will Linux community support the coming Intel new chip Bay Trail?
On Thu, 11 Apr 2013 23:08:21 +0800, Peter Xu said: Hi, all, It seems that Intel will publish a nice chip called Bay Trail (or plus, I'm not quite sure, which is for smartphones/tablets, also some lower ends of laptops in the future). It was said publicly that Intel will support Linux platform on that chip. I just want to know something from the community side, that is there a plan or something related to that chip? Over the past few years, Intel has been quite good about contributing drivers for coming chips, often before official numeric part numbers have even been assigned. I don't know about this particular chip, but there's a good chance that there will be in-tree support before the chip officially releases for general use.
Re: net_device: limit rate ot tx packets
On Sun, 14 Apr 2013 10:09:54 +0200, mic...@michaelblizek.twilightparadox.com said: This is not what I meant. When the qdisc has a size of say 256KB and the socket memory is, say 128kb, the socket memory limit will be reached before the qdisc limit and the socket will sleep. But when the socket memory limit is greater than the qdisc limit, it will be interesting whether the socket still sleeps or starts dropping packets. How to figure this out for yourself: Look at net/sched/sch_plug.c, which is a pretty simple qdisc (transmit packets until a plug request is received, then queue until unplugged). In particular, look at plug_enqueue() to see what happens when q->limit is exceeded, and plug_init() to see where q->limit is set. Then look at the definition of qdisc_reshape_fail() in include/net/sch_generic.h to figure out what the qdisc returns if q->limit is exceeded. Then go look at net/core/dev.c, in function __dev_xmit_skb(), and watch the variable 'rc'. Now go find the caller of __dev_xmit_skb() and see what it does with that return value. Hope that helps...
Re: So I want to get some kernel routine called....
On Tue, 16 Apr 2013 16:33:17 -0700, Arlie Stephens said: I have some kernel routine I'd like to get called, with the decision to call it made in user space. The proper answer here is *highly* dependent on exactly what this routine has to do once it's called. Can you explain the problem the routine is trying to solve? Quite often, by the time you get to the 'I need to call a routine' stage, you've stopped seeing the forest for the trees, and stepping back and looking at the actual problem to be solved rather than a proposed solution will provide insight.
Re: Building kernel modules with debuginfo and printing line numbers in kernel oops message / coredump
On Fri, 19 Apr 2013 23:55:49 +0530, Sankar P said: myfunctionname +0x2507 +5679 That function is too honking big and needs to be refactored. :)
Re: oops in a kernel module
On Sat, 27 Apr 2013 19:34:00 +0300, Kevin Wilson said: Hello, static int __init init_zeromib(void) This is your init routine... { int ret = 0; printk("in %s\n",__func__); Missing KERN_DEBUG or similar here. This can cause it to fail to appear in dmesg output, causing much confusion. #define SNMP_ZERO_STATS(mib, field) this_cpu_add(mib[0]->mibs[field],-(mib[0]->mibs[field])) You *do* realize that this doesn't in fact zero the statistics, right? If you have a 32-core machine, this will zero 1/32 of the statistics. this_cpu_add and friends are there specifically so that on multi-core systems there's a lockless way to update the statistics values - to actually find the values, you need to walk across all the per_cpu areas and sum them up. And why for the love of all that is good did you do this bletcherous thing with this_cpu_add() instead of using 'this_cpu_write(whatever, 0)'? Or at least use this_cpu_sub()? ;)
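The "zeroes 1/32 of the statistics" point is easy to see in a userspace toy. The code below is NOT the kernel per-cpu API: it models per-CPU storage as a plain array with one slot per CPU, the CPU number is passed in by hand, and every ex_* name is invented for illustration.

```c
#define EX_NR_CPUS 4

/* Toy stand-in for a per-CPU counter: one private slot per CPU. */
struct ex_percpu_counter {
    long slot[EX_NR_CPUS];
};

/* Models this_cpu_add(): touches only the slot of the CPU it runs on. */
void ex_cpu_add(struct ex_percpu_counter *c, int cpu, long v)
{
    c->slot[cpu] += v;
}

/* The value a reader cares about is the sum over every CPU's slot. */
long ex_counter_sum(const struct ex_percpu_counter *c)
{
    long total = 0;
    int cpu;

    for (cpu = 0; cpu < EX_NR_CPUS; cpu++)
        total += c->slot[cpu];
    return total;
}

/* Actually zeroing the statistic means visiting every slot. */
void ex_counter_zero_all(struct ex_percpu_counter *c)
{
    int cpu;

    for (cpu = 0; cpu < EX_NR_CPUS; cpu++)
        c->slot[cpu] = 0;
}
```

Add 5 on each of the four CPUs and the sum is 20. Apply the quoted macro's trick on CPU 0 only, ex_cpu_add(&c, 0, -c.slot[0]), and the sum is still 15. Only the walk over all slots brings it to 0.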