Re: I remove the audit and selinux from kernel and start the new kernel in Centos, but i can't bring up the network device, Why?
Hi Sizel, just some possibly pertinent questions:

On Mon, Oct 13, 2014 at 2:36 PM, wrote:
> On Mon, 13 Oct 2014 13:04:08 +0800, sizel said:
> > I remove the audit and selinux from kernel and start the new kernel in
> > Centos, but i can't bring up the network device, Why?

"remove"... how do u do it? Did u recompile the kernel with CONFIG_SECURITY_SELINUX=n and CONFIG_AUDIT=n? Or did u just follow the standard procedures, like these:

http://www.crypt.gen.nz/selinux/disable_selinux.html
https://www.digitalocean.com/community/tutorials/an-introduction-to-selinux-on-centos-7-part-1-basic-concepts

It sounds just like this case:

https://www.centos.org/forums/viewtopic.php?t=30942

I suspect u got the errors just from configuring/disabling SELinux the improper way... try again.

> This would be a lot easier to answer if you gave us some actual details:
>
> 1) Why did you think audit and selinux were the problem? (They probably
> aren't).
> 2) What *exactly* is "the network device"? A wireless card? 1G ethernet?
> 10G Ethernet? Infiniband? Something else?
> 3) What configuration are you trying to set up?
> 4) What error message, if any, do you get?
> 5) Have you ruled out the easy stuff, like trying to start an Ethernet
> port with a missing/bad cable?
> 6) Why do you think it's a kernel problem and not a userspace problem?
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

--
Regards,
Peter Teoh
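Peter's first question (how exactly were SELinux and audit removed?) and question 6 (kernel or userspace problem?) can both be checked against the configuration the new kernel was actually built from. In a generated .config an enabled option appears as "CONFIG_X=y" or "CONFIG_X=m" and a disabled one as "# CONFIG_X is not set"; the options in question are CONFIG_SECURITY_SELINUX and CONFIG_AUDIT. A minimal C sketch, with hypothetical helper names (not an existing tool):

```c
#include <stdio.h>
#include <string.h>

/* How a Kconfig option appears in a generated .config file. */
enum kconfig_state {
    KCONFIG_UNKNOWN,   /* line is about another option, or not a y/m value */
    KCONFIG_DISABLED,  /* "# CONFIG_X is not set" */
    KCONFIG_BUILTIN,   /* "CONFIG_X=y" */
    KCONFIG_MODULE     /* "CONFIG_X=m" */
};

/*
 * Classify one .config line for the option `name`, e.g. "CONFIG_AUDIT".
 * The '=' and " is not set" checks make the match exact, so CONFIG_AUDIT
 * does not match a line about CONFIG_AUDITSYSCALL.
 */
static enum kconfig_state kconfig_line_state(const char *line, const char *name)
{
    size_t n = strlen(name);

    if (strncmp(line, name, n) == 0 && line[n] == '=') {
        if (line[n + 1] == 'y')
            return KCONFIG_BUILTIN;
        if (line[n + 1] == 'm')
            return KCONFIG_MODULE;
        return KCONFIG_UNKNOWN;
    }
    if (strncmp(line, "# ", 2) == 0 && strncmp(line + 2, name, n) == 0 &&
        strncmp(line + 2 + n, " is not set", 11) == 0)
        return KCONFIG_DISABLED;
    return KCONFIG_UNKNOWN;
}

/* Scan a whole .config file; KCONFIG_UNKNOWN if the option never appears. */
static enum kconfig_state kconfig_file_state(const char *path, const char *name)
{
    char buf[512];
    enum kconfig_state st = KCONFIG_UNKNOWN;
    FILE *f = fopen(path, "r");

    if (!f)
        return KCONFIG_UNKNOWN;
    while (st == KCONFIG_UNKNOWN && fgets(buf, sizeof buf, f))
        st = kconfig_line_state(buf, name);
    fclose(f);
    return st;
}
```

On CentOS the running kernel's config is usually installed as /boot/config-<release>, so pointing kconfig_file_state() at that file shows what was really built. The same call with the NIC driver's option (CONFIG_E1000E is used here purely as an illustration) would show whether the driver itself was left out of the new kernel.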
Re: Removed from eudyptula challenge
On Sat, Sep 20, 2014 at 3:05 AM, Jeshwanth Kumar N K wrote:
> Hello,
>
> Today I was asking some suggestions in IRC for my eudyptula challenge
> (indirectly, because working for it for 1 month). So I am removed from the
> challenge now.
>
> So, who all doing the challenge please do everything yourself by reading
> the docs, kernel codes or ask little directly. Because, you will feel
> really bad after removing from challenge, anyway my mistake, I shouldn't
> have break the integrity.
>
> And my mistake was I thought I am smart in asking questions and nobody
> will get doubt :). So don't do that :).
>

It does not matter: the aim of the whole thing is to learn, and to learn u either search, read, discuss, or ask. Solving the problem is not the ultimate aim. True, the Internet has tons of resources to learn from, but we always want to do targeted learning, to solve a narrow range of problems in the fastest, most efficient way. ASK! And like Newton said, his knowledge was built by standing on the shoulders of giants. So please ask, and help add value to others' knowledge. Solving the Eudyptula Challenge COMPLETELY DOES NOT ACHIEVE THAT GOAL, because everything is hidden and shrouded in secrecy.

Just my 2cts
Re: suspend/resume PM criterion for application
On Wed, Sep 17, 2014 at 11:16 AM, Peter Teoh wrote:
>
> On Sun, Sep 14, 2014 at 2:11 AM, Ran Shalit wrote:
>
>> On Thu, Sep 11, 2014 at 12:24 PM, Ran Shalit wrote:
>> > On Thu, Sep 11, 2014 at 8:32 AM, AYAN KUMAR HALDER <ayankum...@gmail.com> wrote:
>> >> On Thu, Sep 11, 2014 at 12:55 AM, wrote:
>> >>> On Wed, 10 Sep 2014 21:58:48 +0300, Ran Shalit said:
>> >>>
>> >>>> 1. How can I make a process to notice this inactivity ? Do you think
>> >>>> it can be implemented by some periodic process who check if there is
>> >>>> activity ? It returns to the original question I raised, that I will
>> >>>> use some periodic process who checks maybe cpu load or something like
>> >>>> that. What do you think ?
>> >>>
>> >>> That's going to depend on your system and what processes are running.
>> >>>
>> >>> You may have an MP3 player going that doesn't take much CPU at all - but
>> >>> shutting down because the user hasn't hit a button in 47 minutes will
>> >>> probably irritate the user no end. Or there may be a screensaver running
>> >>> that takes twice as much CPU as the MP3 player, but is totally OK on the
>> >>> system suspending whenever the rest of the system wants it.
>> >>>
>> >>> You're going to have to look at your system design, and decide for
>> >>> yourself what the criteria are.
>> >>
>> >> Please correct me if my understanding is wrong:-
>> >>
>> >> I believe that autosuspend feature (for system suspend) is not present
>> >> in kernel. I believe that there is no feature in kernel which checks
>> >> for system ( cpu, devices) inactivity and suspends the entire system.
>> >> System suspend is caused when :-
>> >> 1. the user issues a command
>> >> 2. The system receives some interrupt or event (lid closing event)
>> >> 3. There is an external process which monitors system inactivity and
>> >> suspends the system.
>> >>
>> >> For runtime suspend of a device, I believe it is the driver who has
>> >> the complete responsibility to decide when to suspend the device or
>> >> resume it. The driver can take this decision on user intervention (eg
>> >> when user writes to /sys/devices//power/* ) or when the
>> >> driver has completed servicing an interrupt and feels it has nothing
>> >> more to do, etc
>> >
>> > Thanks Vlaid, Ayan,
>> >
>> > I am a bit yet struggling for couple of days on this PM issue, and I
>> > would appreciate your continuous advice.
>> > The system requirement I have is as following:
>> > 1. make everything as automatic as possible , so that there won't be
>> > any need to add any userspace application for the matter.
>> > 2. wakeup from all relevant wakeup sources
>> > 3. should not use sysfs (it should be disabled from kernel)
>> > 4. platform is OMAP3530.
>> >
>
> a. Look into arch/arm/mach-omap2 of the kernel source and grep for the
> "sleep" and "wakeup" functionality: power management here is largely about
> managing the different frequencies (and voltages) of the CPU. As far as I
> can tell, once asleep, only the UART pin can be used for waking up... not sure.
>
> b. Read this:
>
> http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/t/30005.aspx
>
> http://www.ti.com/lit/an/slva310b/slva310b.pdf (read page 2, which
> describes the different power-up sequences of the CPU, "Powering-Up Sequence").
>
> c. The technology to look for on omap3530 is "DVFS" (dynamic voltage and
> frequency scaling)... search for it inside the arch/arm kernel source...
> you can find lots of sample code there.
>
> (Don't confuse it with another omap CPU brand name, "DeepSleep", which is
> PM for another type of omap CPU.)
>
> d. http://www.ti.com/product/omap3530 --> on the right is a DVSDK +
> Android source code for 3530... grep the code for the above keywords...
>
> hopefully it helps?

At the risk of missing out other files, how about these two files inside arch/arm/mach-omap2:

omap-pm.h
omap-pm-noop.c

which I think provide a lot of hints for you.
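Ayan's first and third triggers (a user command, or an external process that watches for inactivity) both end up asking the kernel for system sleep the same way: writing a state name such as "mem" (suspend to RAM) to /sys/power/state, whose contents list the states the platform supports. A minimal C sketch of that interface, with hypothetical helper names; note that it relies on sysfs, which conflicts with Ran's requirement 3:

```c
#include <stdio.h>
#include <string.h>

/*
 * 1 if `state` (e.g. "mem") is listed as a whole word in `states`, the
 * contents of /sys/power/state (something like "freeze mem disk\n").
 */
static int pm_state_supported(const char *states, const char *state)
{
    size_t n = strlen(state);
    const char *p = states;

    if (n == 0)
        return 0;
    while ((p = strstr(p, state)) != NULL) {
        int starts = (p == states || p[-1] == ' ');
        int ends = (p[n] == '\0' || p[n] == ' ' || p[n] == '\n');
        if (starts && ends)
            return 1;
        p += n;   /* partial match such as "memx": keep looking */
    }
    return 0;
}

/*
 * Request a system sleep state by writing its name to `state_path`
 * (on a real system "/sys/power/state", which needs root). The data
 * reaches the kernel when fclose() flushes the stream; for the sysfs
 * file that write normally returns only after the system has resumed.
 * Returns 0 on success, -1 on failure.
 */
static int pm_request_sleep(const char *state_path, const char *state)
{
    FILE *f = fopen(state_path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(state, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On a real board the call would be pm_request_sleep("/sys/power/state", "mem") after checking pm_state_supported() against that file's contents; the path is a parameter only so the sketch can be exercised against an ordinary file.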
Re: suspend/resume PM criterion for application
On Sun, Sep 14, 2014 at 2:11 AM, Ran Shalit wrote:
> On Thu, Sep 11, 2014 at 12:24 PM, Ran Shalit wrote:
> > On Thu, Sep 11, 2014 at 8:32 AM, AYAN KUMAR HALDER wrote:
> >> On Thu, Sep 11, 2014 at 12:55 AM, wrote:
> >>> On Wed, 10 Sep 2014 21:58:48 +0300, Ran Shalit said:
> >>>
> >>>> 1. How can I make a process to notice this inactivity ? Do you think
> >>>> it can be implemented by some periodic process who check if there is
> >>>> activity ? It returns to the original question I raised, that I will
> >>>> use some periodic process who checks maybe cpu load or something like
> >>>> that. What do you think ?
> >>>
> >>> That's going to depend on your system and what processes are running.
> >>>
> >>> You may have an MP3 player going that doesn't take much CPU at all - but
> >>> shutting down because the user hasn't hit a button in 47 minutes will
> >>> probably irritate the user no end. Or there may be a screensaver running
> >>> that takes twice as much CPU as the MP3 player, but is totally OK on the
> >>> system suspending whenever the rest of the system wants it.
> >>>
> >>> You're going to have to look at your system design, and decide for
> >>> yourself what the criteria are.
> >>
> >> Please correct me if my understanding is wrong:-
> >>
> >> I believe that autosuspend feature (for system suspend) is not present
> >> in kernel. I believe that there is no feature in kernel which checks
> >> for system ( cpu, devices) inactivity and suspends the entire system.
> >> System suspend is caused when :-
> >> 1. the user issues a command
> >> 2. The system receives some interrupt or event (lid closing event)
> >> 3. There is an external process which monitors system inactivity and
> >> suspends the system.
> >>
> >> For runtime suspend of a device, I believe it is the driver who has
> >> the complete responsibility to decide when to suspend the device or
> >> resume it. The driver can take this decision on user intervention (eg
> >> when user writes to /sys/devices//power/* ) or when the
> >> driver has completed servicing an interrupt and feels it has nothing
> >> more to do, etc
> >
> > Thanks Vlaid, Ayan,
> >
> > I am a bit yet struggling for couple of days on this PM issue, and I
> > would appreciate your continuous advice.
> > The system requirement I have is as following:
> > 1. make everything as automatic as possible , so that there won't be
> > any need to add any userspace application for the matter.
> > 2. wakeup from all relevant wakeup sources
> > 3. should not use sysfs (it should be disabled from kernel)
> > 4. platform is OMAP3530.

a. Look into arch/arm/mach-omap2 of the kernel source and grep for the "sleep" and "wakeup" functionality: power management here is largely about managing the different frequencies (and voltages) of the CPU. As far as I can tell, once asleep, only the UART pin can be used for waking up... not sure.

b. Read this:

http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/t/30005.aspx

http://www.ti.com/lit/an/slva310b/slva310b.pdf (read page 2, which describes the different power-up sequences of the CPU, "Powering-Up Sequence").

c. The technology to look for on omap3530 is "DVFS" (dynamic voltage and frequency scaling)... search for it inside the arch/arm kernel source... you can find lots of sample code there.

(Don't confuse it with another omap CPU brand name, "DeepSleep", which is PM for another type of omap CPU.)

d. http://www.ti.com/product/omap3530 --> on the right is a DVSDK + Android source code for 3530... grep the code for the above keywords...

hopefully it helps?

> >
> > Now, As I understand thus far, I have the following options (
> > requirement 3 above I will ignore, don't know how to handle it yet,
> > and assume for meanwhile that I have sysfs) :
> > 1. use suspend scheme (no runtime PM)
> > 1.a. create some kernel thread who check cpu load and will decide
> > to disable system only if its below some minimum threshold (which
> > should indicate no activity)
> > 1.b. initialize all HW interrupts (gpio, uart, etc) as wakeup sources
> > with this scheme only this thread is responsible for the suspend,
> > and there is no use of the runtime PM, right ?
> >
> > 2. use runtime PM scheme :
> > With this scheme I don't understand how some device will wake the
> > system , or doesn't it need to ? If a driver wakes up maybe it need
> > to deliver some info to system?
> >
> > I think option 1 is also easier to support, what do you think about both ?
> >
> > Thanks!!
> > Ran
>
> Does Anyone have any suggestions and feedback on the above requirements ?
>
> Thank you,
> Ran

--
Regards,
Peter Teoh
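Option 1.a (suspend when CPU load stays below a threshold) boils down to sampling the aggregate "cpu" line of /proc/stat twice and comparing how much of the elapsed time was idle. Below is a userspace C sketch of that arithmetic, assuming a simple busy-fraction threshold is the policy; a kernel thread would do the same computation from kernel counters. The helper names and the threshold are illustrative, and the caveat quoted above still applies: an MP3 player that uses little CPU looks "inactive" to this test.

```c
#include <stdio.h>

/* Aggregate CPU time counters (in USER_HZ ticks) from /proc/stat. */
struct cpu_sample {
    unsigned long long idle;    /* idle + iowait */
    unsigned long long total;   /* user .. steal */
};

/*
 * Parse the aggregate line, e.g. "cpu  4705 150 1120 16250 520 30 45 0 0 0".
 * Fields: user nice system idle iowait irq softirq steal [guest guest_nice];
 * guest time is already counted in user, so only the first 8 are summed.
 * Pass only the first ("cpu ") line, not the per-CPU "cpuN" lines.
 */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    unsigned long long v[8] = {0};
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);

    if (n < 4)
        return -1;
    s->idle = v[3] + v[4];
    s->total = 0;
    for (int i = 0; i < 8; i++)
        s->total += v[i];
    return 0;
}

/* Read the current aggregate sample from /proc/stat. */
static int read_cpu_sample(struct cpu_sample *s)
{
    char buf[256];
    FILE *f = fopen("/proc/stat", "r");
    int rc = -1;

    if (!f)
        return -1;
    if (fgets(buf, sizeof buf, f))
        rc = parse_cpu_line(buf, s);
    fclose(f);
    return rc;
}

/* Fraction of time (0.0 .. 1.0) the CPUs were busy between samples a and b. */
static double busy_fraction(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long dt = b->total - a->total;
    unsigned long long di = b->idle - a->idle;

    return dt ? (double)(dt - di) / (double)dt : 0.0;
}

/* Hypothetical policy: "inactive" means busy below `threshold`. */
static int looks_inactive(const struct cpu_sample *a, const struct cpu_sample *b,
                          double threshold)
{
    return busy_fraction(a, b) < threshold;
}
```

A monitor would call read_cpu_sample(), wait one sampling period, call it again, and only consider suspending if looks_inactive() held for several consecutive periods. Reading /proc/stat needs procfs, not sysfs.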
Re: Any char device example for runtime PM ?
On Sat, Sep 13, 2014 at 3:50 PM, Ran Shalit wrote:
> On Sat, Sep 13, 2014 at 4:14 AM, Peter Teoh wrote:
> > please elaborate your requirements. char dev is for I/O to hardware. but
> > runtime PM is for hibernating machine. what is the connection u trying to
> > achieve?
> >
> > On Mon, Sep 8, 2014 at 1:22 PM, Ran Shalit wrote:
> >>
> >> Hello,
> >>
> >> Is there any character device example using runtime PM available ?
> >> It is most helpful,
> >>
>
> Hi,
>
> Some of the drivers I'm using are char devices, while I only saw
> platform device registration for runtime PM, so my question stem from
> this.
>
> As to the system requirement I have, it is as following:
> 1. make everything as automatic as possible , so that there won't be
> any need to add any userspace application for the matter.
> 2. wakeup from all relevant wakeup sources
> 3. should not use sysfs (it should be disabled from kernel)
> 4. platform is OMAP3530.
>
> Now, As I understand this far, I have the following options (
> requirement 3 above I will ignore, don't know how to handle it yet,
> and assume for meanwhile that I have sysfs) :
> 1. use suspend scheme (no runtime PM)
> 1.a. create some kernel periodic thread who check cpu load and will decide
> to disable system only if its below some minimum threshold (which
> should indicate no activity)
> 1.b. initialize all HW interrupts (gpio, uart, etc) as wakeup sources
> with this scheme only this thread is responsible for the suspend,
> and there is no use of the runtime PM, right ?
>
> 2. use runtime PM scheme :
> With this scheme I don't understand how some device will wake the
> system , or doesn't it need to ? If a driver wakes up maybe it need
> to deliver some info to system?
>

As a general comment, your requirements for PM sound weird.

a. Normally, the Linux kernel has its own PM protocol... and it governs which devices save their state, and restore it later... there is a hierarchy of calls to be made, and it is a complex daisy chain from the devices up to the higher logical levels. But yours never seems to mention or plan to integrate with this infrastructure?

b. Hardware PM (sorry, I am a software guy... may be wrong) for a microcontroller/CPU normally means different states, resulting in different external pins being disabled, and in the lowest-power state only one or two pins are available to wake up the CPU/microcontroller. So when u mention so many pins as potential wakeup sources... then it is not really powered down at all.

I am being vague and brief, not to waste time, as this is a big topic, sorry.

--
Regards,
Peter Teoh
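Peter's point (a) is that the kernel already has a PM framework that orders suspend and resume callbacks across the device hierarchy. Runtime PM (per device, while the system keeps running) and system suspend both reach their callbacks (struct dev_pm_ops) through a struct device, which is why the examples show platform devices; a char-device driver normally has such a struct device behind it. The toy model below is not kernel code and all its names are hypothetical. It only illustrates the ordering the PM core follows for system sleep: devices sit on a list in registration order (parents before children), suspend walks the list backwards and resume walks it forwards.

```c
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DEVS 8

/* A toy stand-in for struct device: a name and its parent's index. */
struct toy_dev {
    const char *name;
    int parent;           /* -1 for a root device */
    int suspended;
};

/* A toy stand-in for the PM core's ordered device list. */
struct toy_pm {
    struct toy_dev devs[TOY_MAX_DEVS];
    int count;
    char log[256];        /* order in which the callbacks ran */
};

/* Register a device; its parent must already be registered. */
static int toy_register(struct toy_pm *pm, const char *name, int parent)
{
    if (pm->count == TOY_MAX_DEVS || parent < -1 || parent >= pm->count)
        return -1;
    pm->devs[pm->count].name = name;
    pm->devs[pm->count].parent = parent;
    pm->devs[pm->count].suspended = 0;
    return pm->count++;
}

static void toy_log(struct toy_pm *pm, const char *what, const char *name)
{
    size_t len = strlen(pm->log);
    snprintf(pm->log + len, sizeof pm->log - len, "%s:%s ", what, name);
}

/* System suspend: walk backwards, so children go down before parents. */
static void toy_suspend_all(struct toy_pm *pm)
{
    for (int i = pm->count - 1; i >= 0; i--) {
        pm->devs[i].suspended = 1;
        toy_log(pm, "suspend", pm->devs[i].name);
    }
}

/* System resume: walk forwards, so parents are back up before children. */
static void toy_resume_all(struct toy_pm *pm)
{
    for (int i = 0; i < pm->count; i++) {
        pm->devs[i].suspended = 0;
        toy_log(pm, "resume", pm->devs[i].name);
    }
}
```

Registering an "i2c-bus" and then a "codec" under it, suspending and resuming, logs the codec first on the way down and last on the way up.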
Re: x86_64_defconfig and i386_defconfig: What is the difference?
On Tue, Sep 9, 2014 at 3:58 PM, Rajat Jain wrote:
> Hi,
>
> Can someone tell me if the i386 one is to be used when we want to build
> for a 32bit machine and the x86_64 is to be used for 64 bit machine?
>

"i386 or 32-bit machines"? I think genuine i386 machines don't exist anymore, so what is more likely correct is "i386-compatible machines". The i386 config is for a 32-bit OS, ie, all the binaries must be built for the 32-bit architecture. So choose the config that matches, provided u have the correct userspace files/libraries to support it.

> Thanks,
>
> Rajat

--
Regards,
Peter Teoh
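Both configs live under arch/x86/configs/ and are selected with "make i386_defconfig" or "make x86_64_defconfig". What matters, as Peter says, is the userspace you will run: a 32-bit kernel can only run 32-bit binaries, while a 64-bit kernel can also run them when built with CONFIG_IA32_EMULATION=y. A small C sketch (hypothetical helper names) that reports both sides, the running kernel's machine string from uname() and the pointer width this binary was compiled for:

```c
#include <limits.h>
#include <string.h>
#include <sys/utsname.h>

/* Pointer width of *this* program, i.e. how the userspace binary was built. */
static int userspace_pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Classify a uname() machine string: 64 for "x86_64", 32 for i386..i686. */
static int machine_bits(const char *machine)
{
    if (strcmp(machine, "x86_64") == 0)
        return 64;
    if (strlen(machine) == 4 && machine[0] == 'i' &&
        machine[1] >= '3' && machine[1] <= '6' &&
        strcmp(machine + 2, "86") == 0)
        return 32;
    return 0;   /* some other architecture */
}

/* Bits of the running kernel as reported by uname(2), or 0 if unknown. */
static int running_kernel_bits(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return machine_bits(u.machine);
}
```

If running_kernel_bits() says 64 but userspace_pointer_bits() says 32, the installed userspace is 32-bit. A 32-bit process under a 64-bit kernel normally still sees "x86_64" from uname() unless its personality was changed (for example with setarch/linux32).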
Re: Testing Code for Btrfs
Some well-known filesystem testing tools are listed here:

http://linuxpoison.blogspot.sg/2008/07/linux-filesystem-testing-tools.html

LTP is one of my favorites: it is very actively updated, and basically it focuses on testing the kernel as a whole.

On Sat, Sep 6, 2014 at 10:48 AM, nick wrote:
> Hey Guys,
> After purchasing a hard drive for btrfs testing, I am wondering what areas
> of testing you would like me to do.
> In addition this drive is enterprise based, a Seagate Constellation so
> feel free to hammer it with the tests as you wish :), I have no important
> data on it and don't care about losing it.
> Nick

--
Regards,
Peter Teoh
Re: Questions about Kernel Memory that I didn't find answers in Google - Please Help
And Q2: I just want to comment that the load address has to be fixed initially. A normal (dynamically linked) ELF program gets its relocations done by the linker/loader after it is loaded, but vmlinuz has no such step: before the kernel executes there is only the BIOS / U-Boot / bootloader etc. (Kernels built with CONFIG_RELOCATABLE=y are the exception: they fix up their own addresses while starting.) That is one possible answer. Others:

https://groups.google.com/forum/#!topic/comp.os.linux.embedded/0-SAzCqQKFM

And perhaps some of the links below may help you:

http://jianggmulab.blogspot.sg/2010_01_01_archive.html
http://stackoverflow.com/questions/5647279/why-does-the-module-start-from-address-0xbf00
http://www.arm.linux.org.uk/developer/memory.txt
http://en.wikipedia.org/wiki/High_memory

Bottom line: keep googling. Q6 and Q7 make no sense to me... sorry.

On Mon, Aug 4, 2014 at 11:22 PM, Lucas Tanure wrote:
> Thanks!
>
> A quick look in all of that show me that there a lot of information
> about how kernel manage memory.
> But, I will find the answer for question 2, 6 and 7 in it ?
>
> Thanks!
> --
> Lucas Tanure
> +55 (19) 988176559
>
>
> On Sun, Aug 3, 2014 at 8:58 PM, Peter Teoh wrote:
> > I like your curiosities and interests in Linux kernel.
> > http://virtuallyhyper.com/2013/07/rhcsa-and-rhce-chapter-10-the-kernel/
> >
> > Instead of answering one by one, I think I will just identify the knowledge
> > you are lacking:
> >
> > Memory management (from both x86/intel and linux kernel perspective).
> >
> > There are many many resources out there for you in these area, eg:
> >
> > http://en.wikipedia.org/wiki/Page_table
> > http://en.wikipedia.org/wiki/X86-64
> >
> > (both boring, but just understand it well enough)
> >
> > http://wiki.osdev.org/Paging (good explanation... understand it very very well).
> >
> > The ultimate classic ebook:
> >
> > https://www.kernel.org/doc/gorman/pdf/understand.pdf
> >
> > And this blog site has tons of good info on intel/memory etc:
> >
> > http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/
> > http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory/
> >
> > http://virtuallyhyper.com/2013/07/rhcsa-and-rhce-chapter-10-the-kernel/
> >
> > http://www.cse.psu.edu/~anand/spring01/linux/memory.ppt
> >
> > One more thing:
> >
> > "readelf -S -W vmlinux" shows u the sections and the address where the
> > different sections are supposed to be loaded in memory. If u replace the
> > vmlinux with the kernel module, eg: ip_tables.ko, then it says:
> >
> > starting at offset 0x328c blah blah
> >
> > so the loaded address is with respect to ZERO, but then the actual module
> > address is:
> >
> > sudo cat /proc/modules |grep ip_table
> >
> > ip_tables 18106 1 iptable_filter, Live 0xf8bf5000
> >
> > So all the output from your readelf, just add 0xf8bf5000 to it and you will
> > get the actual virtual address of that section IN MEMORY.
> >
> > Just only in memory. In file, the file offset of the section is different.
> > And many parts inside the ELF is also different from memory too: you will
> > need to add the virtual load address (above) to the offset as specified
> > inside the relocation tables (objdump -r), and for each section there is a
> > separate relocation table (all independent from another, meaning that the
> > different section CAN BE loaded to different parts in memory).
> >
> > Thanks.
> >
> >
> > On Sun, Aug 3, 2014 at 11:59 PM, Lucas Tanure wrote:
> >>
> >> Hi,
> >>
> >> I'm looking for some site, pdf, book etc, that can answer this questions.
> >> For now I have :
> >>
> >> http://unix.stackexchange.com/questions/5124/what-does-the-virtual-kernel-memory-layout-in-dmesg-imply
> >>
> >>
> >> I want to understand a few things about the memory and the execution
> >> of Linux kernel.
> >> Taking from a X86 and grub I have:
> >>
> >> 1) Grub loads kernel and root file system in memory, and the vmlinux
> >> has the code to decompress it self, right ? linux
> >>
> >> 2) The address of load kernel is always the same ? And It's at
> >> compilation time that is chosen ?
> >>
> >> 2a) The kernel takes places in 3g-4g memory place, and user space from 0
> >> to 3gb.
> >> But if the pc has only 256mb of memory ?
> >> And when pc has 16gb of memory, the user space will be split in two ?
> >>
> >> 2b) And if kernel has so many modules that needs more than
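Peter's recipe above combines the load address from /proc/modules with addresses from readelf. The /proc/modules half is easy to script; a C sketch with hypothetical helper names follows. One caution, offered as a hedge rather than as something established in the thread: the module loader lays out a .ko's sections itself (grouping code, read-only data and writable data), so "base + readelf value" may not land exactly on a given section. Kernels built with kallsyms export the address actually chosen for each section under /sys/module/<name>/sections/, which the second helper reads.

```c
#include <stdio.h>
#include <string.h>

/*
 * Base address of `module` in /proc/modules-formatted text. Lines look like
 *   ip_tables 18106 1 iptable_filter, Live 0xf8bf5000
 * (name, size, refcount, users, state, load address).
 * Returns 0 if the module is not listed.
 */
static unsigned long long module_base(const char *text, const char *module)
{
    size_t n = strlen(module);
    const char *line = text;

    while (line && *line) {
        const char *nl = strchr(line, '\n');

        if (strncmp(line, module, n) == 0 && line[n] == ' ') {
            const char *hex = strstr(line, " 0x");
            unsigned long long base = 0;

            if (hex && (!nl || hex < nl) && sscanf(hex + 1, "%llx", &base) == 1)
                return base;
            return 0;
        }
        line = nl ? nl + 1 : NULL;
    }
    return 0;
}

/*
 * Read the address the loader chose for one section, e.g.
 * /sys/module/ip_tables/sections/.text (typically needs root).
 */
static int module_section_addr(const char *module, const char *section,
                               unsigned long long *addr)
{
    char path[256];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "/sys/module/%s/sections/%s", module, section);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%llx", addr) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Feeding the contents of /proc/modules to module_base(text, "ip_tables") would return 0xf8bf5000 on the machine in Peter's example.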
Re: Questions about Kernel Memory that I didn't find answers in Google - Please Help
> f8178d5c8 98d5c8 0050a8 00 A 0 0 8
> [12] __ksymtab_strings PROGBITS 81792670 992670 01cb42 00 A 0 0 1
> [13] __init_rodata PROGBITS 817af1c0 9af1c0 e8 00 A 0 0 32
> [14] __param PROGBITS 817af2a8 9af2a8 000b00 00 A 0 0 8
> [15] __modver PROGBITS 817afda8 9afda8 000258 00 A 0 0 8
> [16] .data PROGBITS 8180 a0 0e1180 00 WA 0 0 4096
> [17] .vvar PROGBITS 818e2000 ae2000 001000 00 WA 0 0 16
> [18] .data..percpu PROGBITS c0 015300 00 WA 0 0 4096
> [19] .init.text PROGBITS 818f9000 cf9000 0503ea 00 AX 0 0 16
> [20] .init.data PROGBITS 8194a000 d4a000 09e4c8 00 WA 0 0 4096
> [21] .x86_cpu_dev.init PROGBITS 819e84c8 de84c8 18 00 A 0 0 8
> [22] .parainstructions PROGBITS 819e84e0 de84e0 00bd3c 00 A 0 0 8
> [23] .altinstructions PROGBITS 819f4220 df4220 005f40 00 A 0 0 1
> [24] .altinstr_replacement PROGBITS 819fa160 dfa160 001a69 00 AX 0 0 1
> [25] .iommu_table PROGBITS 819fbbd0 dfbbd0 f0 00 A 0 0 8
> [26] .apicdrivers PROGBITS 819fbcc0 dfbcc0 20 00 WA 0 0 8
> [27] .exit.text PROGBITS 819fbce0 dfbce0 0009bc 00 AX 0 0 1
> [28] .smp_locks PROGBITS 819fd000 dfd000 005000 00 A 0 0 4
> [29] .data_nosave PROGBITS 81a02000 e02000 001000 00 WA 0 0 4
> [30] .bss NOBITS 81a03000 e03000 122000 00 WA 0 0 4096
> [31] .brk NOBITS 81b25000 e03000 425000 00 WA 0 0 1
> [32] .comment PROGBITS e03000 27 01 MS 0 0 1
> [33] .debug_frame PROGBITS e03028 002560 00 0 0 8
> [34] .shstrtab STRTAB e05588 00018a 00 0 0 1
> [35] .symtab SYMTAB e06058 1a29f8 18 36 43659 8
> [36] .strtab STRTAB fa8a50 180d92 00 0 0 1
> Key to Flags:
>   W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
>   I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
>   O (extra OS processing required) o (OS specific), p (processor specific)
>
> So the vmlinux is loaded in memory like a dd ?
>
> 5) In my function A, inside the module that I wrote, a non-initialized
> variable will take place in non-initialized section that was loaded in
> memory ?
> Or my modules has a new sections for its own use, and my module is
> loaded in memory like a process, with all his sections?
> So how another module or kernel code will find my exported
> variable/function ?
>
>
> 6) Let's suppose:
> I have a int variable, with 17 as content, and the address is 0xGG.
> If I stop the linux in this time, read my memory at address 0xGG I
> will got 17, right ?
> 0xGGG will be bigger than 0xc0000000 always, right ?
>
>
> 7) Now take int from question and change for:
> struct mystruct *foo = (struct mystruct *) kmalloc(sizeof(struct mystruct), GFP_KERNEL);
>
> I will be able to read at address 0xGG the struct that created,
> and it address will be greater than 0xc0000000, right ?
> But for this struct, the memory will be allocated for ever, until I
> free the pointer, right ?
>
>
>
> Well, this just a start. I really want to understand how kernel is
> run, loaded etc. Any help is appreciate, answering my questions, links
> to read, books to read.
> Actually, I didn't find any book with that kind of information .
>
>
> --
> Lucas Tanure
> +55 (19) 988176559
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

--
Regards,
Peter Teoh
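For questions 2a, 6 and 7, the arithmetic on 32-bit x86 with the default 3G/1G split can be sketched directly. Under that assumption (PAGE_OFFSET = 0xC0000000, at most about 896 MiB of RAM directly mapped as "lowmem"), user space always gets the 0 to 3 GiB virtual range no matter how much RAM is installed; kmalloc() memory comes from lowmem, so its address is at least 0xC0000000 and it stays allocated until kfree(); and the matching physical address is a fixed subtraction. The constants and helper names below are illustrative assumptions, not values read from a running kernel, and x86_64 uses a completely different layout.

```c
#include <stdint.h>

/*
 * Illustrative constants for 32-bit x86 with the default split
 * (CONFIG_VMSPLIT_3G): the direct mapping of RAM starts at PAGE_OFFSET
 * and covers at most about 896 MiB ("lowmem"). Assumptions only.
 */
#define TOY_PAGE_OFFSET 0xC0000000u
#define TOY_MAX_LOWMEM  (896u << 20)

/* 1 if `va` falls inside the directly mapped lowmem window. */
static int toy_is_lowmem(uint32_t va)
{
    return va >= TOY_PAGE_OFFSET && va - TOY_PAGE_OFFSET < TOY_MAX_LOWMEM;
}

/* What __pa() amounts to for a lowmem address: a fixed subtraction. */
static uint32_t toy_lowmem_pa(uint32_t va)
{
    return va - TOY_PAGE_OFFSET;
}

/* End of the direct mapping on a machine with `ram` bytes of RAM;
 * anything beyond 896 MiB is "highmem" and is not permanently mapped. */
static uint32_t toy_lowmem_end(uint64_t ram)
{
    uint64_t mapped = ram < TOY_MAX_LOWMEM ? ram : TOY_MAX_LOWMEM;
    return (uint32_t)(TOY_PAGE_OFFSET + mapped);
}
```

With these numbers, the ip_tables base address 0xf8bf5000 from Peter's readelf example lies above the lowmem window, which is consistent with modules being loaded into the vmalloc area rather than the direct mapping.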
Re: userspace stack start and end
And look into the function print_context_stack(), which will teach u how to identify the start/end of the stack, whether an address is valid, and how to traverse from one frame to the next (using RBP / EBP of course, so CONFIG_FRAME_POINTER is definitely needed).

On Sat, Aug 2, 2014 at 12:22 AM, Peter Teoh wrote:
> FYI, there are many different types of kernel stack:
>
> http://www.x86-64.org/pipermail/discuss/2005-April/005944.html
>
>
> On Mon, Jul 28, 2014 at 12:52 AM, Xin Tong wrote:
>
>> I am trying to find the start and end address of the userspace stack. I
>> see in the task_struct there is start_stack. But I could not find end_start
>> anywhere in the kernel code ?
>>
>> Can someone please tell me how to find the end of the stack ?
>>
>> Thanks,
>> Xin
>
> --
> Regards,
> Peter Teoh

--
Regards,
Peter Teoh
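The frame-pointer walk Peter describes can be modelled without touching real stack memory. The toy below (hypothetical names; an array stands in for the stack and indices for addresses) uses the layout CONFIG_FRAME_POINTER gives on x86, the saved EBP/RBP followed by the return address, and stops on a zero or out-of-range frame pointer, similar in spirit to the validity checks used by print_context_stack(). It is an illustration, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of frame-pointer unwinding. The "stack" is an array of words
 * and a frame pointer is an index into it. In every frame:
 *   stack[fp]     = caller's frame pointer (saved EBP/RBP)
 *   stack[fp + 1] = return address into the caller
 * A saved frame pointer of 0 marks the outermost frame.
 */

/* Is fp a plausible frame pointer inside [lo, hi)? */
static int toy_valid_fp(size_t fp, size_t lo, size_t hi)
{
    return fp >= lo && fp + 1 < hi;
}

/*
 * Walk from `fp`, storing up to `max` return addresses in `out`, and
 * return how many were collected. The walk also stops if the chain does
 * not move towards higher addresses: the x86 stack grows down, so each
 * caller's frame sits above its callee's.
 */
static size_t toy_walk_frames(const uintptr_t *stack, size_t lo, size_t hi,
                              size_t fp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (n < max && fp != 0 && toy_valid_fp(fp, lo, hi)) {
        size_t next = (size_t)stack[fp];

        out[n++] = stack[fp + 1];
        if (next != 0 && next <= fp)
            break;          /* corrupt chain: refuse to loop */
        fp = next;
    }
    return n;
}
```

Without the saved frame pointers there is nothing to follow from one frame to the next, which is why the CONFIG_FRAME_POINTER=n case discussed in the "building kernel with -O" thread is so much harder.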
Re: userspace stack start and end
FYI, there are many different types of kernel stack:

http://www.x86-64.org/pipermail/discuss/2005-April/005944.html

On Mon, Jul 28, 2014 at 12:52 AM, Xin Tong wrote:
> I am trying to find the start and end address of the userspace stack. I
> see in the task_struct there is start_stack. But I could not find end_start
> anywhere in the kernel code ?
>
> Can someone please tell me how to find the end of the stack ?
>
> Thanks,
> Xin

--
Regards,
Peter Teoh
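For the userspace stack specifically: start_stack lives in mm_struct and records the initial stack pointer set up at exec time, not a boundary. The boundaries belong to the stack VMA (vm_start/vm_end), which userspace can see as the "[stack]" line of /proc/<pid>/maps; the VMA grows downwards on demand, up to RLIMIT_STACK, so its lower end is not fixed. A minimal C sketch (hypothetical helper names) that reads the range for the current process:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a /proc/<pid>/maps line of the form
 *   7ffc8a1e2000-7ffc8a203000 rw-p 00000000 00:00 0   [stack]
 * Returns 1 and fills start/end if the line is the main-thread stack.
 */
static int parse_stack_line(const char *line, unsigned long long *start,
                            unsigned long long *end)
{
    if (!strstr(line, "[stack]"))
        return 0;
    return sscanf(line, "%llx-%llx", start, end) == 2;
}

/* Scan the current process's maps; returns 1 on success. */
static int current_stack_range(unsigned long long *start, unsigned long long *end)
{
    char buf[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return 0;
    while (!found && fgets(buf, sizeof buf, f))
        found = parse_stack_line(buf, start, end);
    fclose(f);
    return found;
}
```

The end value is the top of the stack (the highest address); the start value is only where the VMA currently begins and moves down as the stack grows.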
Re: building kernel with -O
If a function is built with a frame pointer, then (on 32-bit x86) EBP + 4 holds the return address into the caller of the present function. Because by convention the function body doesn't touch EBP after setting it up, u can always retrieve the caller's return address relative to it (which is what this builtin does). And u ask what if it is not compiled inline? Then __builtin_return_address() would become a function itself, and u would be getting the caller of "__builtin_return_address". That was not the original intention: its purpose is to get the return address of the current function.

On Thu, Jul 31, 2014 at 9:05 AM, Xin Tong wrote:
> In that case, the __builtin_return_address(level) level > 1 is not
> possible either ? what if the kernel uses this ?
>
> Xin
>
>
> On Wed, Jul 30, 2014 at 8:00 PM, Peter Teoh wrote:
>
>> On Thu, Jul 31, 2014 at 12:59 AM, Xin Tong wrote:
>>
>>> why can not __builtin_return_address() be made *never* inline and use
>>> current level+1 to get the return address of the function of interest. For
>>> any stack introspection, having 1 more level will not hurt functionality.
>>>
>>
>> Actually, the answer for your remark is "impossible" - in the case when
>> the kernel is compiled without frame pointer. (CONFIG_FRAME_POINTER=n)
>> which is true for certain variant of RHEL / CentOS. Without the
>> availability of EBP on the stack, there is no way to know when to stop
>> reading the stack to retrieve the previous stackframe. Of course u can
>> statically walk the disassembly of the function and see how much stack
>> space the particular function has allocated. But that requires
>> implementing a disassembler in the kernel.
>>
>>>
>>> given its explanation below
>>>
>>> Built-in Function: void * __builtin_return_address (unsigned int level)
>>>
>>> This function returns the return address of the current function, or of
>>> one of its callers. The level argument is number of frames to scan up
>>> the call stack. A value of 0 yields the return address of the current
>>> function, a value of 1 yields the return address of the caller of the
>>> current function, and so forth. When inlining the expected behavior is that
>>> the function returns the address of the function that is returned to. To
>>> work around this behavior use the noinline function attribute.
>>
>> --
>> Regards,
>> Peter Teoh
>

--
Regards,
Peter Teoh
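The behaviour discussed here is easy to observe in userspace. In this C sketch (hypothetical names) the helper is marked noinline, so level 0 is the address just after each call instruction in the caller, and two calls from different places return different addresses. GCC's manual also warns that calling the builtin with a nonzero level can have unpredictable effects, including crashing the program, which fits Peter's point about kernels built with CONFIG_FRAME_POINTER=n.

```c
#include <stddef.h>

/* Incremented on every call so the compiler cannot merge two calls. */
static volatile unsigned long ra_calls;

/*
 * noinline: the "current function" must be a real frame. If it were
 * inlined, level 0 would report where the *caller* returns to, as the
 * GCC documentation quoted above explains.
 */
__attribute__((noinline)) static void *return_site(void)
{
    ra_calls++;
    return __builtin_return_address(0);
}
```

Printing return_site() with "%p" from two different lines of a caller shows two nearby but distinct addresses inside that caller.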
Re: IRQ mismatch ifconfig and /proc/interrupts
I suspect it is a bug. Mine is Ubuntu 12.04 32-bit with the 3.2.0-32 PAE kernel, and everything matches:

cat /proc/interrupts
      CPU0 CPU1 CPU2 CPU3
 41: 0 0 0 0 PCI-MSI-edge eth0

eth0      Link encap:Ethernet  HWaddr 5c:f9:dd:75:54:d8
          Interrupt:41 Base address:0x8000

On Fri, Jul 25, 2014 at 9:58 PM, Oscar Salvador <osalvador.vilard...@gmail.com> wrote:
> Hi People! How are you doing?
>
> I'm writing to you because I have a doubt about interrupts.
>
> If I look the interrupts assigned to my eth* with ifconfig, I get:
>
> eth0      Link encap:Ethernet  HWaddr bb:aa:bb:bb:aa:aa
>           Interrupt:20 Memory:f7e0-f7e2
>
> eth1      Link encap:Ethernet  HWaddr bb:aa:bb:bb:aa:aa
>           Interrupt:18 Memory:f7d0-f7d2
>
> As you can see, my system assigned IRQ-20 and IRQ-18 to eth0 and eth1.
>
> But If i look into /proc/interrupts, I don't have these interrupts:
>
> root@oscar:/home/oscar# cat /proc/interrupts
>      CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
>   0: 15 0 0 0 0 0 0 0 IR-IO-APIC-edge timer
>   8: 0 1 0 0 0 0 0 0 IR-IO-APIC-edge rtc0
>   9: 0 0 0 0 0 2 1 0 IR-IO-APIC-fasteoi acpi
>  16: 191342 27819 25143 21231 19007 18159 17183 15717 IR-IO-APIC-fasteoi ehci_hcd:usb3
>  19: 15 7 0 0 2 9 1 4 IR-IO-APIC-fasteoi firewire_ohci
>  23: 1441 76 61 42101 55 29 23 IR-IO-APIC-fasteoi ehci_hcd:usb4
>  40: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar0
>  41: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar1
>  42: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
>  43: 27318 1788 1314 1414 4046 2273 2232 2059 IR-PCI-MSI-edge eth0
>  44: 115244 14686 10096 8738 41559 16021 10972 10090 IR-PCI-MSI-edge ahci
>  45: 197010 19487 45260 14687 43697 29520 24546 21590 IR-PCI-MSI-edge eth1-rx-0
>  46: 27239 20276 18861 14845 54218 17950 12907 9765 IR-PCI-MSI-edge eth1-tx-0
>  47: 0 0 1 0 0 0 0 1 IR-PCI-MSI-edge eth1
>  48: 262150 78 60261249 168 47 IR-PCI-MSI-edge snd_hda_intel
>  49: 857324 80338 67789 59555 682632 90385 78616 65048 IR-PCI-MSI-edge i915
>
> As you can see, seems to be that eth1 has IRQ-45 IRQ-46 and IRQ-47, and
> eth0 has IRQ-43.
> I don't understand why ifconfig shows another IRQ.
>
> Is this a normal behaviour? Someone would be so kind to explain me this?
>
> Or maybe throw me some paper that explains this.
>
> thank you very much
> Best Regards
>
> Oscar

--
Regards,
Peter Teoh
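Oscar's own dump shows why a single "Interrupt:" number from ifconfig cannot tell the whole story: with MSI/MSI-X the driver registers its own vectors, and eth1 alone appears on three lines (eth1-rx-0, eth1-tx-0 and eth1). Summing the per-CPU counters of each named line shows which vectors actually fire. A C sketch of a line parser (hypothetical helper name), checked against the eth0 row from the dump:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one numbered line of /proc/interrupts, e.g.
 *   " 43:  27318  1788  ...  IR-PCI-MSI-edge  eth0"
 * `ncpus` is the number of CPUn columns in the header line. On success
 * returns 1 and stores the IRQ number, the sum of the per-CPU counts and
 * the last token on the line (the handler name) in `dev`.
 */
static int parse_irq_line(const char *line, int ncpus, int *irq,
                          unsigned long long *total, char *dev, size_t devlen)
{
    char *p;
    long n = strtol(line, &p, 10);

    if (p == line || *p != ':')
        return 0;                     /* header, "NMI:", "LOC:", ... */
    p++;
    *irq = (int)n;
    *total = 0;
    for (int c = 0; c < ncpus; c++) {
        char *q;
        unsigned long long v = strtoull(p, &q, 10);

        if (q == p)
            return 0;                 /* fewer counters than CPUs */
        *total += v;
        p = q;
    }
    const char *end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1]))
        end--;                        /* drop trailing newline/spaces */
    const char *start = end;
    while (start > p && !isspace((unsigned char)start[-1]))
        start--;                      /* back up to the last token */
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= devlen)
        return 0;
    memcpy(dev, start, len);
    dev[len] = '\0';
    return 1;
}
```

Running it over every line of /proc/interrupts and grouping by name prefix ("eth1", "eth1-rx-0", ...) gives the per-device picture directly.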
Re: building kernel with -O
On Thu, Jul 31, 2014 at 12:59 AM, Xin Tong wrote:
> why can not __builtin_return_address() be made *never* inline and use
> current level+1 to get the return address of the function of interest. For
> any stack introspection, having 1 more level will not hurt functionality.
>

Actually, the answer to your remark is "impossible" in the case when the kernel is compiled without frame pointers (CONFIG_FRAME_POINTER=n), which is true for certain variants of RHEL / CentOS. Without the saved EBP on the stack, there is no way to know where the current stack frame ends, and so no way to find the previous one. Of course u can statically walk the disassembly of the function and see how much stack space that particular function has allocated, but that requires implementing a disassembler in the kernel.

> given its explanation below
>
> Built-in Function: void * __builtin_return_address (unsigned int level)
>
> This function returns the return address of the current function, or of
> one of its callers. The level argument is number of frames to scan up the
> call stack. A value of 0 yields the return address of the current
> function, a value of 1 yields the return address of the caller of the
> current function, and so forth. When inlining the expected behavior is that
> the function returns the address of the function that is returned to. To
> work around this behavior use the noinline function attribute.

--
Regards,
Peter Teoh
Re: rootkits blocking using virtualization??
This is a recent classic bug in an implementation of ideas like the one you mentioned: http://xenbits.xenproject.org/xsa/advisory-98.html There, all the mappings are done on the host side. The kernelnewbies project, however, proposes something on the guest side, and if I have control over the guest OS (as a rootkit), then I can potentially also undo whatever the protection has done, depending on the exploitable entry paths available. On Thu, Jul 31, 2014 at 8:31 AM, Peter Teoh wrote: > Are u referring to this: > > http://kernelnewbies.org/KernelProjects/VirtRootkitBlocker > > Just trying to answer your question: > > --Is the method of making kernel read only to block rootkits used in linux > kernel mainline? > > I suspect not. How are u going to distinguish between "legitimate > program" and "rootkit" program? Program includes both userland program > and kernel modules.This distinction is needed, because legitimate > kernel modules can call "kmalloc" and that is read/writeable kernel memory. > Supposed there is a vulnerability in the kernel modules (and thus > userspace program can escalate privilege and execute into) then the > "kmalloc" is executed on behalf of the malware, but outwardly it looks as > if the kernel module is making a memory allocation.Unless u record down > all the potential legitimate kernel execution path (sequence of EIP > addresses), and compare it dynamically with the redirected path (as > triggered by the malware), it seemed like impossible to distinguish. And > the database of path is also going to be very huge. > Let me know if u have alternative ideas about setting kernel memory > readonly. > > But on the other hand, this idea is also not new, explored before, for > virtualization protection, NOT for rootkit detection. > > When u virtualized OS, the host has to set the all the memory given to the > guest as readonly. 
For details: > > For KVM: > > http://www.linux-kvm.org/wiki/images/3/33/KvmForum2008$kdf2008_15.pdf > > For Xen: > > http://wiki.xen.org/wiki/X86_Paravirtualised_Memory_Management > http://lists.xen.org/archives/html/xen-devel/2009-10/msg01201.html > > And this page has good info: > > http://www.linux-kvm.org/page/Memory > > (read esp the "shadow page memory" mechanism, which is very expensive, and > somewhat like the ideas proposed in the kernelnewbies mentor page). > > > > On Wed, Jul 30, 2014 at 7:44 PM, Aniket Shinde < > universalvirus@gmail.com> wrote: > >> Hello guys, >> I was going through kernelnewbies.org and came across a project >> "Block Rootkits using Virtualization" by riel. >> Basically we have to make kernel read only after boot process >> completes so rootkits get blocked. >> I have few doubts... >> >> --Is the method of making kernel read only to block rootkits used in >> linux kernel mainline? >> >> --have anybody implenented this project already? >> >> --what is the good way to start with above project? >> >> --any guidelines to implemnet above project?? >> >> --can I get any menor?? >> >> --any material related to above project?? >> >> (note: i have requested to mailing list but have not been approved yet. >> So please reply me personely.) >> >> ___ >> Kernel-mentors mailing list >> kernel-ment...@selenic.com >> http://selenic.com/mailman/listinfo/kernel-mentors >> >> > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh
Re: rootkits blocking using virtualization??
Are you referring to this: http://kernelnewbies.org/KernelProjects/VirtRootkitBlocker Just trying to answer your question: --Is the method of making kernel read only to block rootkits used in linux kernel mainline? I suspect not. How are you going to distinguish between a "legitimate program" and a "rootkit" program? Programs include both userland programs and kernel modules. This distinction is needed because legitimate kernel modules can call "kmalloc", which returns read/writeable kernel memory. Suppose there is a vulnerability in a kernel module (so that a userspace program can escalate privilege and execute code through it): then the "kmalloc" is executed on behalf of the malware, but outwardly it looks as if the kernel module is making an ordinary memory allocation. Unless you record all the potential legitimate kernel execution paths (sequences of EIP addresses) and compare them dynamically with the redirected path (as triggered by the malware), it seems impossible to tell the two apart. And the database of paths is also going to be huge. Let me know if you have alternative ideas about setting kernel memory read-only. On the other hand, this idea is not new: it has been explored before for virtualization protection, NOT for rootkit detection. When you virtualize an OS, the host has to keep control of the guest's memory mappings, for example by write-protecting the guest's page tables so that every update traps into the host. For details: For KVM: http://www.linux-kvm.org/wiki/images/3/33/KvmForum2008$kdf2008_15.pdf For Xen: http://wiki.xen.org/wiki/X86_Paravirtualised_Memory_Management http://lists.xen.org/archives/html/xen-devel/2009-10/msg01201.html And this page has good info: http://www.linux-kvm.org/page/Memory (read especially about the "shadow page table" mechanism, which is very expensive and somewhat like the ideas proposed in the kernelnewbies mentor page). On Wed, Jul 30, 2014 at 7:44 PM, Aniket Shinde wrote: > Hello guys, > I was going through kernelnewbies.org and came across a project > "Block Rootkits using Virtualization" by riel. 
> Basically we have to make kernel read only after boot process > completes so rootkits get blocked. > I have few doubts... > > --Is the method of making kernel read only to block rootkits used in linux > kernel mainline? > > --have anybody implenented this project already? > > --what is the good way to start with above project? > > --any guidelines to implemnet above project?? > > --can I get any menor?? > > --any material related to above project?? > > (note: i have requested to mailing list but have not been approved yet. So > please reply me personely.) -- Regards, Peter Teoh
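The write-protect-and-trap mechanism that shadow paging relies on can be tried from userspace. The sketch below is only an analogy and assumes Linux/POSIX (mmap, mprotect, sigsetjmp); a hypervisor does the equivalent on guest page tables through the MMU, and the function name is made up for illustration.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* SIGSEGV handler: leave the faulting store instead of retrying it forever. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/*
 * Hypothetical demo: map a page, write to it, make it read-only, then try
 * to write again. Returns 1 if the second write trapped, 0 if it did not,
 * and -1 if setup failed.
 */
int write_to_readonly_page_traps(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    char *page = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    page[0] = 'A';                                          /* allowed: still writable */
    if (mprotect(page, (size_t)pagesz, PROT_READ) != 0) {  /* write-protect the page */
        munmap(page, (size_t)pagesz);
        return -1;
    }

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_sa);

    volatile int trapped = 0;
    if (sigsetjmp(fault_env, 1) == 0)
        ((volatile char *)page)[0] = 'B';   /* faults: page is now read-only */
    else
        trapped = 1;                        /* reached via siglongjmp from the handler */

    sigaction(SIGSEGV, &old_sa, NULL);      /* restore the previous handler */
    munmap(page, (size_t)pagesz);
    return trapped;
}
```

The expensive part in a real hypervisor is what happens after the trap: every write has to be inspected and emulated, which is why shadow paging costs so much.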
Re: global descriptor table in X86 Linux
Search for "lgdt" here (for the 32-bit kernel): http://lxr.free-electrons.com/source/arch/x86/kernel/head_32.S On Tue, Jul 29, 2014 at 4:04 AM, Xin Tong wrote: > Hi > > Ive heard that Linux uses the flat mode segmentation, i.e. the > segmentation base is forced to be 0 and the limit to be 2^64. > > I am having trouble finding the kernel code that sets up the GDT, can > someone please point me to the right direction. > > Thanks a lot. > Xin -- Regards, Peter Teoh
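Since the question mentions flat segmentation, here is a sketch of how one flat descriptor is packed. It mirrors the bit packing of the kernel's GDT_ENTRY_INIT(flags, base, limit) macro (used for the per-CPU GDT in arch/x86/kernel/cpu/common.c in kernels of this era); the function names are illustrative. Note that for a 32-bit flat segment the effective limit is 4 GiB (limit 0xfffff in 4 KiB units), not 2^64; in 64-bit mode the CPU ignores base and limit for the code and data segments (FS and GS bases excepted).

```c
#include <stdint.h>

/*
 * Pack an 8-byte x86 segment descriptor the way GDT_ENTRY_INIT does:
 *   flags: access byte in bits 0-7, G/D/L/AVL nibble in bits 12-15
 *   base:  32-bit segment base
 *   limit: 20-bit segment limit (in 4 KiB units when G=1)
 */
uint64_t gdt_entry(uint16_t flags, uint32_t base, uint32_t limit)
{
    uint32_t lo = (limit & 0xffff) | ((base & 0xffff) << 16);
    uint32_t hi = ((base & 0xff0000) >> 16) |
                  ((uint32_t)(flags & 0xf0ff) << 8) |
                  (limit & 0xf0000) |
                  (base & 0xff000000);
    return ((uint64_t)hi << 32) | lo;
}

/* Effective byte limit of a descriptor: scale by 4 KiB if the G bit (bit 55) is set. */
uint64_t segment_byte_limit(uint64_t desc)
{
    uint32_t limit = (uint32_t)(desc & 0xffff) | (uint32_t)((desc >> 32) & 0xf0000);
    int granular = (int)((desc >> 55) & 1);
    return granular ? (((uint64_t)limit << 12) | 0xfff) : limit;
}
```

With flags 0xc09a (present, ring 0, execute/read code, G=1, D=1), base 0 and limit 0xfffff this yields 0x00cf9a000000ffff, the familiar flat 4 GiB kernel code segment.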
Re: Doubt Regarding Floating Point Arithmetic
You are welcome. To sidetrack a bit, there is a longstanding security concern (or perhaps just a "feature") here: if you compile any program that uses "float" or "double" types, you will see that many XMM registers and SSE instructions are used. The kernel itself avoids these registers except inside kernel_fpu_begin()/kernel_fpu_end() sections (used, for example, by some crypto and RAID code), and it saves and restores each task's FPU/SSE state when switching tasks. So if you do your security key calculations in these registers, be aware that they can be context-switched out at any time, beyond your control, and the secrecy of your keys then depends entirely on that save/restore path. The same goes for the GPU (which has been commonly used for password cracking): GPU memory is not necessarily cleared between processes, so cross-process information leakage is possible there. On Wed, Jul 30, 2014 at 12:31 AM, Prasad Ram wrote: > Thanks @Peter a very good explanation and it's very help full to me. > > > On 29 July 2014 19:49, Peter Teoh wrote: > >> Perhaps a little explanation:anything that can be done at userspace, >> should not be done at the kernel, simply because doing at the kernel >> entailed a lot of security privileges being available. (ie, logic which >> require hardware interaction / access, process scheduling logic or anything >> cutting across processes, sharing of common resources like memory etc) >> floating point arithmetics is a good example which is not necessary to be >> done in the kernel. Lots of hardware registers are available for FPU >> stuff (SSE/SSE2/XMM registers etc): >> >> http://en.wikipedia.org/wiki/SSE2 >> http://www.godevtool.com/TestbugHelp/XMMintins.htm >> http://x86.renejeschke.de/html/file_module_x86_id_117.html >> >> and generally their usage entailed a lot of performance hits when used >> extensively (another good reason to avoid it).
And more importantly, >> context switching as provided by Intel processor, the hardware operation >> does not include the floating pointers registers (simply because there are >> so many of them, and XMM can be like 128 bytes long?) Context switching >> will swap out the entire registers set when switching from one process to >> another, and if u were to do this for all the process, when 99% of the time >> floating point are not in use, it is a terrible waste of CPU cycle. >> >> Userspace can only interact with the kernel through well-defined syscall >> - for purpose of security, interprocess, or hardware access etc. So >> generally it is not possible to schedule floating point instruction (or any >> user-defined instructions for that matter) to be executed in the kernel. >> >> But it is possible to schedule floating point arithmetics to be executed >> in the kernel indirectly, for example, when u have a special hardware like >> DSP that does floating point arithmetics, and u wrote a driver to schedule >> instructions to be executed in that hardware unit. And u have to worry >> about many processes concurrently sending instructions to the same unit as >> well. >> >> Thanks for the reading. >> >> >> >> On Wed, Jul 23, 2014 at 11:15 AM, me storage >> wrote: >> >>> Hi >>> I am reading LDD .In that i didn't understand one point .In Chapter >>> 2(Building and Running Modules) they mentioned that >>> " Kernel code cannot do floating point arithmetic" >>> .My doubt is which code is used for floating point arithmetic that means >>> at low level? >>> >>> Thank you >> >> -- >> Regards, >> Peter Teoh -- Regards, Peter Teoh
Re: Doubt Regarding Floating Point Arithmetic
Perhaps a little explanation: anything that can be done in userspace should not be done in the kernel, simply because code running in the kernel has a lot of privileges available. (The kernel is for logic that requires hardware interaction/access, process scheduling, anything cutting across processes, sharing of common resources like memory, etc.) Floating point arithmetic is a good example of something that does not need to be done in the kernel. Lots of hardware registers are available for FPU work (SSE/SSE2/XMM registers, etc.): http://en.wikipedia.org/wiki/SSE2 http://www.godevtool.com/TestbugHelp/XMMintins.htm http://x86.renejeschke.de/html/file_module_x86_id_117.html and using them extensively generally carries a performance cost (another good reason to avoid it). More importantly, the hardware context switch provided by Intel processors does not include the floating point registers, simply because there are so many of them (each XMM register is 128 bits wide, and the full FXSAVE save area is 512 bytes). A context switch swaps out the entire register set when switching from one process to another, and doing this for the FPU registers of every process, when 99% of the time floating point is not in use, would be a terrible waste of CPU cycles. Userspace can only interact with the kernel through well-defined syscalls, for purposes of security, interprocess communication, hardware access, etc. So generally it is not possible to schedule floating point instructions (or any user-defined instructions, for that matter) to be executed in the kernel. But it is possible to have floating point arithmetic done on the kernel's behalf indirectly: for example, when you have special hardware like a DSP that does floating point arithmetic, and you write a driver to schedule instructions to be executed on that hardware unit. Then you also have to worry about many processes concurrently sending instructions to the same unit. Thanks for reading.
On Wed, Jul 23, 2014 at 11:15 AM, me storage wrote: > Hi > I am reading LDD .In that i didn't understand one point .In Chapter > 2(Building and Running Modules) they mentioned that > " Kernel code cannot do floating point arithmetic" > .My doubt is which code is used for floating point arithmetic that means > at low level? > > Thank you -- Regards, Peter Teoh
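A common way kernel code gets by without floating point is fixed-point integer arithmetic. The sketch below is modeled on the kernel's load-average calculation (FSHIFT, FIXED_1 and EXP_1 follow the constants in include/linux/sched.h of this era; the function names are illustrative, and the rounding details that differ between kernel versions are left out).

```c
/* Constants as in the kernel's load-average code. */
#define FSHIFT  11                  /* bits of fractional precision */
#define FIXED_1 (1UL << FSHIFT)     /* 1.0 in fixed point (2048) */
#define EXP_1   1884UL              /* exp(-5s/1min) in fixed point */

/* One decay step, load = load*e + active*(1 - e), using only integer math. */
unsigned long calc_load_step(unsigned long load, unsigned long e, unsigned long active)
{
    load *= e;
    load += active * (FIXED_1 - e);
    return load >> FSHIFT;
}

/* Integer part of a fixed-point value. */
unsigned long fixed_int(unsigned long x)
{
    return x >> FSHIFT;
}

/* Two decimal digits of the fractional part, still without floating point. */
unsigned long fixed_frac(unsigned long x)
{
    return ((x & (FIXED_1 - 1)) * 100) >> FSHIFT;
}
```

Starting from an idle system, one 5-second step with one runnable task gives calc_load_step(0, EXP_1, FIXED_1) == 164, which prints as 0.08, close to the exact 1 - e^(-5/60) ≈ 0.080.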
Re: how to determine kernel interrupt latency
I have never used any of the following software before; these are just suggestions based on reading online. cyclictest: read this: http://www.spinics.net/lists/linux-rt-users/msg04088.html (as explained there, "timer interrupt latency" is what cyclictest measures; I am not sure exactly how it is done). More on cyclictest: http://elinux.org/images/0/01/Elc2013_rowand.pdf http://people.redhat.com/williams/latency-howto/rt-latency-howto.txt (the last section covers different ways of using cyclictest to measure interrupt latency). Another: https://github.com/atlas555/rt-test (interrupt_tool). Another is "intrperf", which originated in FreeBSD but has a Linux version; google for it (possibly https://repos.dcl.info.waseda.ac.jp/spumone/trac/wiki/Interrupt%20Latency%20of%20Linux%20(intrperf)). As highlighted here: http://marc.info/?l=linux-smp&m=102733872816465 interrupt latencies are so small in magnitude, or so dominated by low-level CPU features like the I-cache (http://marc.info/?l=linux-arm-kernel&m=107472646713656), that they are neither easy nor worth the time to tune. Maybe I am wrong. Check this for another discussion: http://marc.info/?t=10740345885&r=1&w=2 On Sun, Mar 16, 2014 at 9:09 PM, loody wrote: > hi peter: > > > 2014-01-17 13:41 GMT+08:00 Peter Teoh : > > > http://stackoverflow.com/questions/15383259/are-there-any-kernel-tools-available-to-measure-interrupt-latency-with-reasonabl > > > > checkout cyclictest. > I have checked cyclictest. > from manual page, it seems used to calculate thread latency instead of > interrupt latency > Would you please let me know if there any kind of command for using it > to check interrupt latency ? > thanks fo -- Regards, Peter Teoh
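The core idea behind cyclictest can be sketched in a few lines: sleep until an absolute deadline with clock_nanosleep() and record how late the thread actually wakes up. This is a simplified illustration (no real-time priority, no mlockall(), a single thread), not cyclictest's actual code, and the function name is illustrative. What it measures is timer wake-up latency, i.e. interrupt latency plus scheduling delay, which matches the caveat above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Hypothetical helper: wake up `loops` times at absolute CLOCK_MONOTONIC
 * deadlines spaced `interval_ns` apart and return the worst observed
 * wake-up latency in nanoseconds, or -1 if clock_nanosleep() fails.
 */
int64_t max_wakeup_latency_ns(int loops, int64_t interval_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        int64_t deadline = ts_to_ns(&next) + interval_ns;   /* next absolute wake-up */
        next.tv_sec = deadline / NSEC_PER_SEC;
        next.tv_nsec = deadline % NSEC_PER_SEC;

        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;

        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = ts_to_ns(&now) - deadline;          /* how late we actually ran */
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Using absolute deadlines (TIMER_ABSTIME) keeps the period from drifting when one wake-up is late, which is the same choice cyclictest makes by default.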
Re: Pass through kernel memory manager
The address you passed to --section-start looks weird, given that your on-chip memory is so limited (8K and 128K: two different banks? If so, is only one available at any one time?). Perhaps some knowledge about linker scripts would help: http://blogs.bu.edu/md/2011/11/15/the-dark-art-of-linker-scripts/ The "1:1" mapping is called identity mapping, and a linker script provides a way for you to place the binary at a specific part of physical memory. On Sat, Feb 8, 2014 at 4:29 AM, Paul Chavent wrote: > Hi > > I'm working on an ARM926EJS based SOM (OMAPL138). The ARM has internal > memory spaces (8k one and 128k one) where i would like to put some code. > > I thought to use something like : > > void foobar (void) __attribute__ ((section ("bar"))); > > Then link with > > -Wl,--section-start,bar=1000 > > > But the Linux loader fails to load this segment. > > So, is it worth to try to achieve to run code at desired position ? > > Is there any way to tell Linux to 1:1 map some physical regions to > processes address space ? Perhaps the memmap= kernel parameter ? > > Thanks for your help. > > Paul. -- Regards, Peter Teoh
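For reference, the section-placement half of what Paul tried can be checked in a normal userspace build, without forcing an address. This sketch assumes GCC and a GNU-compatible linker, which define __start_<name> and __stop_<name> symbols for output sections whose names are valid C identifiers; the helper foobar_is_in_bar() is made up for illustration. Pinning that section to a physical on-chip address is the part that needs a linker script plus a matching mapping from the kernel.

```c
#include <stdint.h>

/* Place this function in a custom output section named "bar". */
__attribute__((noinline, section("bar"))) int foobar(int x)
{
    return x * 2;
}

/* Defined automatically by the linker for the "bar" section. */
extern char __start_bar[];
extern char __stop_bar[];

/* Hypothetical check: returns 1 if foobar's code landed inside "bar". */
int foobar_is_in_bar(void)
{
    uintptr_t f = (uintptr_t)&foobar;
    return f >= (uintptr_t)__start_bar && f < (uintptr_t)__stop_bar;
}
```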
Re: Firmware Loading every boot?
FYI, the "firmware" is loaded from flash: http://en.wikipedia.org/wiki/Flash_memory This means that a microcontroller (or microprocessor) + DMA/DDR memory + flash is the usual makeup of an embedded system. Flash is non-volatile, but it is normally slower, and its contents (NAND flash in particular) cannot be executed directly as CPU or microcontroller instructions, which is why you need to load the firmware into memory to be executed: http://lwn.net/Articles/135472/ Since that memory is volatile, the firmware is lost at power-off and has to be loaded again on every boot. Cheers. On Mon, Feb 10, 2014 at 9:29 PM, Jeshwanth wrote: > Hello List, > > I came to know that, linux loads firmware for my dma everytime it boots. > But I don't understand, why it is required to load everytime it boots, > don't dma holds which is loaded previously. > AFAIK, firmware is a program which runs in devices. > > Please correct me if I am wrong. > > Thanks :) > > Regards, > Jeshwanth > > Sent from my HTC -- Regards, Peter Teoh
Re: how to determine kernel interrupt latency
http://stackoverflow.com/questions/15383259/are-there-any-kernel-tools-available-to-measure-interrupt-latency-with-reasonabl Check out cyclictest. On Sat, Jan 11, 2014 at 3:25 PM, loody wrote: > hi all: > is it possible to determine interrupt latency in kernel with any ftrace or > proc? -- Regards, Peter Teoh
Re: How to access a DRM CRTC's scan out buffer?
As indicated here: http://www.botchco.com/agd5f/?p=51 the input to the CRTC is the framebuffer, and the output of the CRTC is already monitor-level information, which is meaningless to you. So your best bet is to get the content at the framebuffer level? Correct me if I am wrong. On Thu, Jan 16, 2014 at 8:06 PM, Sannu K wrote: > On Thu, Jan 16, 2014 at 1:14 PM, Peter Teoh wrote: > >> In general how it worked is explained here: >> >> https://www.kernel.org/doc/htmldocs/drm/drm-kms-init.html >> >> Not sure which is the name of your video card, but I think in general all >> the page flip API should have access to the scan buffer (see link above). >> For Intel these are possible APIs >> : >> >> > Thanks. I was trying to find out a generic way to access the scan out > buffer. The page flip functions looks specific to hardware. > > >> >> static void do_intel_finish_page_flip(struct drm_device *dev, >> void intel_finish_page_flip(struct drm_device *dev, int pipe) >> do_intel_finish_page_flip(dev, crtc); >> void intel_finish_page_flip_plane(struct drm_device *dev, int plane) >> do_intel_finish_page_flip(dev, crtc); >> void intel_prepare_page_flip(struct drm_device *dev, int plane) >> * is also accompanied by a spurious intel_prepare_page_flip(). >> inline static void intel_mark_page_flip_active(struct intel_crtc >> *intel_crtc) >> >> -- >> Regards, >> Peter Teoh >> > > It is enough to have a way to get the content of scan out buffer instead > of accessing it directly using a pointer. > > Thanks for you help, > Sannu K -- Regards, Peter Teoh
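If you do go the framebuffer route, the legacy fbdev interface exposes the framebuffer geometry from userspace. A minimal sketch, assuming the device has an fbdev node such as /dev/fb0 (DRM drivers can provide one through fbdev emulation) and that the kernel UAPI header <linux/fb.h> is installed; the function names are illustrative. Reading actual pixels would additionally mmap() the node with smem_len bytes.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fb.h>

/* Byte offset of pixel (x, y) in the mapped framebuffer, honoring panning offsets. */
size_t fb_pixel_offset(const struct fb_var_screeninfo *var,
                       const struct fb_fix_screeninfo *fix,
                       unsigned x, unsigned y)
{
    return (size_t)(y + var->yoffset) * fix->line_length +
           (size_t)(x + var->xoffset) * (var->bits_per_pixel / 8);
}

/* Query the geometry of an fbdev node such as /dev/fb0. Returns 0 on success, -1 on error. */
int fb_query(const char *path, struct fb_var_screeninfo *var, struct fb_fix_screeninfo *fix)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ok = ioctl(fd, FBIOGET_VSCREENINFO, var) == 0 &&
             ioctl(fd, FBIOGET_FSCREENINFO, fix) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```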
Re: How to access a DRM CRTC's scan out buffer?
For ATI GPUs, crtc_base could be the base pointer to the memory buffer: ./drivers/gpu/drm/radeon/rv770.c: u32 rv770_page_flip(struct radeon_device *rdev, int crtc_id, u64 crtc_base) ./drivers/gpu/drm/radeon/rs600.c: void rs600_pre_page_flip(struct radeon_device *rdev, int crtc) void rs600_post_page_flip(struct radeon_device *rdev, int crtc) u32 rs600_page_flip(struct radeon_device *rdev, int crtc_id, u64 crtc_base) As for the internals of these buffer areas, you may need the datasheet from the vendor. Just grep for "CRTC" inside the drivers/gpu/drm/radeon directory and you will understand why. On Thu, Jan 16, 2014 at 3:44 PM, Peter Teoh wrote: > In general how it worked is explained here: > > https://www.kernel.org/doc/htmldocs/drm/drm-kms-init.html > > Not sure which is the name of your video card, but I think in general all > the page flip API should have access to the scan buffer (see link above). > For Intel these are possible APIs > : > > > static void do_intel_finish_page_flip(struct drm_device *dev, > void intel_finish_page_flip(struct drm_device *dev, int pipe) > do_intel_finish_page_flip(dev, crtc); > void intel_finish_page_flip_plane(struct drm_device *dev, int plane) > do_intel_finish_page_flip(dev, crtc); > void intel_prepare_page_flip(struct drm_device *dev, int plane) > * is also accompanied by a spurious intel_prepare_page_flip(). > inline static void intel_mark_page_flip_active(struct intel_crtc > *intel_crtc) > > > On Sat, Jan 11, 2014 at 9:27 PM, Sannu K wrote: > >> Hi, >> >> I would like to access a monitor's content in kernel mode. I tried but >> could not find a generic way to access CRTC's scan out buffer in kernel >> mode. I prefer to do it in kernel mode as an experiment. Any pointers will >> greatly help. 
>> >> Thanks and Regards, >> Sannu K > > -- > Regards, > Peter Teoh -- Regards, Peter Teoh
Re: How to access a DRM CRTC's scan out buffer?
In general, how it works is explained here: https://www.kernel.org/doc/htmldocs/drm/drm-kms-init.html I am not sure what your video card is, but I think in general all the page flip APIs should have access to the scan-out buffer (see link above). For Intel, these are possible APIs: static void do_intel_finish_page_flip(struct drm_device *dev, void intel_finish_page_flip(struct drm_device *dev, int pipe) do_intel_finish_page_flip(dev, crtc); void intel_finish_page_flip_plane(struct drm_device *dev, int plane) do_intel_finish_page_flip(dev, crtc); void intel_prepare_page_flip(struct drm_device *dev, int plane) * is also accompanied by a spurious intel_prepare_page_flip(). inline static void intel_mark_page_flip_active(struct intel_crtc *intel_crtc) On Sat, Jan 11, 2014 at 9:27 PM, Sannu K wrote: > Hi, > > I would like to access a monitor's content in kernel mode. I tried but > could not find a generic way to access CRTC's scan out buffer in kernel > mode. I prefer to do it in kernel mode as an experiment. Any pointers will > greatly help. > > Thanks and Regards, > Sannu K -- Regards, Peter Teoh
Re: DMA, CMA, coherence and performance
I think this discussion should help you: http://e2e.ti.com/support/embedded/linux/f/354/t/89419.aspx Other failures: http://stackoverflow.com/questions/14625919/allocating-a-large-dma-buffer and some guidelines here: https://www.kernel.org/doc/Documentation/DMA-API.txt https://lkml.org/lkml/2011/3/25/19 As I don't have any specific crash dump or error information, I cannot comment further on your problem; it is quite difficult to make a general comment. On Fri, Jan 3, 2014 at 7:20 AM, Steven Bell wrote: > Hi, > > I'm working on a device driver for a video device which continuously reads > and writes image frames using DMA. The frames are fairly large, in the > range of 2-8MB, and I would like the buffers for them to be contiguous > because of my hardware. My understanding is that using the contiguous > memory allocator is the current "right way" to get the buffers, and that > CMA operates entirely behind the scenes when calls are made to > dma_alloc_coherent(). > > However, it seems that for this system, a streaming DMA setup would be > more appropriate. The buffer gets filled with data once, handed to the > device, and then isn't touched again until it gets reused with new data. > The resources I've read have hinted that streaming DMA has some performance > benefits over coherent DMA, so this seems like the way to go. But I > haven't seen any discussion of how to use CMA with streaming DMA (or > whether such a thing is even necessary). > > Does the CMA also work behind get_free_pages, or other kernel memory > allocation methods? Does it matter? The kernel newbies page on memory > allocation (http://kernelnewbies.org/KernelMemoryAllocation) says that > get_free_pages up to about 8MB are ok. Is that a generalization based on > typical memory fragmentation, or a guarantee?
> > Thanks, > Steven -- Regards, Peter Teoh
Re: Bug 12665
This list (linux-api) focuses on adding new APIs to the Linux platform, so perhaps this thread about timing may get you started: http://www.spinics.net/lists/linux-api/msg02243.html or, more generally: https://www.google.com.sg/search?q=site%3Awww.spinics.net%2Flists%2Flinux-api%2F+time On Fri, Jan 3, 2014 at 2:43 AM, johnd wrote: > On Tue, Dec 24, 2013 at 02:19:30PM +0800, Peter Teoh wrote: > > the DELAYTIMER_MAX is for realtime POSIX. > > > > but Linux is based on http://en.wikipedia.org/wiki/Linux_Standard_Base, > > which is LSB. > > > > There is no direct mapping between LSB and POSIX, but perhaps this: > > > > http://man7.org/linux/man-pages/man7/time.7.html > > > > and > > > > http://pubs.opengroup.org/onlinepubs/7908799/xsh/timer_gettime.html > > > > Look carefully between the two and you can perhaps find the balancing > point > > u will need for implementing this feature. > > Thanks for the explanation. I was just looking at bugs in bugzilla that > I could actually reproduce. I'm just getting started with kernel > programming and am looking for bugs I can observe. -- Regards, Peter Teoh
Re: Bug 12665
Reading the specs: http://pubs.opengroup.org/onlinepubs/7908799/xsh/timer_gettime.html DELAYTIMER_MAX is for realtime POSIX, but Linux distributions follow the Linux Standard Base (LSB): http://en.wikipedia.org/wiki/Linux_Standard_Base. There is no direct mapping between LSB and POSIX, but perhaps compare this: http://man7.org/linux/man-pages/man7/time.7.html with http://pubs.opengroup.org/onlinepubs/7908799/xsh/timer_gettime.html Look carefully at the two and you may find the balancing point you need for implementing this feature. Whether it is a kernel bug or a userspace bug is therefore debatable. On Tue, Dec 17, 2013 at 1:29 PM, John de la Garza wrote: > I found a bug that appears to be simple to fix. I assume I am missing > something. > > here is a link to the bug description: > https://bugzilla.kernel.org/show_bug.cgi?id=12665 > > the man page for the function in the bug report mentions that linux does > not impliment the desired functionality > > > It seems like it is accepted as working the way it does, and at the same > time it is reported in bugzilla as a current bug. > > > What am I missing? -- Regards, Peter Teoh
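In the linked POSIX page, DELAYTIMER_MAX bounds the overrun count returned by timer_getoverrun(). A minimal sketch that produces overruns on purpose, assuming a Linux/glibc system (on glibc older than 2.34, link with -lrt); the function name is illustrative. According to the Linux timer_getoverrun(2) man page, kernels before 4.19 let a huge overrun count cycle back to low values instead of capping it at DELAYTIMER_MAX, which is the kind of mismatch discussed here.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <time.h>

/*
 * Hypothetical demo: arm a periodic POSIX timer whose signal stays blocked,
 * let it expire many times, then accept the single queued signal and ask how
 * many extra expirations (overruns) were folded into it.
 * interval_ns and wait_ns must be below one second.
 * Returns the overrun count, or -1 on error.
 */
int measure_timer_overrun(long interval_ns, long wait_ns)
{
    int sig = SIGRTMIN;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)   /* keep the signal pending */
        return -1;

    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = sig;
    timer_t tid;
    if (timer_create(CLOCK_MONOTONIC, &sev, &tid) != 0)
        return -1;

    struct itimerspec its = {0};
    its.it_value.tv_nsec = interval_ns;      /* first expiry */
    its.it_interval.tv_nsec = interval_ns;   /* then every interval_ns */
    timer_settime(tid, 0, &its, NULL);

    struct timespec nap = {0, wait_ns};
    nanosleep(&nap, NULL);                   /* expirations pile up meanwhile */

    int overrun = -1;
    siginfo_t info;
    if (sigwaitinfo(&set, &info) == sig)     /* dequeue the one pending signal */
        overrun = timer_getoverrun(tid);     /* extra expirations it stood for */

    timer_delete(tid);                       /* no further expirations */
    struct timespec zero = {0, 0};
    while (sigtimedwait(&set, NULL, &zero) == sig)
        ;                                    /* drain anything still queued */
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    return overrun;
}
```

With a 1 ms interval and a 50 ms wait, only one signal is queued, and the overrun count reports the dozens of expirations it stands for.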
Re: help in developing soft and hardlockup detection tool
I think the logic, as written, is not possible and does not make sense: you cannot disable interrupts, wait for a timer to tell you that 11 seconds have passed, and then re-enable interrupts, because the timer interrupt is not going to reach you once interrupts are disabled. But you can of course do some pre-calculation: for your CPU and platform, do a precise, low-level timing of the CPU to assess how many instructions of a certain type are needed to achieve a given duration, say 1 microsecond. Then you implement a deterministic loop of 1 million such iterations to get a delay of exactly 1 second on ONE CPU. You can disable interrupts before entering that deterministic loop and re-enable them once out of it. The whole operation has to be precisely calculated and extrapolated from microseconds to seconds, and it really varies from CPU to CPU, or even for the same CPU on different platforms. And by the way, normal kernel operation always runs with interrupts enabled, so all performance and timestamp measurements will be very different under your constraint of disabling interrupts, which you are doing to simulate a hard lockup. On Sun, Dec 15, 2013 at 4:27 PM, Vipul Jain wrote: > Hi, > > I would like to write a kernel module that will induce the softlockup and > hardlockup on the cpu core(s). Below is my logic and was wondering if some > one can help me verify and guide me creating a thread and other stuff for > implementing the logic. > > softlockup: > on given cpu number. > 1. disable kernel preemption > 2. keep looping for 21 seconds (as per kernel Documentation it takes 20 > seconds to detect and I would like to recover the system once its detected). > 3. release the cpu > > hardlockup > on given cpu number. > 1. disable interrrupts. > 2. keep looping for 11 seconds. > 3. enable interrupts and release cpu. > > Regards, > Vipul.
-- Regards, Peter Teoh
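The pre-calculation described above is essentially what the kernel does at boot when it computes loops_per_jiffy (the BogoMIPS figure) for its delay loops; mdelay() and friends busy-wait rather than sleep, so they also work with interrupts disabled. Below is a userspace sketch of the same calibration, assuming CLOCK_MONOTONIC as the reference clock; the names are illustrative, and in a real lockup test the spin would run in kernel context with preemption or interrupts disabled, which this sketch does not do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* The deterministic delay loop; volatile stops the compiler from deleting it. */
static void spin(uint64_t loops)
{
    for (volatile uint64_t i = 0; i < loops; i++)
        ;
}

/*
 * Double the loop count until one run takes at least 10 ms, then scale the
 * result to "iterations per millisecond" (like loops_per_jiffy calibration).
 */
uint64_t calibrate_loops_per_ms(void)
{
    uint64_t loops = 1000;
    for (;;) {
        int64_t t0 = now_ns();
        spin(loops);
        int64_t dt = now_ns() - t0;
        if (dt >= 10000000)                      /* long enough to be measurable */
            return loops * 1000000 / (uint64_t)dt;
        loops *= 2;
    }
}

/* Busy-wait roughly `ms` milliseconds without consulting any clock. */
void delay_ms(uint64_t ms, uint64_t loops_per_ms)
{
    spin(ms * loops_per_ms);
}
```

Once calibrated, delay_ms() never looks at a timer, so it keeps working with interrupts off; its accuracy depends on the CPU running at the same speed as during calibration, which is exactly the CPU and platform dependence mentioned above.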
Re: network register on /proc fs
Run "strace ifconfig -a" and you can see the following in stderr: open("/proc/net/dev", O_RDONLY) = 6 open("/proc/net/if_inet6", O_RDONLY) = 6 open("/proc/net/if_inet6", O_RDONLY) = 6 open("/proc/net/if_inet6", O_RDONLY) = 6 The fd number is 6, and you can also see a process's open fds in /proc/<pid>/fd: ls -al /proc/<pid>/fd for a particular process gives: >ls -al /proc/2187/fd total 0 lr-x------ 1 xxx xxx 64 Dec 16 07:26 0 -> /dev/null l-wx------ 1 xxx xxx 64 Dec 16 07:26 1 -> /dev/null lrwx------ 1 xxx xxx 64 Dec 16 07:26 10 -> anon_inode:[eventfd] lrwx------ 1 xxx xxx 64 Dec 16 07:26 11 -> /dev/dri/card0 Registration of /proc entries happens when /proc is initialized and created: fs/proc/root.c:proc_root_init(), and the function for networking is proc_net_init() inside fs/proc/proc_net.c. On Sat, Dec 7, 2013 at 12:58 PM, Hatt Tom wrote: > hi: > > Does the network relevent register on /proc fs ? when does it register ? > > which entry of /proc will be used by ifconfig" command ? > > > Thanks! > -- > Best Regards! -- Regards, Peter Teoh
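To see what ifconfig gets from that first open(), here is a small sketch that parses /proc/net/dev the same way: two header lines, then one "name: counters..." line per interface. It assumes a Linux /proc, and the function name is illustrative.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count the interfaces listed in /proc/net/dev and set
 * *has_lo to 1 if the loopback interface is among them. Returns the count,
 * or -1 if the file cannot be opened.
 */
int count_proc_net_dev(int *has_lo)
{
    FILE *f = fopen("/proc/net/dev", "r");
    char line[512];
    int count = 0;

    *has_lo = 0;
    if (!f)
        return -1;

    /* The first two lines are column headers. */
    for (int i = 0; i < 2 && fgets(line, sizeof line, f); i++)
        ;

    while (fgets(line, sizeof line, f)) {
        char name[64];
        unsigned long long rx_bytes;
        /* Name runs up to the ':'; the leading space skips the padding. */
        if (sscanf(line, " %63[^:]: %llu", name, &rx_bytes) == 2) {
            count++;
            if (strcmp(name, "lo") == 0)
                *has_lo = 1;
        }
    }
    fclose(f);
    return count;
}
```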
Re: Should I pass user-space buffer pointer to read() of struct file implemented by `filp_open()`?
On Wed, Nov 27, 2013 at 9:57 PM, 乃宏周 wrote: > In module code: > > unsigned char buf[20]; > > struct file *device; > > device = filp_open(...); > > device->f_op->read(device,buf,20,&device->f_pos); > > In signature(interface) of read() of struct file, buf should came > from user-space. I fed my buffer, and I get correct data from that, Is that > correct? Shouldn't I provide a user-space buffer to that ? > A convention in kernel programming, for example: long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) { Here __user is used in the declaration to say explicitly that the pointer points to userspace data. It is an annotation checked by the sparse static checker (make C=1) and does not change the generated code. Without it, a pointer is assumed to point to kernel-allocated memory, and you use copy_from_user() to copy data from a userspace pointer into kernel memory. -- Regards, Peter Teoh
Re: about cheating upper layers
What you said about ip_rcv() should work, since it runs even earlier than TCP, so no problem there. Your IP address translation is exactly what NAT does, so you can definitely remap it to whatever address you like. But port + IP address do not imply anything about which NIC port the packet came in on. Perhaps the L2/MAC layer can provide the redirection... this part I am not sure about. On Thu, Dec 5, 2013 at 1:41 AM, Guibin(Bill) Tian wrote: > Thanks Peter for your explanation. But in fact, I am not going to touch > transport layer. The work shall be done inside the call stack of ip_rcv(). > > In ip layer, there is no specific process information, so the process > assignment shouldn't be a problem. > At the application layer, each application maintains its own socket pair. > > My concern is that if the packet is from another NIC rather than the one > used to make the connection, can I make it transparent to the application > by only modifying the source and destination address in the ip header? > > > > On Wed, Dec 4, 2013 at 12:01 PM, Peter Teoh wrote: > >> >> >> >> On Wed, Dec 4, 2013 at 7:49 PM, Peter Teoh wrote: >> >>> >>> >>> >>> On Mon, Dec 2, 2013 at 1:48 PM, Guibin(Bill) Tian wrote: >>> >>>> Hi there, >>>> Right now, I am trying to do such a thing. >>>> >>>> If a computer has multiple interface A and B, assume the packet is from >>>> device A. >>>> At ip layer, before the packet is transmitted to transport layer, I >>>> change the source address and destination address of the IP header and >>>> transmit to transport layer to pretend that this packet is from device B. >>>> Not sure whether this can work or not. >>>> >>> >>> feasible? yes, theoretically u can change IP address, but there is a >>> problem. TCP port + IP address is combined together to uniquely identify >>> which "socket" to pass to. And each socket is always associated with each >>> process. if u change that the packet will be redirected to another >>> process.
(this process context identification is done in the upper layer >>> of TCP, ie, in interrupt context the packet has not been associated with >>> any process yet.) >>> >>> and remember there is a checksum (TCP and IP) that need to be patched >>> whenever u change anything. >>> >>> not sure why u want to that, but I suspect Netfilter should fulfill your >>> requirement as well? >>> >>> >> >> Sorry, not sure if the earlier explanation is clear to you? >> >> So to be specific: >> >> in net/ipv4/tcp_ipv4.c: >> >> This is from the ingress path (still executing in software interrupt >> context mode, ie, packet has not been assigned to any process yet, and so >> you can always modify the packet content): >> >> int tcp_v4_rcv(struct sk_buff *skb) >> { >> const struct iphdr *iph; >> const struct tcphdr *th; >> struct sock *sk; >> int ret; >> struct net *net = dev_net(skb->dev); >> >> if (skb->pkt_type != PACKET_HOST) >> goto discard_it; >> >> /* Count it even if it's bad */ >> TCP_INC_STATS_BH(net, TCP_MIB_INSEGS); >> >> if (!pskb_may_pull(skb, sizeof(struct tcphdr)) >> >> and in include/net/sock.h: >> >> /* This is the per-socket lock. The spinlock provides a synchronization >> * between user contexts and software interrupt processing, whereas the >> * mini-semaphore synchronizes multiple users amongst themselves. >> */ >> typedef struct { >> spinlock_t slock; >> int owned; >> wait_queue_head_t wq; >> /* >> >> To modify the packet, you can either do it before the above function, or >> after the function, but before the packet gtet assigned to its rightful >> owner. MY GUESS* >> >> >>> >>>> There are other interface specific information in the skb structure >>>> like the net_device member. If I pass the packet to transport layer only >>>> with my proposed modification, will the application's sockehit detect this? 
>>>> I didn't look into the socket match code, not sure if the socket match will >>>> check other information in skb struct beside the ip address and port >>>> number. >>>> >>>> Thanks for your help. >>>> >>>> Bill >>>> >>>> ___ >>>> Kernelnewbies mailing list >>>> Kernelnewbies@kernelnewbies.org >>>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >>>> >>>> >>> >>> >>> -- >>> Regards, >>> Peter Teoh >>> >> >> >> >> -- >> Regards, >> Peter Teoh >> > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: watchdog pet in kernel module
On Thu, Dec 5, 2013 at 10:19 AM, Rajat Sharma wrote: > Although /dev/watchdog is available in usermode, but nothing should stop > you to write to it from a kernel thread. > > Rajat > I don't think /dev/watchdog (literally, I mean) is available in the kernel. It is accessible from userspace, but it is translated to a different name inside the kernel. Moreover, if you access the variable directly, bypassing all the spinlocks implemented around it (see drivers/watchdog and look for the "wdt_lock" spinlock), you might run into a race condition. BUT... if you really insist on probing from inside the kernel, it is not a watchdog, it is a "process watch" done your own way. I.e., you can always write a loop that periodically probes the status of that specific process to make sure it is in the RUNNING state (vs BLOCKED, when it is waiting for some I/O or lock to complete), and perhaps check the CPU instructions it executes to make sure it is not stuck in a tight loop (i.e., a userspace program that literally does "while (true) { do_nothing(); }"), and many other possible "hung" criteria for a process as well. Not easy... in fact, extremely complex. > > > On Wed, Dec 4, 2013 at 5:50 PM, Peter Teoh wrote: > >> >> >> >> On Thu, Dec 5, 2013 at 9:06 AM, Vipul Jain wrote: >> >>> >>> >>> >>> On Wed, Dec 4, 2013 at 4:57 PM, wrote: >>> >>>> On Wed, 04 Dec 2013 16:45:44 -0800, Vipul Jain said: >>>> >>>> > If you don't mind can you please provide me more insight as what can >>>> be >>>> > false alarm I can encounter to move pet inside kernel module? >>>> >>>> The issue isn't false alarms - it's failure to alarm when it should. >>>> >>>> The problem is that it's possible for a kernel to get wedged in such a >>>> way that >>>> a kernel thread is still able to feed the watchdog timer on a regular >>>> basis, >>>> but userspace is effectively hung and unable to proceed.
For example, >>>> if an >>>> OOPS happens while a filesystem lock is held, all future userspace >>>> references >>>> to that filesystem (and possibly all filesystems of the same type) will >>>> hang, >>>> eventually strangling the box while the kernel is still perfectly able >>>> to keep >>>> the watchdog working. >>>> >>>> Hi Valdis, >>> >>> I see what you are saying but what if the user process that's feeding >>> the dog gets hung and rest of the system is fine then it will bring the >>> whole system down won't it? I basically want to avoid this? >>> >>> >> Normally the process that feed the dog, is a simple process that JUST >> periodically set the watchdog device descriptor.Yes, one main() with a >> while loop just periodically resetting the descriptor. >> >> And so it is is not able to respond in time, by inference, OTHER PROCESS >> must have hung. In other system i saw there is a mother process that >> monitor a few (not all) of its key child process so perhaps one child >> will have one variable to signal to the mother that it is running. If not >> responding in time, the mother will clean up everything and then purposely >> not setting the watchdog, resulting in reboot. >> >> >>> Regards, >>> Vipul. >>> >>> >> >> >> -- >> Regards, >> Peter Teoh >> >> ___ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> >> > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
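From userspace, a rough equivalent of that "process watch" probe is to read the state field of /proc/<pid>/stat, where proc(5) documents R as running, S as sleeping and D as uninterruptible sleep (usually waiting for I/O). Below is a minimal C sketch with hypothetical function names; a real hang detector would need more criteria, as said above. The state letter is taken after the last ')' because the command name in parentheses may itself contain spaces or parentheses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the state letter from a /proc/<pid>/stat line such as
 * "1234 (my prog) S 1 ...". Returns '\0' if the line is malformed. */
char stat_line_state(const char *line)
{
    assert(line != NULL);
    const char *rparen = strrchr(line, ')');   /* comm may contain ')' */
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '\0';
    return rparen[2];
}

/* Read the current state of a process, or '\0' if it cannot be read. */
char read_process_state(pid_t pid)
{
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    char state = '\0';
    if (fgets(line, sizeof line, f) != NULL)
        state = stat_line_state(line);
    fclose(f);
    return state;
}
```

A watcher could call read_process_state() periodically and flag a process that stays in 'D' across several samples; how many samples count as "hung" is a policy decision left open here.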
Re: watchdog pet in kernel module
On Thu, Dec 5, 2013 at 9:06 AM, Vipul Jain wrote: > > > > On Wed, Dec 4, 2013 at 4:57 PM, wrote: > >> On Wed, 04 Dec 2013 16:45:44 -0800, Vipul Jain said: >> >> > If you don't mind can you please provide me more insight as what can be >> > false alarm I can encounter to move pet inside kernel module? >> >> The issue isn't false alarms - it's failure to alarm when it should. >> >> The problem is that it's possible for a kernel to get wedged in such a >> way that >> a kernel thread is still able to feed the watchdog timer on a regular >> basis, >> but userspace is effectively hung and unable to proceed. For example, if >> an >> OOPS happens while a filesystem lock is held, all future userspace >> references >> to that filesystem (and possibly all filesystems of the same type) will >> hang, >> eventually strangling the box while the kernel is still perfectly able to >> keep >> the watchdog working. >> >> Hi Valdis, > > I see what you are saying but what if the user process that's feeding the > dog gets hung and rest of the system is fine then it will bring the whole > system down won't it? I basically want to avoid this? > > Normally the process that feeds the dog is a simple process that JUST periodically writes to the watchdog device descriptor. Yes, one main() with a while loop that just periodically resets the watchdog through the descriptor. So if it is not able to respond in time, by inference, some OTHER PROCESS must have hung. In another system I saw, there is a mother process that monitors a few (not all) of its key child processes, so perhaps each child has one variable to signal to the mother that it is running. If a child does not respond in time, the mother cleans up everything and then purposely stops feeding the watchdog, resulting in a reboot. > Regards, > Vipul. > > -- Regards, Peter Teoh
Re: watchdog pet in kernel module
On Thu, Dec 5, 2013 at 8:45 AM, Vipul Jain wrote: > > > > On Tue, Dec 3, 2013 at 10:28 PM, Peter Teoh wrote: > >> Hi Vipul, >> >> I have seen this in a number of commercial software running on RHEL, and >> on other realtime OS as well. The watchdog mechanism is always working in >> pair: userspace "feeding" the dog (in the kernel). (btw, feed the dog >> is a more usually used term than "pet" the dog. sorry for that. google >> for that and perhaps you can get more info?). >> >> Like Valdis said, this way you will know when userspace hang, which is >> the key criteria for reboot. Why do u want to detect if the kernel hang >> (versus busy doing something)? Theoretically that is not possible, >> especially when all interrupt are disabled. >> >>> >>> Hi Peter, > > If you don't mind can you please provide me more insight as what can be > false alarm I can encounter to move pet inside kernel module? > > "Feeding the dog" is simply a periodic timer that wakes up and sets a variable. The fact that the variable keeps being set/reset also means that the periodic timer IS working. In userspace, if you have just one process to "feed the watchdog", then essentially you are monitoring whether overall system-wide performance is good enough for the periodic timer to be woken up at the required interval to reset the variable. If some other process hangs, it MAY or MAY NOT affect the periodicity of this timer process. But if you embed the timer inside the particular high-priority process you want to monitor, and that process hangs, then "feeding the watchdog" will not execute, and the kernel will reboot you (read the document below and search for "reboot"): http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt and for more insights: http://stackoverflow.com/questions/2020468/who-is-refreshing-hardware-watchdog-in-linux (and lots of the "Related" questions at the side of that page as well). > Regards, > Vipul.
-- Regards, Peter Teoh
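The "feeding" side in userspace can be very small. According to the watchdog API document linked above, the watchdog becomes active when /dev/watchdog is opened, any write to the device counts as a keepalive (the WDIOC_KEEPALIVE ioctl is an alternative), and drivers that support "Magic Close" only disarm the watchdog if the character 'V' is written just before closing. Below is a minimal C sketch with hypothetical helper names. The feeder loop simply stops feeding when its health check fails, which lets the watchdog reboot the machine, so do not point it at a real /dev/watchdog on a box you care about.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* One keepalive: any byte written to /dev/watchdog resets its timer. */
int watchdog_feed(int fd)
{
    return write(fd, "k", 1) == 1 ? 0 : -1;
}

/* Intentional shutdown: 'V' just before close() asks a driver that
 * supports Magic Close to disarm instead of treating it as a crash. */
int watchdog_magic_close(int fd)
{
    ssize_t n = write(fd, "V", 1);
    int rc = close(fd);
    return (n == 1 && rc == 0) ? 0 : -1;
}

/* Feed every interval_sec seconds while all_healthy() says so. On failure
 * we stop feeding and do NOT magic-close, so the watchdog fires. */
void watchdog_feeder_loop(int fd, unsigned interval_sec,
                          bool (*all_healthy)(void))
{
    assert(all_healthy != NULL);
    while (all_healthy()) {
        if (watchdog_feed(fd) != 0)
            break;
        sleep(interval_sec);
    }
}
```

In a real feeder, fd would come from open("/dev/watchdog", O_WRONLY), and all_healthy() would be your own (hypothetical) check of the key processes.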
Re: How can I 'getchar()' in module code?
Yes, exactly: what you are describing is called "kdb". Don't mix it up with "kgdb". kdb: this is debugging on the same computer, so no serial port connection is needed. Once an exception occurs, you are dropped into a special debugger screen. The catch is that this debugger runs in kernel mode, inside the same computer whose kernel module is crashing, so everything stops running and only kdb runs. (NOTE: I played with this about 8 or 9 years ago. The old standalone kdb patches are no longer maintained, but since 2.6.35 a kdb front end is part of mainline, built on top of kgdb.) kgdb: this always requires TWO computers, host + debuggee. kgdb runs inside the debuggee whose kernel has crashed, and gdb runs on the host, normally connected via a serial port. Usually the preferred way is to run the kernel to be debugged inside VirtualBox or VMware, so that the gdb host is the virtual machine host. The difference between the two is explained here: https://www.kernel.org/pub/linux/kernel/people/jwessel/kdb/CompileKDB.html and setup instructions are here (mainly for kgdb): http://elinux.org/KDB http://allmybrain.com/2010/04/29/debugging-linux-kernel-modules-with-virtualbox-and-kgdb/ http://www.linuxforu.com/2011/03/kgdb-with-virtualbox-debug-live-kernel/ Have fun. On Tue, Dec 3, 2013 at 8:35 PM, 乃宏周 wrote: > For debugging purpose, I want something like 'getchar()' that can pause > execution in the module code. Do any candidates I can choose? > -- Regards, Peter Teoh
Re: about cheating upper layers
On Wed, Dec 4, 2013 at 7:49 PM, Peter Teoh wrote: > > > > On Mon, Dec 2, 2013 at 1:48 PM, Guibin(Bill) Tian wrote: > >> Hi there, >> Right now, I am trying to do such a thing. >> >> If a computer has multiple interface A and B, assume the packet is from >> device A. >> At ip layer, before the packet is transmitted to transport layer, I >> change the source address and destination address of the IP header and >> transmit to transport layer to pretend that this packet is from device B. >> Not sure whether this can work or not. >> > > feasible? yes, theoretically u can change IP address, but there is a > problem. TCP port + IP address is combined together to uniquely identify > which "socket" to pass to. And each socket is always associated with each > process. if u change that the packet will be redirected to another > process. (this process context identification is done in the upper layer > of TCP, ie, in interrupt context the packet has not been associated with > any process yet.) > > and remember there is a checksum (TCP and IP) that need to be patched > whenever u change anything. > > not sure why u want to that, but I suspect Netfilter should fulfill your > requirement as well? > > Sorry, not sure if the earlier explanation is clear to you? So to be specific: in net/ipv4/tcp_ipv4.c: This is from the ingress path (still executing in software interrupt context mode, ie, packet has not been assigned to any process yet, and so you can always modify the packet content): int tcp_v4_rcv(struct sk_buff *skb) { const struct iphdr *iph; const struct tcphdr *th; struct sock *sk; int ret; struct net *net = dev_net(skb->dev); if (skb->pkt_type != PACKET_HOST) goto discard_it; /* Count it even if it's bad */ TCP_INC_STATS_BH(net, TCP_MIB_INSEGS); if (!pskb_may_pull(skb, sizeof(struct tcphdr)) and in include/net/sock.h: /* This is the per-socket lock. 
The spinlock provides a synchronization * between user contexts and software interrupt processing, whereas the * mini-semaphore synchronizes multiple users amongst themselves. */ typedef struct { spinlock_t slock; int owned; wait_queue_head_t wq; /* To modify the packet, you can do it either before the above function, or after it but before the packet gets assigned to its rightful owner. (MY GUESS) > >> There are other interface specific information in the skb structure like >> the net_device member. If I pass the packet to transport layer only with my >> proposed modification, will the application's sockehit detect this? I >> didn't look into the socket match code, not sure if the socket match will >> check other information in skb struct beside the ip address and port number. >> >> Thanks for your help. >> >> Bill >> > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh
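To illustrate why rewriting the addresses changes which socket receives the packet: an established TCP socket is found by the (source address, source port, destination address, destination port) tuple. The toy table below is a hypothetical model of that lookup, not the kernel's hash-table code in net/ipv4/inet_hashtables.c, which can also take the receiving interface into account for sockets bound to a device (SO_BINDTODEVICE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Connection identity as seen by the receiving host. */
struct flow_key {
    uint32_t saddr, daddr;   /* remote and local IPv4 address */
    uint16_t sport, dport;   /* remote and local TCP port */
};

/* Hypothetical stand-in for an established socket. */
struct toy_sock {
    struct flow_key key;
    int owner_pid;           /* process that owns the socket */
};

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Linear-search "demux": return the socket whose 4-tuple matches. */
struct toy_sock *toy_lookup(struct toy_sock *socks, size_t n,
                            const struct flow_key *pkt)
{
    assert(socks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (flow_key_equal(&socks[i].key, pkt))
            return &socks[i];
    return NULL;             /* no matching socket: no application gets it */
}
```

In this model, rewriting daddr from interface A's address to interface B's address is enough to move delivery from one socket (and process) to another, which is the redirection effect described earlier in the thread.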
Re: about cheating upper layers
On Mon, Dec 2, 2013 at 1:48 PM, Guibin(Bill) Tian wrote: > Hi there, > Right now, I am trying to do such a thing. > > If a computer has multiple interface A and B, assume the packet is from > device A. > At ip layer, before the packet is transmitted to transport layer, I change > the source address and destination address of the IP header and transmit to > transport layer to pretend that this packet is from device B. Not sure > whether this can work or not. > Feasible? Yes, theoretically you can change the IP address, but there is a problem. The TCP ports and IP addresses are combined to uniquely identify which "socket" the packet is passed to, and each socket is associated with a process. If you change them, the packet may be delivered to a different socket, and hence to another process. (This association with a process is made above TCP; in interrupt context the packet has not been associated with any process yet.) And remember there are checksums (TCP and IP) that need to be patched whenever you change anything; the TCP checksum covers the IP addresses too, through its pseudo-header. Not sure why you want to do that, but I suspect Netfilter could fulfill your requirement as well? > > There are other interface specific information in the skb structure like > the net_device member. If I pass the packet to transport layer only with my > proposed modification, will the application's sockehit detect this? I > didn't look into the socket match code, not sure if the socket match will > check other information in skb struct beside the ip address and port number. > > Thanks for your help. > > Bill > -- Regards, Peter Teoh
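For the checksum patching mentioned above: the IPv4 header checksum is the 16-bit one's complement of the one's complement sum of the header words, and RFC 1624 gives an incremental update, HC' = ~(~HC + ~m + m'), for when one 16-bit word m changes to m'. Below is a minimal userspace C sketch with hypothetical function names. Inside the kernel you would use its own helpers instead (for example csum_replace4() in include/net/checksum.h), and the TCP checksum needs the same treatment because of the pseudo-header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over 16-bit words. When computing a header
 * checksum, the checksum field itself must be zero. */
uint16_t inet_checksum(const uint16_t *words, size_t nwords)
{
    assert(words != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += words[i];
    return (uint16_t)~csum_fold32(sum);
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), for one 16-bit word m -> m'. */
uint16_t csum_update16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~old_csum;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~csum_fold32(sum);
}
```

For example, the header 4500 0073 0000 4000 4011 [csum] c0a8 0001 c0a8 00c7 has checksum 0xB861; changing the last address word to 00c8 gives 0xB860, whether recomputed in full or updated incrementally, and a header with a correct checksum sums to 0.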
Re: Recovering Linux system from hung state via software
On Wed, Dec 4, 2013 at 4:13 PM, Mandeep Sandhu wrote: > > assuming one mother process is monitoring 10 child process, so inside > each > > child process, simply just setup a PERIODIC (eg, per 5 sec) mechanism to > > toggle a binary variables through IPC means. It will be reset when the > > mother process go around checking all the variable status and, if not > reset > > it therefore implies that the particular process might be hung.it can > > wait further, or continue checking other process. at the end of > checking > > ALL the process, if everything is OK, it should feed the kernel watchdog > > timer. if the kernel watchdog timer is not reset, the kernel module > will > > then reboot the system. (ie, reboot is from kernel module). > > Hold on! Why should we reboot the whole system if only some of these > processes are misbehaving?!?! Why should other processes suffer due > this? Wouldn't it be better to just kill the erroneous process (like > how most OS's anyway do, eg: "Force Quit" in Ubuntu, or chrome tabs). > > In many COTS software systems, the behavior of every process is highly dependent on the others: some of them talk to hardware, while others just process the intermediate data. When something goes wrong, it is difficult to diagnose the fault in real time (i.e., with a self-diagnosis mechanism), which is why fault logging is important and is always done to flash or hard disk, not to a temporary filesystem. So it is better to reboot. Yes, not every process needs to trigger a reboot, so design it with care. E.g., an Apache server can always afford to be killed and have a new one started. > Or are these processes the only ones running on the system? > > -mandeep > -- Regards, Peter Teoh
Re: Recovering Linux system from hung state via software
On Fri, Nov 29, 2013 at 8:28 AM, Vipul Jain wrote: > Hi Kernel alias, > > I am a newbie and I am trying to figure out ways where in I can recover the > Linux in below two scenarios: > 1. my specific process hangs. > How to recover I cannot tell you, because it is application specific (but the best is to design your system to reboot completely, e.g., temporary data or files should be stored in memory, such as tmpfs, so that after a reboot they are all "gone": not erased and removed securely, but logically "gone"). And how to detect it is this: assume one mother process is monitoring 10 child processes. Inside each child process, simply set up a PERIODIC (e.g., every 5 seconds) mechanism to toggle a binary variable through some IPC means. The variable is reset when the mother process goes around checking the status of all the variables; if a variable has not been set again since the last check, that implies the particular process might be hung. The mother can wait further, or continue checking the other processes. At the end of checking ALL the processes, if everything is OK, it should feed the kernel watchdog timer. If the kernel watchdog timer is not reset, the kernel module will then reboot the system (i.e., the reboot comes from the kernel module). > > 2. kernel gets hung partially or completely. > > I have done some reading and seems like there is softlockup and hardlockup > mechanisms in Linux source base that I can use but not sure, if yes I have > below questions: > 1. Which kernel version is minimum required for this? > 2. How do I know that soft and hard lockup are enabled in my kernel? > 3. How can I customize the behavior of default action that been taken? > 4. Can I use these two lockup mechanism to find out if my process is hung > or not? > 5. Any pointers to any docs that can help will be appreciated. > > I will greatly appreciate any help here. > > > Regards, > Vipul.
-- Regards, Peter Teoh
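The mother/child detection scheme described above can be sketched in C. This is a variation under assumptions: instead of a binary flag that the mother resets, each child increments its own counter (for example in a shared-memory region, assuming atomic_ulong is lock-free there), and the mother remembers the last value it saw, so the child never has to coordinate with the reset. The names and the max_missed policy are hypothetical. Only when the round succeeds would the mother go on to feed the kernel watchdog.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct child_heartbeat {
    atomic_ulong beats;       /* written by the child */
    unsigned long last_seen;  /* mother's private copy */
    int missed;               /* consecutive rounds without progress */
};

void heartbeat_init(struct child_heartbeat *hb)
{
    atomic_init(&hb->beats, 0);
    hb->last_seen = 0;
    hb->missed = 0;
}

/* Child side: call periodically (e.g. every 5 seconds) from its main loop. */
void child_beat(struct child_heartbeat *hb)
{
    atomic_fetch_add(&hb->beats, 1);
}

/* Mother side: one checking round over all children. Returns true if it is
 * OK to feed the watchdog, false if some child has been silent for
 * max_missed consecutive rounds. */
bool supervisor_round(struct child_heartbeat *hbs, size_t n, int max_missed)
{
    assert(max_missed > 0);
    bool all_ok = true;
    for (size_t i = 0; i < n; i++) {
        unsigned long now = atomic_load(&hbs[i].beats);
        if (now != hbs[i].last_seen) {
            hbs[i].last_seen = now;   /* progress since last round */
            hbs[i].missed = 0;
        } else if (++hbs[i].missed >= max_missed) {
            all_ok = false;           /* looks hung: stop feeding */
        }
    }
    return all_ok;
}
```

With max_missed = 2, a child that skips one round is tolerated ("it can wait further"), while a child that skips two rounds in a row makes the round fail.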
Re: watchdog pet in kernel module
Hi Vipul, I have seen this in a number of commercial software products running on RHEL, and on other realtime OSes as well. The watchdog mechanism always works as a pair: userspace "feeds" the dog (in the kernel). (By the way, "feed the dog" is a more commonly used term than "pet the dog"; sorry for that. Google for that and perhaps you can get more info.) Like Valdis said, this way you will know when userspace hangs, which is the key criterion for a reboot. Why do you want to detect whether the kernel hangs (versus being busy doing something)? Theoretically that is not possible, especially when all interrupts are disabled. On Wed, Dec 4, 2013 at 6:45 AM, Vipul Jain wrote: > > > > On Tue, Dec 3, 2013 at 2:31 PM, wrote: > >> On Tue, 03 Dec 2013 13:15:32 -0800, Vipul Jain said: >> >> > currently we configure/pet the watchdog from user space via /dev/ipmi0 >> > device interface and I would like to do the pet part from kernel module. >> >> That's actually defeating the purpose. If you do it from the kernel, >> you keep the watchdog from detecting a whole set of hangs that can cause >> userspace to wedge up. >> > > Well we use different mechanism to detect user space hangs and take > corrective actions. Hence we want to separate the user space issues from > kernel space issues by using hardware watchdog pet in kernel space. > -- Regards, Peter Teoh
Re: Getting struct page pointer from virtual address
I think there is no exported function for that, but there is a global variable for it (e.g., the mem_map array on FLATMEM configurations). The reason is performance: virt_to_phys(), for example, is a macro compiled inline. More details here: http://stackoverflow.com/questions/5982125/how-to-get-a-struct-page-from-any-address-in-the-linux-kernel On Wed, Sep 4, 2013 at 1:29 AM, ajay saini wrote: > More information : > - Linux kernel version : 2.6.32 (But I would like a method which is > portable to other higher versions as well) > - I tried using follow_page, but this function is not exported from the > kernel so, can't use it. (Any reason why this function is not exported??) > > Thanks > Ajay > > -- > From: ajay saini > To: "kernelnewbies@kernelnewbies.org" > Sent: Tuesday, 3 September 2013 1:21 PM > Subject: Getting struct page pointer from virtual address > > Hey, > > I am working on a linux kernel module and I have a virtual address and mm > (struct mm_struct) for a process in this module. I can find the virtaul > memory area to which this address belongs to by using find_vma. > > Is there a function in the linux kernel which I can use in this module > (i.e. exported from the kernel) to get struct page pointer for this virtual > address. > > Thanks > Ajay > -- Regards, Peter Teoh
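To show why this is a few lines of inline arithmetic rather than an exported function, here is a simplified userspace model of virt_to_page() for a direct-mapped (lowmem) kernel address on a FLATMEM system. The constants (a 32-bit x86-style PAGE_OFFSET of 0xC0000000, 4 KiB pages) and every sim_ name are assumptions for illustration; real kernels differ per architecture and memory model. Note that this arithmetic does not apply to a user-space virtual address like the one in the question: for that, the kernel has to walk the process's page tables (for example via get_user_pages()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants: 32-bit x86-style lowmem split, 4 KiB pages. */
#define SIM_PAGE_SHIFT  12
#define SIM_PAGE_OFFSET 0xC0000000UL
#define SIM_NR_PAGES    1024          /* tiny "physical memory" */

static_assert((1UL << SIM_PAGE_SHIFT) == 4096, "model assumes 4 KiB pages");

/* Stand-in for struct page; the real one has many more fields. */
struct sim_page {
    unsigned long flags;
    int refcount;
};

/* Stand-in for the FLATMEM mem_map array: one entry per page frame. */
struct sim_page sim_mem_map[SIM_NR_PAGES];

/* __pa() for a direct-mapped address: just a constant subtraction. */
static inline uintptr_t sim_pa(uintptr_t vaddr)
{
    return vaddr - SIM_PAGE_OFFSET;
}

/* virt_to_page(): virtual -> physical -> page frame number -> mem_map slot.
 * Only a subtraction, a shift and an index, hence a macro in the kernel. */
struct sim_page *sim_virt_to_page(uintptr_t vaddr)
{
    if (vaddr < SIM_PAGE_OFFSET)
        return NULL;                  /* not in the direct map */
    uintptr_t pfn = sim_pa(vaddr) >> SIM_PAGE_SHIFT;
    return pfn < SIM_NR_PAGES ? &sim_mem_map[pfn] : NULL;
}
```

In this model, SIM_PAGE_OFFSET + 0x1234 lands in page frame 1, so it maps to &sim_mem_map[1].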
Re: Filesystem and files getting corrupted
Any kernel debugging always starts with the dmesg output, so please provide a snapshot of that (preferably by posting the full listing at pastebin.com). On Tue, Jun 25, 2013 at 4:08 AM, Daniel Hilst Selli wrote: > I'm working on an embedded project based on var-som-am35 from TI. [1] > > I experiencing a lot of corruption from files and even the entire > filesystem... is there any guide on how debug filesystems corruption? > > We already tryied vfat and ext3 fs.. changed media, changed machines... > The filesystem runs on mmc card, or on usb flash drive... There is a > java aplication running on top of this filesystem, which uses JMS, that > is very I/O agressive.. > > Cheers, > -- Regards, Peter Teoh
Re: Why is that the write speed of DDR SDRAM are faster than the DDR2
This is a tradeoff between DDR2 (higher speed, but higher read latencies) and DDR (lower speed, but lower read latencies). So if you run both at the same bus speed, DDR2's higher latencies will make it slower than DDR. http://www.diffen.com/difference/DDR_vs_DDR2 and a technical comparison chart with numbers: http://www.freescale.com/webapp/sps/site/overview.jsp?code=784_LPBB_DDR On Mon, Jun 3, 2013 at 2:39 PM, devendra.aaru wrote: > Hello, > > I have two different types of hardware, one with DDR SDRAM and another > DDR2, > i tested them with bw_mem tool for write bandwidths, seems that at > higher writes of >512kbytes the DDR is faster than the DDR2(more than > 40%). But when compared to reads, DDR is slower (more than 50%). I > couldn't find any reference that explains why. AFAIK, the DDR2 works > double the faster rate than the DDR. > > > any ideas? > > Thanks, > -- Regards, Peter Teoh
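A worked example of the latency side of this tradeoff: CAS latency (CL) is specified in clock cycles, so its absolute value is latency_ns = CL * 1000 / clock_MHz, while peak bandwidth is just transfers per second times bus width. The CL values here are typical figures chosen for illustration, not measurements of your boards. DDR-400 and DDR2-400 both use a 200 MHz I/O clock, so DDR-400 at CL3 waits 15 ns for the first data of a read, while DDR2-400 at CL4 waits 20 ns, even though both peak at 3200 MB/s on a 64-bit bus.

```c
#include <assert.h>

/* Absolute CAS latency in nanoseconds from CL (cycles) and clock (MHz). */
double cas_latency_ns(double cl_cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cl_cycles * 1000.0 / clock_mhz;
}

/* Peak transfer rate in MB/s: mega-transfers per second times bus width. */
double peak_mb_per_s(double mega_transfers, double bus_width_bytes)
{
    return mega_transfers * bus_width_bytes;
}
```

At DDR2-800 (400 MHz I/O clock), CL5 comes back down to 12.5 ns, so DDR2 only wins on latency once its clock is raised, which is the point above about running both at the same bus speed.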
Re: Analyzing Kernel call traces.
On Wed, May 8, 2013 at 3:16 PM, Shraddha Kamat wrote: > Any good tutorial for analyzing kernel call traces ? I want to > know what is the meaning of everything that appears in the call > trace and get to the exact cause of the problem. > Sorry, do you mean a "backtrace" call trace, or a kernel oops? http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/ and here is another kind of trace: http://elinux.org/Kernel_Function_Trace which depends on the instrumentation method: http://elinux.org/images/6/68/Kfiboot-9.lst http://elinux.org/Kernel_Instrumentation http://elinux.org/Instrumentation_API Many of these traces simply depend on the concept of call frames, i.e., the range of stack memory allocated to each function call. The above page also mentions the use of gcc -pg; not mentioned are other gcc features (see man gcc): -finstrument-functions -finstrument-functions-exclude-function-list=sym,sym,... -finstrument-functions-exclude-file-list=file,file,... Beware, though: sometimes the compilation explicitly removes the use of the frame pointer: -fomit-frame-pointer Without "ebp" and "esp" marking the start and end of a frame, there is no reliable way to know where a call frame begins and ends, and therefore the "stack trace" or "call trace" will not be shown accurately. Another possibility is that functions are declared "static" as well, and you will end up with a numerical offset from the nearest named function. -- Regards, Peter Teoh
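As an illustration of -finstrument-functions: a userspace program compiled with that flag calls __cyg_profile_func_enter() and __cyg_profile_func_exit() around every instrumented function, so defining those two hooks gives a crude call trace. This is a minimal sketch. The hooks themselves must be marked no_instrument_function so they do not recurse into themselves. The printed addresses can be turned into names with addr2line -f -e <binary>. The global depth counter is a simplifying, non-thread-safe assumption.

```c
/* Build the traced program with: gcc -finstrument-functions prog.c */
#include <assert.h>
#include <stdio.h>

static int trace_depth_level;   /* current call depth (not thread-safe) */

__attribute__((no_instrument_function))
int trace_depth(void)
{
    return trace_depth_level;
}

/* Called by instrumented code on entry to every function. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    fprintf(stderr, "%*s-> %p (called from %p)\n",
            2 * trace_depth_level, "", this_fn, call_site);
    trace_depth_level++;
}

/* Called by instrumented code just before every function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    assert(trace_depth_level > 0);
    trace_depth_level--;
    fprintf(stderr, "%*s<- %p\n", 2 * trace_depth_level, "", this_fn);
}
```

Because the trace comes from compiler-inserted calls rather than from walking the stack, it still works with -fomit-frame-pointer, at the cost of rebuilding the code with the flag.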
Re: atomic operations
In simple terms, any operation that can be executed as ONE assembly instruction is "atomic" because, just like an atom, it cannot be broken into parts. Anything longer than one instruction, e.g., TWO instructions, is NOT atomic, because in BETWEEN the first and second instruction something like an interrupt can come in and affect the value of the operand while it is being passed from the first instruction to the second. (Strictly, a single instruction is only atomic with respect to interrupts on the same CPU; on SMP, a read-modify-write instruction such as incrementing a memory location is only atomic with respect to other CPUs if the hardware guarantees it, e.g., with the lock prefix on x86.) To save me from repeating it: http://www.ibm.com/developerworks/library/pa-dalign/ (search for "atomicity"). http://stackoverflow.com/questions/381244/purpose-of-memory-alignment http://lwn.net/Articles/260832/ http://www.songho.ca/misc/alignment/dataalign.html http://www.cis.upenn.edu/~palsetia/cit595s08/Lectures08/alignmentOrdering.pdf Essentially, alignment becomes a problem for atomicity when you access a multi-byte value (i.e., not a single byte) at a non-aligned address: the access may have to be split into several smaller accesses, and something can intervene between them. On Sun, Feb 24, 2013 at 5:42 PM, Shraddha Kamat wrote: > what is the relation between atomic operations and memory alignment ? > > I read from UTLK that "an unaligned memory access is not atomic" > > please explain me , I am not able to get the relationship between > memory alignment and atomicity of the operation. > -- Regards, Peter Teoh
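To make the alignment point concrete: a value is naturally aligned when its address is a multiple of its size, and C11 implementations give _Atomic types whatever alignment their atomic instructions need. The sketch below uses hypothetical helper names. It checks alignment, shows that a 4-byte value starting one byte into a buffer is misaligned, uses memcpy() as the portable way to read such a value (correct, but not atomic), and increments a properly aligned atomic counter.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of size; size must be a power of two. */
bool is_naturally_aligned(const void *p, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Portable read of a possibly misaligned 32-bit value. The compiler may
 * emit several smaller loads, so a concurrent writer could be observed
 * half-done (a torn read). */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Atomic increment on a properly aligned atomic counter. */
uint32_t counter_increment(_Atomic uint32_t *counter)
{
    assert(is_naturally_aligned((const void *)counter, sizeof *counter));
    return atomic_fetch_add(counter, 1) + 1;
}
```

For example, in an 8-byte-aligned buffer, buf + 1 fails is_naturally_aligned(buf + 1, 4), so a 32-bit value stored there cannot be accessed as a single aligned load.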
Re: atomic operations
Another good article on atomicity and data sizes: http://www.ibm.com/developerworks/library/pa-atom/ On Sun, Feb 24, 2013 at 8:50 PM, Peter Teoh wrote: > in simple terms, any operation, in terms assembly instructions, which can > be executed in ONE instruction, is "atomic", because, just like an atom, it > cannot be broken up into parts. any instructions that is longer than one, > for eg, TWO instruction, is NOT atomic, because in BETWEEN the first and > 2nd instruction, something like an interrupt can come in, and affect the > values of the operand when it is passed from instruction one to second > instruction. To save me from reiteration: > > http://www.ibm.com/developerworks/library/pa-dalign/ (search for > "atomicity"). > > http://stackoverflow.com/questions/381244/purpose-of-memory-alignment > > http://lwn.net/Articles/260832/ > > http://www.songho.ca/misc/alignment/dataalign.html > > > http://www.cis.upenn.edu/~palsetia/cit595s08/Lectures08/alignmentOrdering.pdf > > Essentially, atomicity and non-alignment become problematic when u tried > to to read using non-byte addressing mode with non-aligned address. > > On Sun, Feb 24, 2013 at 5:42 PM, Shraddha Kamat wrote: > >> what is the relation between atomic operations and memory alignment ? >> >> I read from UTLK that "an unaligned memory access is not atomic" >> >> please explain me , I am not able to get the relationship between >> memory alignment and atomicity of the operation. >> >> >> ___ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> > > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
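The "two instructions are not atomic" point can be shown with C11 atomics (`<stdatomic.h>`). A plain `counter++` is compiled as load, add, store, so another CPU can slip in between. `atomic_fetch_add` performs the read-modify-write as one indivisible step; on x86 compilers typically emit a `lock`-prefixed instruction for it. The counter and function names below are hypothetical.

```c
#include <stdatomic.h>

/* Plain increment: load, add, store. Not atomic across CPUs. */
static long plain_counter;

void plain_increment(void)
{
    plain_counter++;
}

/* C11 atomic read-modify-write. Returns the value before the add. */
static atomic_long shared_counter;   /* static storage: starts at 0 */

long atomic_increment(void)
{
    return atomic_fetch_add(&shared_counter, 1);
}

long atomic_read_counter(void)
{
    return atomic_load(&shared_counter);
}
```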
Re: V4L2 Framework
es in the kernel source: mem2mem_testdev.c and v4l2-mem2mem.c, and all the APIs u can use (inside kernel drivers) are listed above (as exported symbols). On Mon, Feb 18, 2013 at 12:29 PM, Kaushal Billore < kaushalbill...@hotmail.com> wrote: > I have some doubt regarding Linux kernel V4l2 API's. > When capture application calls Reqbuff ioctl to allocate n no of buffer > which would belongs to v4l2 layer and display application calls the Reqbuff > ioctl to allocate N no of buffer which would also belongs to device memory. > > Question: > 1. V4l2 maintains the generic layer for all devices in which buffers can > be allocated by any device and can be handle by any device? > > 2. If not then while capturing the data from capture device can capture > device allocated buffer gets filled and while displaying the same data > there memory copy happens between capture buffer and output buffers? > > 3. If not then I want to capture data from capture device and display onto > display device through the v4l2 framework layer. > > Awaiting for responce! > > Thanks in advance > Kaushal > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: How controll is passed from uboot to kernel
Reading the uboot source code: In common/cmd_bootm.c: int do_bootm (cmd_tbl_t *cmdtp, int flag, int argc, char *argv[]) { And within this do_bootm_linux() is called: do_bootm_linux (cmdtp, flag, argc, argv, addr, len_ptr, verify); And inside do_bootm_linux() (platform-specific, for x86 it is lib_i386/i386_linux.c) is the load_zimage() function being called, which is effectively loading the kernel image file. On Sat, Feb 16, 2013 at 12:43 PM, Chetan C.R. wrote: > Hi All, > > I need to know how the control is passed from u-boot to kernel in Linux > operating system > > > Thanks in Advannce > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
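For comparison with the x86 path above, here is how the final hand-off looks on 32-bit ARM. This is a hypothetical sketch, not U-Boot's code. It follows the register convention documented in the kernel's Documentation/arm/Booting: r0 = 0, r1 = machine type number, r2 = physical address of the ATAGs or device tree. With the ARM calling convention, the three arguments of a C call land in exactly r0, r1 and r2, so the boot loader can "call" the kernel entry point as a function. `fake_kernel_entry` is only a host-side stand-in so the hand-off can be exercised without hardware.

```c
#include <stdint.h>

typedef void (*kernel_entry_fn)(int zero, int machine_type, uintptr_t params);

/* Final step of a hypothetical boot loader. A real one would first
 * disable interrupts, caches and the MMU. */
void jump_to_kernel(kernel_entry_fn entry, int machine_type, uintptr_t params)
{
    entry(0, machine_type, params);   /* on real hardware this never returns */
}

/* Host-side stand-in that records what the "kernel" received. */
struct handoff {
    int       zero;
    int       machine_type;
    uintptr_t params;
};
struct handoff last_handoff = { -1, -1, 0 };

void fake_kernel_entry(int zero, int machine_type, uintptr_t params)
{
    last_handoff.zero = zero;
    last_handoff.machine_type = machine_type;
    last_handoff.params = params;
}
```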
Re: MAX limit of file descriptor
one more: To modify system-wide limits: */etc/security/limits.conf* On Tue, Feb 12, 2013 at 5:31 PM, Peter Teoh wrote: > perhaps i can add more info, after doing more investigation: > > a. "ulimit" is a shell feature, it is not a command line binary. "man > bash" and "man sh" and u can see ulimit has different feature available for > u. > > b. ulimit control all the resources defined by the processes spawn from > the current shell onwards...ie, once ulimit is change, all child processes > from that shell onwards will change. but resources limit in another shell, > existing processes etc does not. > > c. ulimit is a userspace feature, the kernel will have all the > corresponding feature of max open files etc...but definitely it is not > unlimited like that of ulimit. > > d. to see ALL the open files u can use "lsof" and "-p" give u control to > point at which process to dig for open files. it also list all the open > connections (TCP) for uwhich is what u want. > > e. generally java applications will open many many files descriptor > concurrently: > > > http://www.java.net/forum/topic/glassfish/glassfish/too-many-open-files-issue > > (above listed 4500, and many others java apps like IBM RSA also have many). > > On Sat, Feb 9, 2013 at 1:10 PM, horseriver wrote: > >> hi:) >> >>In one process ,what is the max number of opening file descriptor ? >>Can it be set to infinite ? >> >>In network programing ,what is the essential for the maximum of >> connections >>dealed per second >> >> thanks! >> >> _______ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> > > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
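/etc/security/limits.conf is applied by pam_limits when a session starts. Within those bounds, a process can also change its own limit through `getrlimit()`/`setrlimit()` with `RLIMIT_NOFILE`, which is the value `ulimit -n` shows. The sketch below raises the soft limit up to the hard limit, which an unprivileged process is allowed to do. Going above the hard limit needs privilege (CAP_SYS_RESOURCE). `raise_nofile_to_hard` is a hypothetical name.

```c
#include <sys/resource.h>

/* Raise this process's soft RLIMIT_NOFILE to its hard limit.
 * Returns 0 on success and stores the new soft limit in *new_soft. */
int raise_nofile_to_hard(rlim_t *new_soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;                 /* soft := hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    if (new_soft)
        *new_soft = rl.rlim_cur;
    return 0;
}
```

Child processes inherit the raised limit, which matches point b. in the quoted mail about ulimit.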
Re: MAX limit of file descriptor
perhaps i can add more info, after doing more investigation: a. "ulimit" is a shell feature (a builtin), not a command-line binary. See "man bash" and "man sh": each shell's ulimit offers a different set of features. b. ulimit controls the resources available to the processes spawned from the current shell onwards, ie, once ulimit is changed, all child processes started from that shell inherit the new limits. But the limits of another shell, or of already-running processes, do not change. c. ulimit is only the userspace front end. The limits themselves (max open files etc.) are kept and enforced per process by the kernel (see getrlimit()/setrlimit()), and even when ulimit reports "unlimited", the kernel still has its own finite ceilings. d. to see ALL the open files u can use "lsof", and "-p" lets u point it at the process to dig into. It also lists all the open (TCP) connections for u, which is what u want. e. generally, Java applications open many, many file descriptors concurrently: http://www.java.net/forum/topic/glassfish/glassfish/too-many-open-files-issue (the above lists 4500, and many other Java apps like IBM RSA also open many). On Sat, Feb 9, 2013 at 1:10 PM, horseriver wrote: > hi:) > >In one process ,what is the max number of opening file descriptor ? >Can it be set to infinite ? > >In network programing ,what is the essential for the maximum of > connections >dealed per second > > thanks! > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
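Point d. can also be checked from inside a program without lsof. On Linux, every open descriptor of a process appears as an entry in /proc/self/fd. Counting those entries gives a rough equivalent of what `lsof -p <pid>` lists, although lsof shows far more detail. The directory stream used for the listing is itself an open descriptor, so the sketch subtracts it. `count_open_fds` is a hypothetical name.

```c
#include <dirent.h>
#include <stdio.h>

/* Count this process's open file descriptors via /proc/self/fd.
 * Returns -1 if /proc is not available. */
int count_open_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')        /* skip "." and ".." */
            continue;
        count++;
    }
    closedir(d);

    return count - 1;                   /* exclude opendir()'s own fd */
}
```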
Re: MAX limit of file descriptor
On Sun, Feb 10, 2013 at 8:29 PM, wrote: > Hi! > > On 13:10 Sat 09 Feb , horseriver wrote: > > hi:) > > > >In one process ,what is the max number of opening file descriptor ? > > Type "ulimit -a" in your shell. On my system (debian) the default is 1024. > Hi Michael, nice to see u again. BTW, many of the values reported by ulimit also have to be taken with some doubt: ulimit -a core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 47543 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 47543 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited The above is for Ubuntu 12.04 with a 32-bit kernel (3.2.0). But of course we know that the max file size does have a limit, depending on whether the filesystem is ext2, ext3 or ext4. I cannot remember the exact numbers, but at the conceptual level there is a limit. Even "CPU time" is limited by the bit length of the underlying time representation. As usual... i don't know the details :-(, just the concept. sorry :-(. > > >Can it be set to infinite ? > > Maybe, but at least it can be set very high. > > >In network programing ,what is the essential for the maximum of > connections > >dealed per second > > - Use non blocking i/o and epoll(). Do *not* create 1 process/thread for > each > connection and do not use use select(). > - Obviously, the more memory your application uses, the more memory has to > be > put in the server. IIRC, 1 tcp connection uses ~1kb kernel memory. > - The same applies for cpu time. On the system side, you may want to > recommend > network adaptors which can be switched to polling instead of raising 1 > interrupt per packet. 
You should expect to see lots of small packets on > the > network. > > -Michi > -- > programing a layer 3+4 network protocol for mesh networks > see http://michaelblizek.twilightparadox.com > > _______ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
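Michi's advice (non-blocking I/O plus epoll(), not one thread per connection) boils down to an event loop like the sketch below. It uses the Linux epoll API. A pipe stands in for a socket so the example runs without a network. Each ready descriptor is drained until `read()` fails with EAGAIN instead of blocking. `watch_fd` and `poll_once` are hypothetical names for one registration step and one loop iteration.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Make fd non-blocking and register it for readability (level-triggered). */
int watch_fd(int epfd, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One iteration of the event loop: wait up to timeout_ms, then drain every
 * ready fd. Returns the number of bytes read, or -1 on error. */
long poll_once(int epfd, int timeout_ms)
{
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    if (n < 0)
        return -1;

    long total = 0;
    for (int i = 0; i < n; i++) {
        char buf[4096];
        ssize_t r;
        /* non-blocking: stops at EAGAIN instead of stalling the loop */
        while ((r = read(events[i].data.fd, buf, sizeof buf)) > 0)
            total += r;
    }
    return total;
}
```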
Re: printk question:why release console_sem after logbuf_lock
For the details u have to look at the source code; my guess is based on the following postings: http://kerneltrap.org/mailarchive/linux-kernel/2008/1/23/595569 https://lkml.org/lkml/2011/6/21/249 logic: a. u want to be able to call printk from anywhere. b. but every call to printk also involves console_sem: printk copies the output into a memory buffer (under logbuf_lock), and then tries to take console_sem so it can flush that buffer to the consoles. c. the problem arises when printk is called very frequently, and from places like the scheduler that already hold other locks (eg, rq->lock), so this type of lock is ill-suited for printk(). d. a later attempt than this patch: https://patchwork.kernel.org/patch/1760211/ https://lkml.org/lkml/2012/10/20/90 uses lazy irq work instead. Read through the comments in the intro to the patch; it covers a lot more than i mentioned here. Documentation/lockdep-design.txt discusses using irq tracing to track lock dependencies. Lock inversion is a common computer science problem... look it up on wiki. On Sat, Feb 9, 2013 at 10:55 AM, buyitian wrote: > in the patch 0b5e1c5255e7ee8670e077e8224e5c2281229a5b, it releases > console_sem after logbuf_lock, the description of this patch is as below: > > Release console_sem after unlocking the logbuf_lock so that we don't > generate wakeups while holding logbuf_lock. This avoids some lock > inversion troubles once we remove the lockdep_off bits between > logbuf_lock and rq->lock (prints while holding rq->lock vs doing > wakeups while holding logbuf_lock). > There's of course still an actual deadlock where the printk()s under > rq->lock will issue a wakeup from the up() call, but lockdep won't > warn about that since semaphores are not tracked. > > could you please give me a detail example about the issue it tries > to fix? thanks. 
> > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
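The inversion described in the patch can be modelled in userspace with two pthread mutexes. This is a simplified analogy, not the printk code. `logbuf_lock` protects the log buffer and `rq_lock` stands in for the runqueue lock that any wakeup needs. Suppose printk woke a reader while still holding logbuf_lock (logbuf then rq), while the scheduler printk'd under rq_lock (rq then logbuf). Two CPUs could then deadlock (ABBA). The remedy in the patch description is to finish with logbuf_lock, drop it, and only then wake anyone. All names here are hypothetical.

```c
#include <pthread.h>
#include <string.h>

static pthread_mutex_t logbuf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rq_lock     = PTHREAD_MUTEX_INITIALIZER;

static char   logbuf[256];
static size_t log_len;
static int    wakeups;

/* A wakeup needs the "runqueue" lock. */
static void wake_up_reader(void)
{
    pthread_mutex_lock(&rq_lock);
    wakeups++;
    pthread_mutex_unlock(&rq_lock);
}

/* Append msg to the buffer; returns the number of bytes stored. */
size_t model_printk(const char *msg)
{
    pthread_mutex_lock(&logbuf_lock);
    size_t room = sizeof logbuf - log_len;
    size_t n = strlen(msg);
    if (n > room)
        n = room;
    memcpy(logbuf + log_len, msg, n);
    log_len += n;
    pthread_mutex_unlock(&logbuf_lock);   /* release BEFORE waking anyone */

    if (n > 0)
        wake_up_reader();                 /* rq_lock taken with logbuf_lock free */
    return n;
}

int model_wakeups(void)
{
    pthread_mutex_lock(&rq_lock);
    int w = wakeups;
    pthread_mutex_unlock(&rq_lock);
    return w;
}
```

Because logbuf_lock is never held while rq_lock is taken, the logbuf-then-rq edge disappears and the cycle cannot form.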
Re: Kernel code interrupted by Timer
On Sun, Feb 10, 2013 at 12:22 AM, Frederic Weisbecker wrote: > 2013/2/9 Peter Teoh : > > A search in the entire subtree of arch/x86/ and including all its > > subdirectories, (for 3.2.0 kernel) return only TWO result where > > preempt_schedule_irq is called: kernel/entry_64.S and > kernel/entry_32.S. > > And the called is in fact resume_kernel(), ie, it is NOT called from > timer > > interrupt, but from wakeup context of the CPU, and is only executed ONCE > > upon waking up from hibernation. > > > > for example, calling from here: > > > > https://lkml.org/lkml/2012/5/2/298 > > > > so definitely this preempt_schedule_irq() calling from irq mode is rare > - at > > least for x86. > > The name "resume_kernel" can indeed sound like something that is > called on hibernation resume. It's actually not related at all. It's a > piece of code that is called at the end of every irq and exception > when the interrupted code was running in the kernel. If the > interrupted code was running in userspace, we jump to > resume_userspace. > well, i guessed u must be the expert here, i have yet to really digest all these...:-). thanks for the explanation. -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
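Frederic's explanation can be condensed into a small decision function. This is a simplified model, not kernel code. It describes what the x86 return path does at the end of every irq or exception, assuming CONFIG_PREEMPT. If user mode was interrupted, the resume_userspace path calls schedule() when a reschedule is pending. If kernel code was interrupted, resume_kernel only calls preempt_schedule_irq() when preemption is not disabled (preempt_count == 0) and interrupts were enabled. The enum and function names are hypothetical.

```c
#include <stdbool.h>

enum irq_return_action {
    RETURN_NO_RESCHED,          /* just restore registers and return      */
    CALL_SCHEDULE,              /* resume_userspace path: schedule()        */
    CALL_PREEMPT_SCHEDULE_IRQ,  /* resume_kernel path: preempt_schedule_irq */
};

enum irq_return_action irq_exit_action(bool interrupted_user,
                                       bool need_resched,
                                       int  preempt_count,
                                       bool irqs_were_enabled)
{
    if (!need_resched)
        return RETURN_NO_RESCHED;
    if (interrupted_user)
        return CALL_SCHEDULE;               /* always safe to switch here */
    /* Interrupted kernel code: preempt only if it holds nothing that
     * disabled preemption and it was running with irqs enabled. */
    if (preempt_count == 0 && irqs_were_enabled)
        return CALL_PREEMPT_SCHEDULE_IRQ;
    return RETURN_NO_RESCHED;
}
```

In this model, the "inconsistent kernel data" worry from earlier in the thread is handled by the preempt_count check. Code that holds a spinlock has preemption disabled and is never switched out from the irq return path.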
Re: Kernel code interrupted by Timer
27;s actually fine. Later on, the scheduler restores the previous task > > to the middle of preempt_schedule_irq() and the irq completes its > Sorry didn't understand this sentence i.e. "scheduler restores the > previous task to the middle of preempt_schedule_irq()". > > return to what it interrupted. The state of the processor prior to the > > interrupt is stored on the task stack. So we can restore that anytime. > > Note if the irq interrupted userspace, it can do about the same thing, > > except it calls schedule() directly instead of preempt_schedule_irq(). > > > > ___ > > Kernelnewbies mailing list > > Kernelnewbies@kernelnewbies.org > > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: MAX limit of file descriptor
i can only make a general statement, which may not always be true: in the kernel almost EVERYTHING HAS TO BE FINITE... and this is to cater for the fact that ... but at the userspace or application level, u can design structures to be (practically) unbounded. eg, I used python for large-number calculations, and so far I have not hit a limit, but I am sure there is one at the representation level... because i don't know the datastructure used, i don't know the limits. On Sat, Feb 9, 2013 at 1:10 PM, horseriver wrote: > hi:) > >In one process ,what is the max number of opening file descriptor ? >Can it be set to infinite ? > >In network programing ,what is the essential for the maximum of > connections >dealed per second > > thanks! > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
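The kernel's finite ceilings for open files can be read directly on Linux. /proc/sys/fs/nr_open is the largest value RLIMIT_NOFILE may be set to for a single process. /proc/sys/fs/file-max is the system-wide limit on open file handles. So even an "unlimited" request ends up bounded by these numbers. The sketch assumes these sysctl files are readable. `read_proc_value` is a hypothetical helper.

```c
#include <stdio.h>

/* Read one integer from a /proc (sysctl) file; returns -1 on failure. */
long long read_proc_value(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long long v = -1;
    if (fscanf(f, "%lld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

Usage: `read_proc_value("/proc/sys/fs/nr_open")` for the per-process ceiling, `read_proc_value("/proc/sys/fs/file-max")` for the system-wide one.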
Re: Kernel code interrupted by Timer
On Sat, Feb 9, 2013 at 4:20 PM, Peter Teoh wrote: > > > On Sat, Feb 9, 2013 at 3:51 PM, anish kumar > wrote: > >> On Sat, 2013-02-09 at 14:57 +0800, Peter Teoh wrote: >> > >> > >> > On Sat, Feb 9, 2013 at 1:47 PM, anish kumar >> > . >> > Timer interrupts is supposed to cause scheduling and scheduler >> > may or >> > may not pick up your last process(we always use the term >> > "task" in >> > kernel space) after handling timer interrupt. >> > > >> > >> > >> > >> > Sorry if I may disagree, correct me if wrong. Timer interrupt and >> > scheduler is two different thing. I just counted in the "drivers" >> > subdirectory, there are at least more than 200 places where >> > "setup_timer()" is called, and these have nothing to do with >> > scheduling. For eg, heartbeat operation etc. Not sure I >> > misunderstood something? >> Have a look at kernel/timer.c and kernel/hrtimer.c. >> There are many sched() calls in these files.This will invoke scheduler. >> > >> > > kernel/timer.c and kernel/hrtimer.c are implementing the logic outside of > timer interrupt context, ie, it is NOT executed in timer interrupt context, > but in bottom half context. the real timer interrupt context is done in > arch-specific branch: arch/x86/kernel/tsc.c, for example, and the entire > tsc.c has no scheduling concept in it. the entire file tsc.c in fact is > handling all the hardware-specific stuff - in the top-half context. > One correction here: kernel/timer.c runs in bottom-half interrupt context, which is still interrupt context/mode. But as I glanced through the entire kernel/timer.c, I saw no task scheduling called anywhere in this file; it is in fact doing timer scheduling. Whereas the context switching we were discussing, which necessitates consistent state maintenance, is done by task scheduling (inside kernel/sched.c). Of course the timer interrupt will sometimes trigger the task scheduling logic, but not always... not sure if my statement is correct? 
(no time to search the source, please pardon me). > > in linux kernel scheduling is done in two ways: voluntary and involuntary > scheduling. involuntary scheduling means it is triggered by timer > interrupt. but voluntary scheduling (which is only recently introduced > into kernel for performance reasons) drastically improve the latency > numbers.voluntary scheduling is NOT triggered by timer, but ANYONE who > want to give up the CPU can call sched_cpu() to do a rescheduling. > > hope i am not wrong. > > >> > >> > -- >> > Regards, >> > Peter Teoh >> >> >> > > > -- > Regards, > Peter Teoh -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: Kernel code interrupted by Timer
On Sat, Feb 9, 2013 at 3:51 PM, anish kumar wrote: > On Sat, 2013-02-09 at 14:57 +0800, Peter Teoh wrote: > > > > > > On Sat, Feb 9, 2013 at 1:47 PM, anish kumar > > . > > Timer interrupts is supposed to cause scheduling and scheduler > > may or > > may not pick up your last process(we always use the term > > "task" in > > kernel space) after handling timer interrupt. > > > > > > > > > > > Sorry if I may disagree, correct me if wrong. Timer interrupt and > > scheduler is two different thing. I just counted in the "drivers" > > subdirectory, there are at least more than 200 places where > > "setup_timer()" is called, and these have nothing to do with > > scheduling. For eg, heartbeat operation etc. Not sure I > > misunderstood something? > Have a look at kernel/timer.c and kernel/hrtimer.c. > There are many sched() calls in these files.This will invoke scheduler. > > > kernel/timer.c and kernel/hrtimer.c implement logic that runs outside the timer interrupt context proper, ie, NOT in the timer interrupt handler itself but in bottom-half context. The real timer interrupt handling is done in the arch-specific branch, eg arch/x86/kernel/tsc.c, and the entire tsc.c has no scheduling concept in it: it handles all the hardware-specific stuff, in top-half context. In the Linux kernel, scheduling happens in two ways: voluntary and involuntary. Involuntary scheduling (preemption) is triggered by the timer interrupt. Voluntary scheduling (introduced into the kernel relatively recently, for performance reasons) drastically improves the latency numbers. It is NOT triggered by the timer; instead, ANYONE who wants to give up the CPU can call schedule() (or cond_resched() inside long loops) to reschedule. hope i am not wrong. > > > > -- > > Regards, > > Peter Teoh > > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
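Voluntary rescheduling has a direct userspace analogue. A long-running loop can offer the CPU at regular intervals with the POSIX `sched_yield()`, much as kernel code sprinkles `cond_resched()` into long loops. The yields are placed by the code itself, not forced by a timer. `sum_with_yields` is a hypothetical example function.

```c
#include <sched.h>

/* Sum 0..n-1, voluntarily offering the CPU every `chunk` iterations.
 * Stores the number of yields in *yields. */
long long sum_with_yields(long n, long chunk, long *yields)
{
    long long sum = 0;
    long y = 0;

    for (long i = 0; i < n; i++) {
        sum += i;
        if (chunk > 0 && (i + 1) % chunk == 0) {
            sched_yield();          /* voluntary: let others run now */
            y++;
        }
    }
    if (yields)
        *yields = y;
    return sum;
}
```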
Re: Kernel code interrupted by Timer
On Sat, Feb 9, 2013 at 1:47 PM, anish kumar . > > Timer interrupts is supposed to cause scheduling and scheduler may or > may not pick up your last process(we always use the term "task" in > kernel space) after handling timer interrupt. > > > Sorry, but I may disagree; correct me if I am wrong. The timer interrupt and the scheduler are two different things. I just counted in the "drivers" subdirectory: there are more than 200 places where "setup_timer()" is called, and these have nothing to do with scheduling, for eg, heartbeat operations etc. Or did I misunderstand something? -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: Kernel code interrupted by Timer
On Sat, Feb 9, 2013 at 8:08 AM, Peter Teoh wrote: > > > On Sat, Feb 9, 2013 at 1:08 AM, Gaurav Jain wrote: > >> What happens if the kernel executing in some process context (let's say >> executing a time-consuming syscall) gets interrupted by the Timer - which >> is apparently allowed in 2.6 onwards kernels. >> >> My understanding is that once the interrupt handler is done executing, we >> should switch back to where the kernel code was executing. Specifically, >> the interrupt handler for the Timer interrupt should not schedule some >> other task since that might leave kernel data in an inconsistent state - >> kernel didn't finish doing whatever it was doing when interrupted. >> > > at the microscopic level, every stream of assembly instructions can always > be broken up and intercepted by interrupt, and possibly switched into > another stream of assembly instruction or logic, the maintenance of state > "consistency" is done via context switching. > Context switching is done at the software level. I am not sure if there is a difference between process context switching and thread/task level context switching; in Linux both are tasks, so the main difference is whether the address space has to be switched too. What gets saved is essentially the register context. The floating point (SSE) registers are saved as well, but older x86 kernels did that lazily, only when the next task actually used the FPU, to avoid the overhead. So... "consistency" is really about how you write your software. And u also have multiple switches (on different CPUs) all taking place independently all the time, writing into the same piece of RAM. I also know that nvidia GPUs do not clean up their memory/state when switching from one process to another, but that is beyond the control of the CPU's switching logic anyway. > >> So, does the Timer interrupt handler include such a policy for the above >> case? 
>> >> -- >> Gaurav Jain >> >> >> >> >> _______ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> >> > > > -- > Regards, > Peter Teoh -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
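Saving and restoring a register context can be watched in userspace with glibc's `<ucontext.h>` (obsolescent in POSIX but still provided). `swapcontext()` stores the current registers, including the stack pointer and program counter, into one `ucontext_t` and loads another. This is loosely what the kernel does when it switches tasks. The "task" below is switched out halfway through and later resumes exactly where it stopped. All names are hypothetical.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static int progress;

static void task_body(void)
{
    progress = 1;
    swapcontext(&task_ctx, &main_ctx);   /* "switched out": back to main */
    progress = 2;                        /* resumes exactly here later */
}                                        /* returning follows uc_link */

/* Run the task, switch away mid-way, then resume it.
 * Returns 10 * (progress at the first switch) + final progress. */
int run_task_twice(void)
{
    progress = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;        /* where to go when task_body ends */
    makecontext(&task_ctx, task_body, 0);

    swapcontext(&main_ctx, &task_ctx);   /* run until the task switches out */
    int first = progress;                /* 1 */
    swapcontext(&main_ctx, &task_ctx);   /* resume it; runs to completion */
    return first * 10 + progress;        /* 12 */
}
```

The task's half-finished state survives because its registers and stack were saved, not because the switch waited for a "consistent" point.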
Re: Kernel code interrupted by Timer
On Sat, Feb 9, 2013 at 1:08 AM, Gaurav Jain wrote: > What happens if the kernel executing in some process context (let's say > executing a time-consuming syscall) gets interrupted by the Timer - which > is apparently allowed in 2.6 onwards kernels. > > My understanding is that once the interrupt handler is done executing, we > should switch back to where the kernel code was executing. Specifically, > the interrupt handler for the Timer interrupt should not schedule some > other task since that might leave kernel data in an inconsistent state - > kernel didn't finish doing whatever it was doing when interrupted. > at the microscopic level, every stream of assembly instructions can always be broken up and intercepted by interrupt, and possibly switched into another stream of assembly instruction or logic, the maintenance of state "consistency" is done via context switching. > So, does the Timer interrupt handler include such a policy for the above > case? > > -- > Gaurav Jain > > > > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: hd controller
good sharing. following up on your comments, in the kernel source: block/*.c: the generic block I/O layer, sitting above the SCSI/ATA drivers and implementing stuff like the I/O elevators etc. drivers/block/*.c: block device drivers that do not go through the SCSI/ATA layers (eg, loop, nbd, and some older hardware-specific disk drivers). drivers/scsi/*.c: generally SCSI protocol related stuff (lib*.c), but may also contain device-specific stuff. drivers/ide/*.c (legacy IDE) and drivers/ata/*.c (libata): among the lowest levels, just before the actual port I/O operations are sent out. On Fri, Feb 8, 2013 at 8:26 AM, wrote: > On Fri, 08 Feb 2013 07:48:39 +0800, Peter Teoh said: > > > So the drivers just literally concatenate these command into a string and > > send it over to the device. > > The reason that good disk drivers are hard to write is because it isn't > *just* literally concatenating the commands - it also has to do memory > management (make sure that everybody's data ends up in the right buffers), > command queue management, elevator management (if there's multiple I/O > requests pending from userspace, what order do we issue them in?), error > recovery, power management, and a ton of other stuff... > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
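Valdis's "what order do we issue them in?" is the elevator's job. The sketch below is a toy one-way elevator (C-SCAN), not one of the kernel's actual I/O schedulers (noop, deadline, CFQ). It sorts pending sector numbers, serves everything at or beyond the current head position in ascending order, and then wraps around to the lowest remaining ones. `cscan_order` is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Reorder `sectors` in place into C-SCAN service order starting at `head`. */
void cscan_order(unsigned long *sectors, size_t n, unsigned long head)
{
    qsort(sectors, n, sizeof *sectors, cmp_ulong);

    /* first request at or beyond the head position */
    size_t split = 0;
    while (split < n && sectors[split] < head)
        split++;

    unsigned long *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return;                          /* fall back to plain sorted order */

    size_t k = 0;
    for (size_t i = split; i < n; i++)   /* sweep upwards from the head */
        tmp[k++] = sectors[i];
    for (size_t i = 0; i < split; i++)   /* then wrap to the lowest */
        tmp[k++] = sectors[i];
    for (size_t i = 0; i < n; i++)
        sectors[i] = tmp[i];
    free(tmp);
}
```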
Re: hd controller
At the lowest level, IDE (PATA) and SATA disks share a common command base (perhaps with variations): the ATA command set. SCSI devices speak the separate SCSI command set, but libata sits underneath the SCSI midlayer and translates SCSI commands into ATA commands, which is why u find ATA_CMD_* symbols all over drivers/ata/*.c. Below is the command set as specified by the standards body (these commands are OS agnostic): https://github.com/gcastigl/SO2C2011TP2/blob/master/doc/ATA%20-%20ATAPI%20Command%20Set.pdf Look at all the "ATA_CMD_*" commands here: https://github.com/Scorpiion/Renux_u-boot/blob/master/include/ata.h So the driver essentially fills these commands (plus their parameters) into a "taskfile" structure and sends it over to the device. For example, in drivers/ata/libata-core.c: static int ata_read_native_max_address(struct ata_device *dev, u64 *max_sectors) { unsigned int err_mask; struct ata_taskfile tf; int lba48 = ata_id_has_lba48(dev->id); ata_tf_init(dev, &tf); /* always clear all address registers */ tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; if (lba48) { tf.command = ATA_CMD_READ_NATIVE_MAX_EXT; tf.flags |= ATA_TFLAG_LBA48; } else tf.command = ATA_CMD_READ_NATIVE_MAX; The tf.command value is ultimately written to the device's Command register (by port I/O on a legacy IDE controller). BUT... not sure of the details, corrections welcome :-). On Thu, Feb 7, 2013 at 4:19 PM, horseriver wrote: > hi:) > >I am curious about how hd controller work . >When user am reaing/writing hd ,it was implemented by sending command >to hd controller's special port.Then ,how does the controller know >a new command has received? > >In this procedure , what work does the hd driver do ? > > thanks! > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
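Here is a trimmed-down sketch of the choice made in `ata_read_native_max_address()`, using the standard opcodes: READ NATIVE MAX ADDRESS is 0xF8 and its 48-bit EXT variant is 0x27, the same values as libata's `ATA_CMD_READ_NATIVE_MAX` / `ATA_CMD_READ_NATIVE_MAX_EXT`. `struct toy_taskfile` and `build_read_native_max` are hypothetical, not libata types. On the horseriver question: as far as I understand the legacy taskfile interface, the device starts executing when the Command register is written, which is why drivers load the other registers first. This is an assumption worth checking against the ATA spec linked above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_READ_NATIVE_MAX      0xF8  /* 28-bit LBA variant */
#define CMD_READ_NATIVE_MAX_EXT  0x27  /* 48-bit LBA variant */
#define DEV_LBA                  0x40  /* LBA-mode bit in the Device register */

/* Hypothetical, minimal taskfile: only the fields this command needs. */
struct toy_taskfile {
    uint8_t command;
    uint8_t device;
    bool    lba48;
};

void build_read_native_max(struct toy_taskfile *tf, bool dev_has_lba48)
{
    tf->lba48   = dev_has_lba48;
    tf->command = dev_has_lba48 ? CMD_READ_NATIVE_MAX_EXT : CMD_READ_NATIVE_MAX;
    tf->device  = DEV_LBA;             /* address in LBA mode, not CHS */
}
```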
Re: thread concurrent file operation
To generalize further, POSIX requires all of its functions to be thread-safe, except for the APIs listed here: http://pubs.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_09.html The Linux kernel may guarantee thread-safety, but this only applies to serializing data at the per-syscall level, ie, every read() completes before another read() from another thread can interfere with it. But at the file level u may still get corrupted file contents or mangled data structures if u mix writes and reads without proper serialization at the userspace level. Thus kernel locking + userspace locking are both needed, for different purposes. The discussion below is useful (esp. the first answer): http://stackoverflow.com/questions/5268307/thread-safety-of-read-pread-system-calls In the kernel there is only one offset value per open file (shared by every descriptor that refers to it) to indicate the current file position. So at the userspace level, different read/write combinations will move the file position, which also explains why userspace locking (for logical reasons) is needed. On Thu, Feb 7, 2013 at 6:23 PM, Peter Teoh wrote: > Multiple concurrent write() by different thread is possible, as they all > can share the same file descriptor in a single similar process, and this is > not allowed. So nevertheless, the problem you posed is not > allowed/acceptable by the kernel, so Linus himself fixed it: > > See here: > > http://lwn.net/Articles/180387/ > > And Linus patch: > > http://lwn.net/Articles/180396/ > > but my present version (3.2.0) has rcu lock over it (higher performance): > > INIT_LIST_HEAD(&f->f_u.fu_list); > atomic_long_set(&f->f_count, 1); > rwlock_init(&f->f_owner.lock); > spin_lock_init(&f->f_lock); > eventpoll_init_file(f); > /* f->f_version: 0 */ > > > On Thu, Feb 7, 2013 at 4:44 PM, Karaoui mohamed lamine < > mohar...@gmail.com> wrote: > >> >> Tahnks guys! >> >> 2013/1/30 Karaoui mohamed lamine >> >>> thanks, i think i get it. 
>>> >>> 2013/1/30 >>> >>> On Tue, 29 Jan 2013 20:16:26 +0100, you said: >>>> >>>> > Actually my question is : >>>> > Does POSIX specifies the fact that we need to use "lockf" to be able >>>> to do >>>> > read/write operation in different offset ? Is'n the kernel supposed to >>>> > ensure this ? >>>> >>>> If you have non-overlapping writes, the kernel will eventually sort it >>>> out >>>> for you. If your writes overlap, you'll have to provide your own >>>> locking >>>> via lockf() or similar, and synchronization via other methods. >>>> >>> >>> >> >> ___ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> >> > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
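The shared file offset is exactly why `pread()` exists. `read()` uses and advances the one offset shared by everyone using that open file, so two threads doing lseek()+read() can clobber each other's position. `pread()` takes an explicit offset and leaves the shared one untouched. The sketch wraps it in a loop that handles short reads and EINTR. `read_at` and `demo_pread_keeps_offset` are hypothetical names.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes at offset `off` without touching the file offset.
 * Returns the number of bytes read (short only at EOF), or -1 on error. */
ssize_t read_at(int fd, void *buf, size_t len, off_t off)
{
    size_t done = 0;
    while (done < len) {
        ssize_t r = pread(fd, (char *)buf + done, len - done,
                          off + (off_t)done);
        if (r < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;                 /* end of file */
        done += (size_t)r;
    }
    return (ssize_t)done;
}

/* Usage: returns 1 if pread() got the right bytes and left the offset alone. */
int demo_pread_keeps_offset(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    int fd = fileno(f);
    char got[3];
    int ok = write(fd, "abcdef", 6) == 6            /* offset is now 6 */
          && read_at(fd, got, 3, 1) == 3
          && memcmp(got, "bcd", 3) == 0
          && lseek(fd, 0, SEEK_CUR) == 6;           /* still 6 */
    fclose(f);
    return ok;
}
```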
Re: thread concurrent file operation
Multiple concurrent write() calls from different threads are possible, since the threads of one process can all share the same file descriptor, and letting them race like that is not acceptable. In fact, the problem you posed was considered unacceptable for the kernel, so Linus himself fixed it: See here: http://lwn.net/Articles/180387/ And Linus' patch: http://lwn.net/Articles/180396/ but my present version (3.2.0) has an RCU lock over it (higher performance): INIT_LIST_HEAD(&f->f_u.fu_list); atomic_long_set(&f->f_count, 1); rwlock_init(&f->f_owner.lock); spin_lock_init(&f->f_lock); eventpoll_init_file(f); /* f->f_version: 0 */ On Thu, Feb 7, 2013 at 4:44 PM, Karaoui mohamed lamine wrote: > > Tahnks guys! > > 2013/1/30 Karaoui mohamed lamine > >> thanks, i think i get it. >> >> 2013/1/30 >> >> On Tue, 29 Jan 2013 20:16:26 +0100, you said: >>> >>> > Actually my question is : >>> > Does POSIX specifies the fact that we need to use "lockf" to be able >>> to do >>> > read/write operation in different offset ? Is'n the kernel supposed to >>> > ensure this ? >>> >>> If you have non-overlapping writes, the kernel will eventually sort it >>> out >>> for you. If your writes overlap, you'll have to provide your own locking >>> via lockf() or similar, and synchronization via other methods. >>> >> >> > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
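For the overlapping-writes case quoted above ("lockf() or similar"), here is a sketch using POSIX record locks. On Linux, lockf() is an interface on top of the same fcntl() locks. This sketch calls fcntl(F_SETLKW) directly because it takes an absolute offset, and pairs it with pwrite() so the shared file offset is never involved. One caveat: record locks are owned by the process, so they serialize writers in different processes but not threads of the same process. Threads still need a mutex. The function names are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Lock (F_WRLCK) or unlock (F_UNLCK) bytes [off, off+len), waiting if needed. */
static int lock_range(int fd, short type, off_t off, off_t len)
{
    struct flock fl = { 0 };
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Write len bytes at `off` while holding a write lock on exactly that range. */
ssize_t locked_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    if (lock_range(fd, F_WRLCK, off, (off_t)len) < 0)
        return -1;
    ssize_t w = pwrite(fd, buf, len, off);
    lock_range(fd, F_UNLCK, off, (off_t)len);
    return w;
}

/* Usage: returns 1 if the locked write landed where expected. */
int demo_locked_pwrite(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    int fd = fileno(f);
    char back[5];
    int ok = locked_pwrite(fd, "world", 5, 6) == 5
          && pread(fd, back, 5, 6) == 5
          && memcmp(back, "world", 5) == 0;
    fclose(f);
    return ok;
}
```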
Re: Creating scheduler
well...u asked for it: http://abstract.cs.washington.edu/~shwetak/classes/ee472/assignments/lab2/lab2.pdf http://www.cs.cmu.edu/~410-s07/p3/kernel.pdf http://web.stonehill.edu/compsci/CS314/Assignments/Assignment0.pdf http://www.cs.amherst.edu/~sfkaplan/courses/2012/spring/cs261/assignments/project-1.pdf etc...googling returned me 27000 links On Thu, Feb 7, 2013 at 1:49 AM, jeshkumar...@gmail.com < jeshkumar...@gmail.com> wrote: > Hi all :), > > Can anyone suggest a good tutorial to create our own scheduler ? > > > Sent from my HTC > Excuse for typo. > > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: thread concurrent file operation
In ANY update/change, locking is always needed, to prevent multiple parties from updating at the same time. But there is another way: lockless updates. One form used in the Linux kernel is called RCU (read-copy-update): http://en.wikipedia.org/wiki/Read-copy-update The idea is that whoever wants to make a change does not modify the shared data in place: the updater makes a copy, applies its changes to the copy, and then atomically switches the shared pointer over to the new copy. (Oracle, and indeed most databases, use a related multi-version idea.) Writers still have to synchronize among themselves, so if multiple CPUs want to write to the same place, you still need a lock among the writers for classic RCU: http://lwn.net/Articles/305782/ But readers need no lock: just go ahead and read. A reader that picked up the pointer BEFORE the switch keeps reading the older copy, and the older copy is only freed after a grace period, once all such pre-existing readers have finished. http://lwn.net/2001/features/OLS/pdf/pdf/read-copy.pdf http://lwn.net/Articles/262464/ http://lwn.net/Articles/263130/ (see the picture here) But this kind of locking is done at the low level (for the hard disk, at the data block level). As for vfs_read(): its purpose is to read... and it does not prevent you from writing! Yes, locking/unlocking at the FILE level is left to the user in userspace, so if you have multiple reads and then someone comes in and writes, yes, there will be corruption. But that is logical corruption, not hardware/data-block corruption, which is what the kernel aims to protect against. On Tue, Jan 29, 2013 at 11:35 PM, Karaoui mohamed lamine wrote: > Hello, > > I was looking at how a syscall read/write was done, and i found this : > > >loff_t pos = file_pos_read(f.file); >ret = vfs_read(f.file, buf, count, &pos); >file_pos_write(f.file, pos); >fdput(f); >... > > My questions are : > > Where did the locking go? I would have imaginated something like : > > >*lock(f);* >loff_t pos = file_pos_read(f.file); >ret = vfs_read(f.file, buf, count, &pos); >file_pos_write(f.file, pos); >fdput(f); >*unlock(f);* >... 
> > If multiple threads try to read/write at the same time, they could > read/write at the same offset ? > > If my understanding are correct, is this POSIX compliant ? > > > thanks. > > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
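The copy-update-publish idea can be sketched in userspace C11. This is a hypothetical illustration, not the kernel's RCU API (rcu_read_lock(), rcu_assign_pointer(), synchronize_rcu() and friends): readers take no lock, the updater publishes a modified copy with a release store, and old versions are freed only after all readers are gone. Joining the reader threads stands in for a real grace period here, which is an assumption of the sketch.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config { int a; int b; };           /* invariant: b == 2 * a */

#define UPDATES 1000
#define READERS 2

static _Atomic(struct config *) current;   /* the published version */
static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;
static struct config *retired[UPDATES];    /* old versions waiting to be freed */
static int nretired;

/* Reader: no lock at all, just an acquire load of the published pointer. */
static void *reader(void *arg)
{
    long *violations = arg;
    for (int i = 0; i < 100000; i++) {
        struct config *c = atomic_load_explicit(&current, memory_order_acquire);
        if (c->b != 2 * c->a)
            (*violations)++;               /* would mean a half-updated object */
    }
    return NULL;
}

/* Updater: copy, modify the private copy, then publish it atomically.
   Writers still serialize among themselves with an ordinary mutex. */
static int update(int a)
{
    struct config *fresh = malloc(sizeof *fresh);
    if (!fresh)
        return -1;

    pthread_mutex_lock(&writer_lock);
    struct config *old = atomic_load_explicit(&current, memory_order_relaxed);
    *fresh = *old;                          /* copy   */
    fresh->a = a;                           /* update */
    fresh->b = 2 * a;
    atomic_store_explicit(&current, fresh, memory_order_release);  /* publish */
    retired[nretired++] = old;              /* readers may still hold it: defer free */
    pthread_mutex_unlock(&writer_lock);
    return 0;
}

/* Runs READERS lock-free readers against UPDATES updates. Returns the number
   of inconsistent reads seen (expected 0) and stores the final value of a. */
long rcu_style_demo(int *final_a)
{
    pthread_t tid[READERS];
    long violations[READERS] = {0};
    long total = 0;
    int created = 0;
    struct config *init = malloc(sizeof *init);

    if (!init)
        return -1;
    init->a = 0;
    init->b = 0;
    atomic_store(&current, init);
    nretired = 0;

    for (int i = 0; i < READERS; i++) {
        if (pthread_create(&tid[i], NULL, reader, &violations[i]) != 0)
            break;
        created++;
    }
    for (int a = 1; a <= UPDATES; a++)
        update(a);
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        total += violations[i];
    }

    /* Crude stand-in for a grace period: every reader has exited, so no one
       can still be looking at a retired version. */
    for (int i = 0; i < nretired; i++)
        free(retired[i]);

    struct config *last = atomic_load(&current);
    *final_a = last->a;
    free(last);
    atomic_store(&current, NULL);
    return total;
}
```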
Re: hard disk dirver
On Wed, Feb 6, 2013 at 1:21 PM, horseriver wrote: > hi:) > >I have a newbie question about hard ware. >At booting stage,kernel need to detect the hard device before mount it, >does this work need pci's surport? > >At loading stage ,boot loader need to move binaries from hard disk > partition >to ram,does this work need pci's surport? > Hard disk I/O goes over the ATA bus, and PCI has its own bus on the chipset (see page 69): http://downloadmirror.intel.com/19123/eng/d525mw_d525mwv_techprodspec.pdf and page 14: http://download.intel.com/support/motherboards/desktop/d865gsa/sb/d5600601us.pdf But these are just terminology. At the source code level (and in the tools as well), PCI and ATA are not differentiated much: in both drivers/ata/ata_piix.c and drivers/pci/quirks.c you can see that 82801 symbols exist. (The ATA/SATA controller inside such a PC chipset is itself enumerated as a PCI device, which is why ata_piix registers as a PCI driver; so on these machines, detecting the disk in the kernel does depend on PCI support.) For your problem, I think it is a BOCHS problem, mixed with a recent Linux kernel (an older kernel should be fine), e.g. http://forums.gentoo.org/viewtopic-t-915210-view-previous.html?sid=a003ebbc022d7f23399fc7f1c5dad424 (notice the 3.2 kernel), which was resolved by setting the PCI configuration in BOCHS as well. Take a look. > thanks! > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
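To see that the ATA/SATA controller really is a PCI device, you can look at the PCI class codes in sysfs. This is a minimal sketch, assuming the standard /sys/bus/pci/devices layout; the function and enum names are hypothetical. Base class 0x01 is mass storage, with subclass 0x01 for IDE and 0x06 for SATA.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

enum storage_kind { NOT_STORAGE, STORAGE_IDE, STORAGE_SATA, STORAGE_OTHER };

/* Classify the text of a sysfs PCI "class" file such as "0x010601\n".
   The value is (base class << 16) | (subclass << 8) | programming interface. */
enum storage_kind pci_storage_kind(const char *class_text)
{
    unsigned long code = strtoul(class_text, NULL, 16);
    unsigned int base = (code >> 16) & 0xff;
    unsigned int sub  = (code >> 8) & 0xff;

    if (base != 0x01)              /* 0x01 = mass storage controller */
        return NOT_STORAGE;
    if (sub == 0x01)
        return STORAGE_IDE;
    if (sub == 0x06)
        return STORAGE_SATA;
    return STORAGE_OTHER;
}

/* Print the mass-storage controllers listed under /sys/bus/pci/devices.
   Returns how many were found, or -1 if the directory cannot be opened. */
int list_pci_storage(void)
{
    static const char *names[] = { "", "IDE", "SATA", "other mass-storage" };
    DIR *dir = opendir("/sys/bus/pci/devices");
    struct dirent *de;
    int found = 0;

    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        char path[512], text[32];
        FILE *f;

        if (de->d_name[0] == '.')
            continue;
        snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/class", de->d_name);
        if ((f = fopen(path, "r")) == NULL)
            continue;
        enum storage_kind kind = fgets(text, sizeof text, f) ? pci_storage_kind(text)
                                                               : NOT_STORAGE;
        fclose(f);
        if (kind != NOT_STORAGE) {
            printf("%s: %s controller\n", de->d_name, names[kind]);
            found++;
        }
    }
    closedir(dir);
    return found;
}
```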
Re: When does the /dev/sda1 node comes into being ?
I think it depends. Some are symlinks in /dev/, and some are created by udev after udevd reads its configuration files; many scenarios are involved (just search for "util_create_path" inside the udev source code and you can see all the situations). But for the hard disk (whose partition is also the rootfs), /dev/sda is created while the kernel is booting up, inside the initrd: just gunzip it and extract the cpio archive, then view the file scripts/local, and you can see it make the /dev/sd nodes based on /sys/block/ information, which in turn depends on the kernel calling the _device_register() functions (there are a few variations of them, organized hierarchically). On the other hand, if /dev/sda is not the rootfs but just a normal hard disk listed in /etc/fstab, then the kernel's sd driver detects it in sd_probe_async(), which will printk() the "Write Protect is off" message in your dmesg output (any time you plug in the hard disk you can see this), and udev then creates the node so that it can be mounted. On Wed, Feb 6, 2013 at 1:26 AM, horseriver wrote: > hi:) > > During booting period .every device will have a node at /dev/ folder. > what is the detail of ths procedure? > > thanks! > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: How to analyze kernel Oops dump
Perhaps let me try. The cause of the crash is here:
[ 493.113464] Unable to handle kernel paging request at virtual address f6b9f777
[ 493.124298] pgd = ec4c4000
[ 493.127166] [f6b9f777] *pgd=
i.e., the page-directory entry for f6b9f777 (in the page directory at 0xec4c4000) is zero, so that address is not mapped. At the time of the crash, the register values are:
[ 493.169158] PC is at __kmalloc_track_caller+0xa4/0x1ec
[ 493.174591] LR is at 0x80569dc0
[ 493.177917] pc : [<801094d8>]    lr : [<80569dc0>]    psr: a113
[ 493.177947] sp : 80569dc0  ip : 89011b70  fp : 80569dfc
[ 493.190124] r10: 1fea  r9 : 0001  r8 :
[ 493.195648] r7 : 0940  r6 : 00d1  r5 : ed002900  r4 : f6b9f777
[ 493.202575] r3 : 80568000  r2 :  r1 : 08aa8000  r0 : 80589c00
[ 493.209503] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 493.217254] Control: 10c5387d  Table: ec4c406a  DAC: 0015
Note that r4 holds the same bad address, f6b9f777. Take the same version of the kernel source, and you can see that line 3415 matches exactly the warning message in the error log:
    size_t ksize(const void *object)
    {
        struct page *page;

        if (unlikely(object == ZERO_SIZE_PTR))
            return 0;

        page = virt_to_head_page(object);

        if (unlikely(!PageSlab(page))) {
            WARN_ON(!PageCompound(page));    /* => this is line 3415 */
            return PAGE_SIZE << compound_order(page);
        }

        return slab_ksize(page->slab);
    }
    EXPORT_SYMBOL(ksize);
Because ksize is an exported symbol, the kernel image has "ksize" as the symbol near the crash point, which is located +0x70 from "ksize".
As for the reason why the page's compound-page attributes have not been set correctly, you have to read the history:
[ 494.068664] Backtrace:
[ 494.071289] [<80109434>] (__kmalloc_track_caller+0x0/0x1ec) from [<80335ec0>] (__alloc_skb+0x60/0xfc)
[ 494.081085] [<80335e60>] (__alloc_skb+0x0/0xfc) from [<80336530>] (__netdev_alloc_skb+0x2c/0x54)
[ 494.090423] [<80336504>] (__netdev_alloc_skb+0x0/0x54) from [<7f078788>] (stmmac_poll+0x590/0x794 [stmmac])
[ 494.100738]  r4:ed0b84c0 r3:
[ 494.104553] [<7f0781f8>] (stmmac_poll+0x0/0x794 [stmmac]) from [<8033f23c>] (net_rx_action+0x88/0x1f0)
[ 494.114440] [<8033f1b4>] (net_rx_action+0x0/0x1f0) from [<80045fb4>] (__do_softirq+0x12c/0x260)
[ 494.123657] [<80045e88>] (__do_softirq+0x0/0x260) from [<8004659c>] (irq_exit+0x58/0xb0)
[ 494.132263] [<80046544>] (irq_exit+0x0/0xb0) from [<8000fa08>] (handle_IRQ+0x8c/0xc8)
[ 494.140563]  r4:0078 r3:020c
[ 494.144378] [<8000f97c>] (handle_IRQ+0x0/0xc8) from [<80008658>] (gic_handle_irq+0x48/0x6c)
[ 494.153228]  r5:80569f40 r4:fa212000
[ 494.157043] [<80008610>] (gic_handle_irq+0x0/0x6c) from [<8000e600>] (__irq_svc+0x40/0x70)
[ 494.165802] Exception stack(0x80569f40 to 0x80569f88)
From the above, I can only guess that the possible calling sequence is as below. In net/core/skbuff.c:
    struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
                                int fclone, int node)
    {
        ...
        size = SKB_WITH_OVERHEAD(ksize(data));
        prefetchw(data + size);
Notice the __alloc_skb() ==> ksize() call: did it end up with the *pgd error above? I also looked at a few functions below stmmac_poll(), as the offset 0x590 is quite far into stmmac_poll() (the whole symbol is 0x794 bytes), so the call is unlikely to come from this function's own code. The functions after it are declared "static" and may have been inlined into it, in which case they have no symbol of their own, and disassembly-wise they will still show up under the "stmmac_poll" symbol. It looks like a descriptor-related bug.
See this fix: http://comments.gmane.org/gmane.linux.network/236183 which came after 3.4.0; to be specific, it is in 3.4.6: http://lwn.net/Articles/507526/ -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
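The offsets in the oops can be checked by hand. The backtrace gives each function's start address, so PC = 0x80109434 + 0xa4 = 0x801094d8, matching "pc : [<801094d8>]", and 0x7f0781f8 + 0x590 = 0x7f078788 for the stmmac_poll return address. A minimal parser for the "name+0xOFF/0xSIZE" notation (hypothetical helper name) could look like this; the SIZE part also confirms that 0x590 is still inside stmmac_poll (size 0x794).

```c
#include <stdlib.h>
#include <string.h>

/* Parse the "+0xOFF/0xSIZE" part of an oops symbol such as
   "__kmalloc_track_caller+0xa4/0x1ec" or "stmmac_poll+0x590/0x794 [stmmac]".
   Returns 0 if parsing succeeds and the offset lies inside the symbol. */
int parse_sym_offset(const char *sym, unsigned long *off, unsigned long *size)
{
    const char *plus = strrchr(sym, '+');
    char *end;

    if (!plus)
        return -1;
    *off = strtoul(plus + 1, &end, 16);   /* base 16 also accepts a "0x" prefix */
    if (*end != '/')
        return -1;
    *size = strtoul(end + 1, &end, 16);
    if (*end != '\0' && *end != ' ')      /* allow a trailing " [module]" */
        return -1;
    return *off < *size ? 0 : -1;
}
```

Adding the parsed offset to the function start address printed as "+0x0" in the backtrace gives the exact instruction address, which you can then look up with objdump or addr2line against a matching vmlinux or module.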
Re: How the follow Starts in Android-Kernel
http://duartes.org/gustavo/blog/post/how-computers-boot-up this is for x86, not for ARM though. On Wed, Feb 6, 2013 at 10:30 AM, Peter Teoh wrote: > > normally in embedded uboot is the bootloader. and to trace this is > simple: > > a. understand how uboot works - and this is highly platform specific > (uboot is highly hardware dependent)...and examine the point where control > passed is passed to kernel image file (which still run at 16 bit real > mode), and from there u can trace everything. > > b. well u need assembly, as everything starting is written in assembly. > for ARM (as u asked for Android), the place is "start_kernel" inside: > > arch/arm/kernel/head-common.S > > and then u must learn linker scripting (for ARM is > arm/kernel/vmlinux.ld.S) as well, that is how u tell the compiler to > generate a image that can be loaded directly into memory and executed > directly on the hardware in memory - using the hardware-specific reset > vector as the starting point. there is no loader at this stage to load > the binary. (uboot will load it as a image, but executeable). > > the rest is yours... > > On Mon, Feb 4, 2013 at 12:34 PM, Ranganath T.M wrote: > >> Hi All, >> >> I am trying to find out how the kernel will *start* from the uboot and >> how the kernel will call there respective static modules which are built as >> *.o* file and also how the *probe* of every modules will be called. >> >> Thanks And Regards >> Ranganath >> >> ___ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> >> > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: How the follow Starts in Android-Kernel
Normally in embedded systems U-Boot is the bootloader, and tracing this is simple:
a. Understand how U-Boot works (this is highly platform specific, as U-Boot is highly hardware dependent), and examine the point where control is passed to the kernel image file. On ARM the kernel is entered in SVC mode with the MMU off (16-bit real mode only applies to x86). From there you can trace everything.
b. Well, you need assembly, as everything at the start is written in assembly. For ARM (as you asked about Android), the entry code is in arch/arm/kernel/head.S, and arch/arm/kernel/head-common.S is where it finally branches to "start_kernel" (the C function in init/main.c).
Then you must learn linker scripting as well (for ARM it is arch/arm/kernel/vmlinux.lds.S): that is how you tell the linker to generate an image that can be loaded directly into memory and executed directly on the hardware, starting from a known entry point. There is no loader at this stage to load the binary (U-Boot loads it as an image, but it is directly executable). The rest is yours... On Mon, Feb 4, 2013 at 12:34 PM, Ranganath T.M wrote: > Hi All, > > I am trying to find out how the kernel will *start* from the uboot and > how the kernel will call there respective static modules which are built as > *.o* file and also how the *probe* of every modules will be called. > > Thanks And Regards > Ranganath > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
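For the second part of the question (how built-in drivers get initialized and probed), the kernel collects pointers to init functions in special linker sections, and do_initcalls() in init/main.c walks them at boot. A driver's init function typically registers the driver, and the bus core then calls its probe() for each matching device. The sketch below is a userspace analogy only, assuming GCC or Clang with a GNU-compatible ELF linker, which defines __start_/__stop_ symbols for sections whose names are valid C identifiers; DEMO_INITCALL and the fake_* functions are hypothetical.

```c
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Hypothetical stand-in for the kernel's initcall macros: drop a pointer
   to fn into the "demo_initcalls" ELF section. "used" keeps it from being
   discarded even though nothing references the variable by name. */
#define DEMO_INITCALL(fn) \
    static initcall_t demo_initcall_##fn \
    __attribute__((used, section("demo_initcalls"))) = fn

/* Defined by the linker around the section's contents. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

static int probed;

static int fake_uart_init(void) { printf("uart: registered\n"); probed++; return 0; }
static int fake_eth_init(void)  { printf("eth: registered\n");  probed++; return 0; }

DEMO_INITCALL(fake_uart_init);
DEMO_INITCALL(fake_eth_init);

/* Walk the section like do_initcalls(); returns the number of failures. */
int run_initcalls(void)
{
    int failures = 0;

    for (initcall_t *fn = __start_demo_initcalls; fn < __stop_demo_initcalls; fn++)
        if ((*fn)() != 0)
            failures++;
    return failures;
}
```

Nothing in run_initcalls() names the individual drivers; adding another DEMO_INITCALL line anywhere in the program is enough for it to be called, which is the same property that lets built-in kernel drivers be picked up without a central list.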
Re: Linux Kernel Networking document (free, 178 pages doc)
http://www.haifux.org/lectures.html This link has even more lectures. On Mon, Feb 4, 2013 at 11:55 AM, Peter Teoh wrote: > Good sharing and info. I thought it is also useful to share your > lectures materials at: > > http://www.haifux.org/rami_rosen.html > > which I must highlight has lots of work done since 2007. Keep up the > good work!! > > > On Tue, Jan 29, 2013 at 12:53 AM, Rami Rosen wrote: > >> Hi everyone, >> You can find here an up to date and detailed document in pdf (178 >> pages) about Linux Kernel Networking; going deep into design and >> implementation details as well as the theory behind it: >> http://media.wix.com/ugd//295986_931b8bcf34d93419d46e05b5aa5d0216.pdf >> >> I believe that developers/sysadmins/researchers/students may find help >> with it. >> >> >> regards, >> Rami Rosen >> >> http://ramirose.wix.com/ramirosen >> >> ___ >> Kernelnewbies mailing list >> Kernelnewbies@kernelnewbies.org >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies >> > > > > -- > Regards, > Peter Teoh > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: Linux Kernel Networking document (free, 178 pages doc)
Generally, anything you write for ext2 should still be valid for ext3 and ext4, in the sense that the features are backward compatible. Sizing limits may have increased, but the OLD working mechanisms should still be valid, except for some. So an ext2 fs should still be mountable as ext4, but not vice versa once certain incompatible feature flags (e.g. extents) are enabled; extended attributes (xattr) are only a compatible feature and do not block this. And if no such flag is enabled, and the journal log is clean, then an ext4 fs is also mountable as an ext2 fs: http://superuser.com/questions/408822/ext4-converted-mounted-as-ext2 http://computer-forensics.sans.org/blog/2011/06/14/digital-forensics-mounting-dirty-ext4-filesystems http://en.wikipedia.org/wiki/Extended_file_attributes On Sun, Feb 3, 2013 at 12:26 AM, Rami Rosen wrote: > Hi, > > ext2 and ext3 are kind of obsolete now. > > Indeed, ext4 was integrated into Linux kernel back in 2008. > Amongs its known features which do not exist in ext3 are support for > huge files (like 1 EB (exabyte or somtimes termed exbibyte); 1 EB is > 1024 PB (petabyte) whereas > 1 PB is 1024 TB (terabyte). > a directory can contain a maximum of 64,000 subdirectories (whereas we > have 32,000 in ext3) > Amongst its other features are Journal checksumming, Multiblock > allocator, Faster file system checking and more. > > > If you prefer to start with simpler implementations, ext3 is of course > simpler, and of course ext2 is even simpler than ext3. > > But in case you intend to start with ext2/ext3, and later perform > a pass on all your documentation to update it to ext4, take into > consideration that this will take quite a time; depending on how deep > you intend to delve into implementation details. > > Good luck! > > Regards, > Rami Rosen > http://ramirose.wix.com/ramirosen > > > > On Sat, Feb 2, 2013 at 11:43 AM, Shubham Sharma > wrote: > > Hi, > > > > I understand that ext2 and ext3 are kind of obsolete now. But AFAIK, > there > > is not much difference in ext3 and ext4. > > > > Moreover for a newbie , it is better to start with ext3.
What you think ? > > > > Regards > > Shubham > > > > > > On Fri, Feb 1, 2013 at 2:15 AM, Rami Rosen wrote: > >> > >> Hi, > >> Have you considered to start with ext4? > >> it seems that ext3, ext2 are a bit out of fashion, > >> > >> Regards, > >> Rami Rosen > >> http://ramirose.wix.com/ramirosen > >> > >> > >> On Thu, Jan 31, 2013 at 8:58 PM, shubham > wrote: > >> > Thanks Rami, > >> > > >> > I am also trying to understand ext3 and write some document for the > >> > same. > >> > > >> > Regards > >> > Shubham > >> > > >> > > >> > On 31-Jan-13 12:51 AM, Rami Rosen wrote: > >> >> > >> >> HI, > >> >> I will try to write something for Linux Filesystems (and maybe for > >> >> other subsystems) but this will probably take a lot of time. > >> >> > >> >> Regards, > >> >> Rami Rosen > >> >> http://ramirose.wix.com/ramirosen > >> >> > >> >> > >> >> On Wed, Jan 30, 2013 at 5:44 PM, shubham > >> >> wrote: > >> >>> > >> >>> Thanks for sharing the document. > >> >>> > >> >>> I hope we could have such documents for other subsystems as well. > >> >>> > >> >>> Regards > >> >>> Shubham > >> >>> > >> >>> > >> >>> On 28-Jan-13 10:23 PM, Rami Rosen wrote: > >> >>>> > >> >>>> Hi everyone, > >> >>>> You can find here an up to date and detailed document in pdf (178 > >> >>>> pages) about Linux Kernel Networking; going deep into design and > >> >>>> implementation details as well as the theory behind it: > >> >>>> > http://media.wix.com/ugd//295986_931b8bcf34d93419d46e05b5aa5d0216.pdf > >> >>>> > >> >>>> I believe that developers/sysadmins/researchers/students may find > >> >>>> help > >> >>>> with it. 
> >> >>>> > >> >>>> > >> >>>> regards, > >> >>>> Rami Rosen > >> >>>> > >> >>>> http://ramirose.wix.com/ramirosen > >> >>>> > >> >>>> ___ > >> >>>> Kernelnewbies mailing list > >> >>>> Kernelnewbies@kernelnewbies.org > >> >>>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > >> >>> > >> >>> > >> > > > > > > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
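The compatibility rule behind this can be sketched as follows. The ext2/3/4 superblock has three feature masks: unknown INCOMPAT bits refuse the mount, unknown RO_COMPAT bits only allow a read-only mount, and COMPAT bits (such as ext_attr or has_journal) are ignored. The bit values below are the on-disk ones for the features shown; the "supported" masks for an ext2-only driver are an illustrative assumption, and the function name is hypothetical. A journal that still needs replay sets INCOMPAT_RECOVER, which is why the journal must be clean first.

```c
#include <stdint.h>

/* A few on-disk feature bits from the ext2/3/4 superblock. */
#define INCOMPAT_FILETYPE       0x0002u
#define INCOMPAT_RECOVER        0x0004u   /* journal needs replay ("dirty") */
#define INCOMPAT_META_BG        0x0010u
#define INCOMPAT_EXTENTS        0x0040u
#define RO_COMPAT_SPARSE_SUPER  0x0001u
#define RO_COMPAT_LARGE_FILE    0x0002u
#define RO_COMPAT_HUGE_FILE     0x0008u

/* Illustrative masks for a driver that only knows ext2 features. */
#define EXT2_SUPP_INCOMPAT   (INCOMPAT_FILETYPE | INCOMPAT_META_BG)
#define EXT2_SUPP_RO_COMPAT  (RO_COMPAT_SPARSE_SUPER | RO_COMPAT_LARGE_FILE)

enum mount_verdict { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSED };

/* Decide how a driver with the given supported masks may mount a
   filesystem whose superblock carries these feature bits. */
enum mount_verdict check_features(uint32_t incompat, uint32_t ro_compat,
                                  uint32_t supp_incompat, uint32_t supp_ro_compat)
{
    if (incompat & ~supp_incompat)
        return MOUNT_REFUSED;        /* cannot even read it safely */
    if (ro_compat & ~supp_ro_compat)
        return MOUNT_RO_ONLY;        /* readable, but writing could corrupt it */
    return MOUNT_RW;
}
```

Going the other way works because the ext4 driver's supported masks are a superset of ext2's, so an ext2 superblock never carries a bit it does not understand.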
Re: Linux Kernel Networking document (free, 178 pages doc)
Good sharing and info. I thought it is also useful to share your lectures materials at: http://www.haifux.org/rami_rosen.html which I must highlight has lots of work done since 2007. Keep up the good work!! On Tue, Jan 29, 2013 at 12:53 AM, Rami Rosen wrote: > Hi everyone, > You can find here an up to date and detailed document in pdf (178 > pages) about Linux Kernel Networking; going deep into design and > implementation details as well as the theory behind it: > http://media.wix.com/ugd//295986_931b8bcf34d93419d46e05b5aa5d0216.pdf > > I believe that developers/sysadmins/researchers/students may find help > with it. > > > regards, > Rami Rosen > > http://ramirose.wix.com/ramirosen > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: How to wake_up the wait_queue of a socket?
On Sat, Jan 19, 2013 at 1:36 AM, horseriver wrote: > On Fri, Jan 18, 2013 at 10:18:19AM +0800, Peter Teoh wrote: > > essentially, when the packet arrive, it will be assigned to the correct > > process based on IP address + port matching, and then the corresponding > > process's blocked scheduling status will be changed to continue > execution, > > so that when the scheduler next selection of runnable process will pick > him > > out for continue execution. The process will then pick his data up from > > the network queue. > > > > Thanks! > > If there is no event occured on one socket descriptor , > will the poll operation on this socket descriptor be blocked ? > I/O mechanisms have two types: blocking and non-blocking. But that is not what distinguishes poll() from select(): both of them put the caller to sleep until at least one descriptor is ready or the timeout expires. So yes, if no event occurs on the socket, poll() will block, unless you pass a timeout: 0 means return immediately, a positive value means wait at most that many milliseconds, and a negative value means wait forever. select() works the same way with its struct timeval (NULL means wait forever). You can see this in the kernel source, in fs/select.c. The select() syscall:
    SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp,
                    fd_set __user *, exp, struct timeval __user *, tvp)
    {
        struct timespec end_time, *to = NULL;
        struct timeval tv;
        int ret;

        if (tvp) {
            if (copy_from_user(&tv, tvp, sizeof(tv)))
                return -EFAULT;

            to = &end_time;
            if (poll_select_set_timeout(to,
                    tv.tv_sec + (tv.tv_usec / USEC_PER_SEC),
                    (tv.tv_usec % USEC_PER_SEC) * NSEC_PER_USEC))
                return -EINVAL;
        }

        ret = core_sys_select(n, inp, outp, exp, to);
        ret = poll_select_copy_remaining(&end_time, tvp, 1, ret);
        ...
And the poll() syscall (same file):
    SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
                    long, timeout_msecs)
    {
        struct timespec end_time, *to = NULL;
        int ret;

        if (timeout_msecs >= 0) {
            to = &end_time;
            poll_select_set_timeout(to, timeout_msecs / MSEC_PER_SEC,
                NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC));
        }
        ...
In both, a missing timeout leaves "to" as NULL (wait forever); otherwise both go through the common function poll_select_set_timeout(). The details get even more confusing, so I
shall stop here. A good article on epoll etc: http://www.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html > > > ___ > > > Kernelnewbies mailing list > > > Kernelnewbies@kernelnewbies.org > > > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > > > > > > > > > > -- > > Regards, > > Peter Teoh > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
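A quick way to see the timeout behavior is the minimal sketch below, assuming a Unix-domain socketpair stands in for the network socket (the helper name is hypothetical). With no data pending, poll() returns 0 once the timeout expires (immediately for a timeout of 0); once data has been written, it returns 1 with POLLIN set.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns poll()'s result for the read end of a fresh socketpair, optionally
   after writing one byte into the other end. timeout_ms follows poll(2):
   0 = return immediately, >0 = wait at most that long, <0 = wait forever. */
int poll_after_write(int write_first, int timeout_ms, short *revents)
{
    int sv[2];
    struct pollfd pfd;
    int n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (write_first && write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);   /* sleeps here if nothing is readable */
    *revents = pfd.revents;

    close(sv[0]);
    close(sv[1]);
    return n;                        /* 0 = timed out, 1 = ready, -1 = error */
}
```

Calling it with write_first = 0 and timeout_ms = -1 would block forever, which is exactly the "blocked poll" case from the question.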
Re: no error thrown with exit(0) in the child process of vfork()
On Sat, Jan 19, 2013 at 10:43 AM, Peter Teoh wrote: > > > On Sat, Jan 19, 2013 at 5:49 AM, wrote: > >> On Fri, 18 Jan 2013 19:59:38 +0530, Niroj Pokhrel said: >> >> > I have been trying to create a process using vfork(). And both of the >> child >> > and the parent process execute it in the same address space. So, if I >> > execute exit(0) in the child process, it should throw some error right. >> >> Why do you think it should throw an error? >> >> > Since the execution is happening in child process first and if I release >> > all the resources by using exit(0) in the child process then parent >> should >> > be deprived of the resources and should throw some errors right ?? >> >> No, because those resources that were shared across a fork() or vfork() >> were in >> general *multiple references* to the same resource. >> >> > Yes, correct, Valdis is right. Normally, when u free resources (which is > what "exit()" will do), u must also remember to check something call > "reference count". > > Basic malloc() and free() memory management internal data structure also > comes with other info like size (which is 4 bytes BEHIND the first byte > where the pointer points to, and other info). More info: > > http://stackoverflow.com/questions/1957099/how-do-free-and-malloc-work-in-c > > but what is lacking is reference counting. But this feature is available > in Java and C++ libraries for memory allocation. > > Concept discussed here: > > http://stoneship.org/essays/c-reference-counting-and-you/ > > Interesting > > >> As an example - imagine a flagpole. You grab it with your hand, you're >> now holding it. You invite your friend to come over and grab it with >> his hand - now he's holding it too. >> >> But either one of you can let go of the flagpole - and the other one is >> still holding the flagpole until *they* let go. 
And the order you let >> go doesn't matter in this case - which is important because your example >> code has a race condition >> >> Note that there are other cases where the order people let go *does* >> matter. >> This is when you start having to worry about "locking order" and things >> like >> that. >> >> > In the following code, however the process ran fine even though I have >> > exit(0) in the child process >> >> > #include >> > #include >> > #include >> > #include >> > int main() >> > { >> > int val,i=0; >> > val=vfork(); >> > if(val==0) >> > { >> > printf("\nI am a child process.\n"); >> >> Note that printf() gets interesting due to stdio buffering. You probably >> want to call setbuf() and guarantee line-buffering of the output if you're >> playing these sorts of games - the buffering can totally mask a real race >> condition or other bug. >> >> > printf(" %d ",i++); >> > exit(0); >> > } >> > else >> > { >> >> /* race condition here - may want wait() or waitpid() to synchronize? */ >> >> > printf("\nI am a parent process.\n"); >> > printf(" %d ",i); >> > } >> > return 0; >> > } >> > // The program is running fine . >> > But as I have read it should throw some error right ?? I don't know >> what I >> > am missing . Please point out the point I'm missing. Thanking you in >> > advance. >> >> You're also missing the fact that after the vfork(), there's no real >> guarantee of which will run first - which means that the parent can race >> and output the 'printf("%d",i)" *before* the child process gets a chance >> to do the i++. >> >> > I don't think there is any issue here (racing, or child calling exit > before parent called exit()). Read the man-page: > >vfork() differs from fork(2) in that the parent is suspended until > the >child terminates (either normally, by calling _exit(2), or > abnormally, >after delivery of a fatal signal), or it makes a call to > execve(2). >Until that point, the child shares all memory with its parent, > includ‐ >ing the stack. 
The child must not return from the current > function or >call exit(3), but may call _exit(2). > > > So: > > 1. if parent is suspended, it also means
Re: no error thrown with exit(0) in the child process of vfork()
d that the > child would run first, on the theory that the child would often do > something > short that the parent was waiting on, so scheduling parent-first would just > result in the parent running, blocking to wait, and we end up running the > child anyhow before the parent could continue. It broke an *amazing* > amount > of stuff in userspace because often the child would exit() before the > parent was > ready to deal with the child process's termination. Usual failure mode was > the parent would set a SIGCHLD handler, and wait for the signal which never > happened because the SIGCHLD actually fired *before* the handler was set > up). > > (And on non-cache-coherent systems, it's even possible that the i++ happens > on a different CPU first, and the CPU running the parent process never > becomes > aware of it. See 'Documentation/memory-barriers.txt' in the Linux source > for more info on how this works for data inside the kernel. This example > is out in userspace, so other techniques are required instead to do > cross-CPU > synchronization. > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
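Following the vfork() man page quoted earlier in this thread, the child should only call _exit() or one of the exec functions. Below is a minimal sketch (hypothetical helper name) in which the child exits with a given code and the parent, which stays suspended until then, collects it with waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn a child with vfork() that immediately terminates with `code`,
   then return the exit status the parent observes (or -1 on error). */
int vfork_exit_status(int code)
{
    pid_t pid = vfork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);   /* _exit(), not exit(): exit() would flush stdio buffers
                          and run atexit handlers inside the borrowed memory */

    /* The parent only resumes once the child has called _exit(). */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```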
Re: How to wake_up the wait_queue of a socket?
Essentially, when a packet arrives, it is assigned to the correct socket (and hence the correct process) based on IP address + port matching. Then the process blocked on that socket is woken up: its scheduling state is changed back to runnable, so that the next time the scheduler selects a runnable process, it can pick this one for continued execution. The process will then pick its data up from the socket's receive queue. Hope I have not made any mistake in my logic? On Tue, Jan 15, 2013 at 8:36 AM, horseriver wrote: > On Tue, Jan 15, 2013 at 12:25:10PM -0500, valdis.kletni...@vt.edu wrote: > > On Mon, 14 Jan 2013 17:50:03 +0800, horseriver said: > > > > >When one datagram has reached , How to wake_up the wait_queue of > that socket ? > > > > Please clarify your question - I'm not sure which of the following you > mean: > > > 1) How does the kernel wake up the waiting process when a datagram > arrives? > > This is my mean ! > > Thanks > > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
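The sleep/wake-up pattern can be modeled in userspace with a mutex and a condition variable. This is a hypothetical analogy, not kernel code: in the kernel, the socket's sk_data_ready callback (sock_def_readable() in net/core/sock.c by default) wakes the processes sleeping on the socket's wait queue. Here deliver() plays the packet-arrival side and receive() the blocked reader; all names are made up for the sketch.

```c
#include <pthread.h>
#include <unistd.h>

#define QLEN 16

/* Hypothetical userspace stand-in for a socket: a small receive queue
   plus a condition variable playing the role of the socket's wait queue. */
struct fake_socket {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    int queue[QLEN];
    unsigned head, tail;           /* head == tail means empty; no overflow check */
};

/* "Packet arrival": queue the data, then wake a sleeping receiver. */
void deliver(struct fake_socket *s, int pkt)
{
    pthread_mutex_lock(&s->lock);
    s->queue[s->tail++ % QLEN] = pkt;
    pthread_cond_signal(&s->wait);
    pthread_mutex_unlock(&s->lock);
}

/* Blocking receive: sleep until the queue is non-empty, then dequeue. */
int receive(struct fake_socket *s)
{
    int pkt;

    pthread_mutex_lock(&s->lock);
    while (s->head == s->tail)     /* re-check: wakeups can be spurious */
        pthread_cond_wait(&s->wait, &s->lock);
    pkt = s->queue[s->head++ % QLEN];
    pthread_mutex_unlock(&s->lock);
    return pkt;
}

struct rx_job { struct fake_socket *s; int got; };

static void *receiver(void *arg)
{
    struct rx_job *job = arg;
    job->got = receive(job->s);    /* sleeps until deliver() is called */
    return NULL;
}

/* Start a receiver that blocks, "deliver" one packet, return what it got. */
int wait_queue_demo(int pkt)
{
    struct fake_socket s = { .head = 0, .tail = 0 };
    struct rx_job job = { &s, -1 };
    pthread_t tid;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.wait, NULL);
    if (pthread_create(&tid, NULL, receiver, &job) != 0)
        return -1;
    usleep(10000);                 /* let the receiver go to sleep first (not required) */
    deliver(&s, pkt);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&s.wait);
    pthread_mutex_destroy(&s.lock);
    return job.got;
}
```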
Re: working of fork and exec
Hi Mulyadi, Great to see you again! Sorry, can I fork on your explanation to explain further about fork? Yes, "fork" is at the core of process management, scheduling and all that: http://www.ibm.com/developerworks/linux/library/l-linux-process-management/ a good picture of process splitting up (forking) is here: http://www.linux-tutorial.info/modules.php?name=MContent&pageid=83 what happened to all the IPC after forking? http://hzqtc.github.com/2012/07/linux-ipc-with-pipes.html http://static.usenix.org/event/usenix2000/general/reumann/reumann_html/node9.html Generally, the last thing u should read is the kernel source code, though it also has the last word to be said for fork() :-). On Fri, Jan 18, 2013 at 1:59 AM, Mulyadi Santosa wrote: > Hi :) > > On Fri, Jan 18, 2013 at 12:02 AM, Niroj Pokhrel > wrote: > > Hi all, > > I have been using fork and exec for sometime. But I have no idea about > what > > are the things done by the kernel when we fork or exec and how things > work. > > How the kernel load new program and what all things are done ... Can > > anybody please explain me this ? Thank you in advance. > > this is too broad to answer, but in general fork() does: > - preparing new address space > - preparing new task_struct > - doing COW (copy on write), so newly born child initially simply use > parent's pages > > in exec() case, instead of COW, you load the target binary. It does so > by the work of loader in user space and ELF interpreter in the kernel > space. > > -- > regards, > > Mulyadi Santosa > Freelance Linux trainer and consultant > > blog: the-hydra.blogspot.com > training: mulyaditraining.blogspot.com > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
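The effect of fork() and exec() that Mulyadi describes can be observed from userspace. This is a minimal sketch with hypothetical helper names, assuming /bin/sh exists. The child's write to value lands in its own (copy-on-write) copy, so the parent still sees 1, and exec replaces the child's program entirely, so only its exit code comes back.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork(): the child gets its own copy of memory (shared copy-on-write until
   someone writes), so the child's assignment is invisible to the parent. */
int fork_cow_demo(void)
{
    int value = 1;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        value = 42;                       /* write triggers a private copy */
        _exit(value == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value;                         /* still 1 in the parent */
}

/* fork() + exec(): the child's image is replaced by /bin/sh running cmd. */
int exec_exit_code(const char *cmd)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                       /* only reached if execl() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```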
Re: what is the function of tcp_prequeue ?
On Wed, Jan 16, 2013 at 2:50 PM, horseriver wrote: > hi: > > what is the function of tcp_prequeue ? > > Basically there are 3 types of TCP receive queuing: the receive queue, the prequeue and the backlog queue (the article is in Chinese; use Google Translate if you need to): http://www.360doc.com/content/09/0518/15/36491_3551831.shtml See here for a detailed overview: http://e-university.wisdomjobs.com/linux/chapter-189-277/sending-the-data-from-the-socket-through-udp-and-tcp.html > thanks! > > ___ > Kernelnewbies mailing list > Kernelnewbies@kernelnewbies.org > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: what is the difference between poll and epoll ?
It is used to wait for a particular event on a file descriptor: http://stackoverflow.com/questions/9167752/how-does-the-poll-function-work-in-c http://linux.die.net/man/3/poll Look at the example if you don't understand the man page: http://linux.byexamples.com/archives/133/write-a-function/ And this one comes with explanation + example: http://www.linux-mag.com/id/357/ On Mon, Jan 14, 2013 at 4:20 AM, horseriver wrote: > hi: > > what is the function of a file's poll function ? > > > thanks! > -- Regards, Peter Teoh ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
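Since the subject also asks about epoll, here is a minimal sketch (hypothetical helper name) of the same readiness check done with epoll. The main design difference is that the descriptor is registered once with epoll_ctl(), while poll() is handed the whole array of descriptors on every call.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register the read end of a socketpair with epoll, check that nothing is
   ready, write one byte into the other end, and return how many events
   epoll_wait() then reports (expected 1), or -1 on any failure. */
int epoll_demo(int timeout_ms)
{
    int sv[2], ep, n = -1;
    struct epoll_event ev = { .events = EPOLLIN }, out[4];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if ((ep = epoll_create1(0)) == -1)
        goto close_pair;

    ev.data.fd = sv[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == -1)   /* register once */
        goto close_all;

    if (epoll_wait(ep, out, 4, 0) != 0)      /* nothing written yet: expect 0 */
        goto close_all;
    if (write(sv[1], "x", 1) != 1)
        goto close_all;

    n = epoll_wait(ep, out, 4, timeout_ms);  /* now the read end is ready */
    if (n == 1 && (out[0].data.fd != sv[0] || !(out[0].events & EPOLLIN)))
        n = -1;

close_all:
    close(ep);
close_pair:
    close(sv[0]);
    close(sv[1]);
    return n;
}
```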
Re: to implement cntl+c from shell script on minicom
Sorry, I have not done any runscript programming before, but reading this:

http://lists.alioth.debian.org/pipermail/minicom-devel/2008/000904.html

(search for "^C") and referring to the documentation:

http://linux.die.net/man/1/runscript

and assuming the example script there runs well, it seems the difference is that your script has a backslash before the control character? Just my guess.

On Sat, Jan 12, 2013 at 8:04 PM, laliteshwar yadav wrote:
> Hi Tushar,
> I am facing a problem with the above implementation.
>
> When the target is powered on, logs start coming on minicom.
> After 1 second the message "Executing boot script in 3.000
> seconds - enter ^C to abort" appears.
>
> Here, we need to send Ctrl+C to stop the target from auto-booting. We
> want to flash a new image into it.
>
> Our task is to automate the process.
>
> I tried running the following code through runscript.
>
> set search_string="Executing boot script in 3.000 seconds"
>
> timeout 50
> verbose on
>
> send "\n\r\n\r"
> expect {
>   "$search_string" break
>   timeout 2 goto abort
> }
>
> abort:
> print \nGiving cntl+c command on minicom
> send "\^C\r"
> send "\^C"
> send \^C\r
> send ^C\r
> expect {
>   "RedBoot>" break
>   timeout 3 goto panic
> }
> print \n!!Bye Bye runscript!!!
> sleep 2
>
> panic:
> print \n!!Bye Bye Minicom!!!\n
> ! killall -2 minicom
>
>
> What I am observing is that this script runs first, and only then do the
> minicom logs start coming. It should be the other way round: first some logs
> should come, up to the search string, and then our Ctrl+C command should run.
>
> Please help me.
>
> Thank you in advance.
>
>
> Regards,
> lalit

--
Regards,
Peter Teoh
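For reference, the ^C the boot loader is waiting for is just the ASCII ETX byte, 0x03. If runscript escaping keeps getting in the way, a small helper like the hypothetical send_ctrl_c below could write that byte directly to a serial port. Opening and configuring the port (for example with termios) is assumed to happen elsewhere; this is an alternative sketch, not a fix for the runscript itself.

```c
#include <errno.h>
#include <unistd.h>

/* Ctrl+C typed on a terminal is the ASCII ETX control character. */
#define CTRL_C_BYTE 0x03

/* Writes one Ctrl+C byte to `fd`, e.g. a serial port that was already
 * opened and configured elsewhere. Returns 0 on success, -1 on error. */
int send_ctrl_c(int fd)
{
    const unsigned char b = CTRL_C_BYTE;
    ssize_t n;
    do {
        n = write(fd, &b, 1);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    return n == 1 ? 0 : -1;
}
```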
Re: What is asmlinkage ?
It is defined in include/linux/linkage.h. And more info here:

http://pix.cs.olemiss.edu/csci523/kernelidioms

Part of it is quoted below. Ultimately it relies on a GCC feature (__attribute__((regparm(0)))):

  The CPP_ASMLINKAGE __attribute__((regparm(0))) Macro

  The asmlinkage macro is defined as:

    #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))

  which (for C++) expands to:

    #define asmlinkage extern "C" __attribute__((regparm(0)))

  This is used in the system call interface, where C library routines enter the kernel after setting up their arguments and executing the trap instruction (int $0x80). The "asmlinkage" tag really should read "C language linkage." GCC has an i386-specific __attribute__((regparm(0))) that causes the compiler to pass integer arguments on the stack instead of in registers. Functions that take a variable number of arguments continue to have all of their arguments passed on the stack.

On Fri, Jan 11, 2013 at 2:56 PM, Rajat Sharma wrote:
> > it is defined even in much earlier release:
> > http://lxr.free-electrons.com/ident?v=2.6.32;i=asmlinkage
>
> There seems to be no definition for arm here either. I literally meant a
> definition as '#define asmlinkage', not the usage of it. For arm there is
> none, so the default defined in include/linux/linkage.h is used, which is
> nothing special: just an extern "C" declaration to avoid the name mangling
> of C++ linkage, that's it.
>
> -Rajat
>
> On Fri, Jan 11, 2013 at 12:00 PM, Peter Teoh wrote:
>> On Fri, Jan 11, 2013 at 1:35 PM, Rajat Sharma wrote:
>>> > asmlinkage is defined for almost all arch:
>>> > grep asmlinkage arch/arm/*/* and u got the answer.
>>>
>>> I didn't see a definition of the macro, at least in the Linux source I was
>>> browsing (3.2.0). Could you please point out any one you have found?
>>
>> it is defined even in much earlier releases:
>>
>> http://lxr.free-electrons.com/ident?v=2.6.32;i=asmlinkage
>>
>> for example, and every arch possible has a use of it.
>>
>> --
>> Regards,
>> Peter Teoh

--
Regards,
Peter Teoh
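To tie the pieces of this thread together, here is a simplified sketch of how the macro is layered. It is not a copy of the real headers, which differ between kernel versions and architectures: CPP_ASMLINKAGE only adds extern "C" when compiled as C++, and on i386 asmlinkage additionally forces regparm(0) so every argument is taken from the stack. demo_sys_add is a made-up function used only to show the macro in place.

```c
/* C++ needs extern "C" so the symbol name is not mangled; C needs nothing. */
#ifdef __cplusplus
#define CPP_ASMLINKAGE extern "C"
#else
#define CPP_ASMLINKAGE
#endif

#if defined(__i386__)
/* i386 only: regparm(0) means zero arguments in registers, all on the stack. */
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
#else
/* Elsewhere (e.g. arm, x86_64) this sketch falls back to plain C linkage. */
#define asmlinkage CPP_ASMLINKAGE
#endif

/* Hypothetical syscall-style function, for illustration only. */
asmlinkage long demo_sys_add(long a, long b)
{
    return a + b;
}
```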
Re: why are scheduling domains used in multiprocessor systems
On Thu, Jan 10, 2013 at 6:09 PM, Bond wrote:
> On Thu, Jan 10, 2013 at 9:00 AM, Preeti U Murthy wrote:
> > d1's 'groups', both the sd0s. Here is
> > the next advantage. It needs information about the sched group alone and
> > will not bother about the individual cpus in it. It checks if
> > load(sd0[cpu2,cpu3]) > load(sd0[cpu0,cpu1])
> > Only if this is true does it go on to see if cpu2/3 is more loaded. If
> > there were no scheduler domains or groups, we would have to look at the
> > states of cpu2 and cpu3 in two iterations instead of 1 iteration like we
> > are doing now.
>
> Thanks Peter and Preeti. I had seen that Intel link and had read it but
> was not very clear about it;
> with both explanations and the new links, I am clear.

Sorry, I am still learning all this. On top of scheduling domains, there are also cpusets, and the two are intertwined (for example, look into sched_fair.c): a CPU inside a cpuset can be taken offline or brought online, or be allowed or disallowed for use. I don't know the conceptual difference between cpusets and sched_domain. Any guidance?

--
Regards,
Peter Teoh
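As a starting point on the cpuset side of the question, here is a user-space sketch of a task's CPU affinity mask, using glibc's sched_getaffinity()/sched_setaffinity() (which need _GNU_SOURCE). My understanding, offered as an assumption rather than a definitive answer: the affinity mask says where one task may run and is further limited to the CPUs of the task's cpuset, while scheduling domains describe how the load balancer groups CPUs; cpusets can additionally partition the domains through their sched_load_balance flag. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Returns how many CPUs the calling thread may run on, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Pins the calling thread to the first CPU in its current mask.
 * Returns that CPU number, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t set, one;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* The kernel rejects masks with no CPU usable in our cpuset. */
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```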
Re: What is asmlinkage ?
On Fri, Jan 11, 2013 at 1:35 PM, Rajat Sharma wrote:
> > asmlinkage is defined for almost all arch:
> > grep asmlinkage arch/arm/*/* and u got the answer.
>
> I didn't see a definition of the macro, at least in the Linux source I was
> browsing (3.2.0). Could you please point out any one you have found?

It is defined even in much earlier releases:

http://lxr.free-electrons.com/ident?v=2.6.32;i=asmlinkage

for example, and every arch possible has a use of it.

--
Regards,
Peter Teoh
Re: What is asmlinkage ?
A good example is system calls: their arguments are passed using registers, IIRC.
> >>>>
> >>>> --
> >>>> regards,
> >>>>
> >>>> Mulyadi Santosa
> >>>> Freelance Linux trainer and consultant
> >>>>
> >>>> blog: the-hydra.blogspot.com
> >>>> training: mulyaditraining.blogspot.com

--
Regards,
Peter Teoh
Re: what is the difference between poll and epoll ?
Read this (classic answer):

http://stackoverflow.com/questions/4093185/whats-the-difference-between-epoll-poll-threadpool

and from there:

http://stackoverflow.com/questions/4039832/select-vs-poll-vs-epoll

which will bring you to:

http://daniel.haxx.se/docs/poll-vs-select.html

and

http://www.kegel.com/c10k.html

The three are explained in depth here:

http://www.makelinux.net/ldd3/chp-6-sect-3

The differences are also explained here, with a pictorial diagram as you requested:

http://www.winddisk.com/2012/03/28/epoll%E4%B8%8Eselectpoll%E7%9A%84%E5%8C%BA%E5%88%AB/

On Tue, Jan 8, 2013 at 4:35 AM, horseriver wrote:
> hi:
>
> I know epoll is an event-triggered model, but I do not know the internal
> support for it.
> Is there some illustration of epoll's framework or internal implementation?
>
> thanks!

--
Regards,
Peter Teoh
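To make the poll/epoll difference concrete, here is a minimal sketch (the helper names ep_watch and ep_wait are made up). With epoll, the interest set is registered once with epoll_ctl() and kept in the kernel, and each epoll_wait() returns only the ready descriptors, whereas poll() hands the kernel the full descriptor array on every call. It shows the API shape that epoll's internals support, not the internals themselves.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll instance and registers `fd` for readability.
 * Returns the epoll descriptor, or -1 on error. */
int ep_watch(int fd)
{
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    /* The interest list is stored in the kernel once; poll() instead
     * passes the whole array of descriptors on every call. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Waits up to timeout_ms for registered descriptors to become ready.
 * Returns how many are ready (0 on timeout), or -1 on error. */
int ep_wait(int epfd, int timeout_ms)
{
    struct epoll_event ready[16];
    return epoll_wait(epfd, ready, 16, timeout_ms);
}
```

The same epoll descriptor can be waited on repeatedly without re-registering anything, which is where the savings over poll() come from when many descriptors are watched.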
Re: why are scheduling domains used in multiprocessor systems
On Wed, Jan 9, 2013 at 4:03 PM, Bond wrote:
> Hi,
> please see this question
>
> http://stackoverflow.com/questions/14229793/what-does-struct-sched-domain-stands-for-in-include-linux-sched-h-scheduling-do
>
> I checked the following
> http://lwn.net/Articles/169277/ and the following
> http://www.kernel.org/doc/Documentation/scheduler/sched-domains.txt
> The first line of the kernel.org doc says
> "Each CPU has a "base" scheduling domain (struct sched_domain)."
> and the second paragraph says
> "each scheduling domain spans a number of CPUs (stored in the ->span field)."
> The third paragraph says
> "Each scheduling domain must have one or more CPU groups.
> The intersection of cpumasks from any two of these groups
> MUST be the empty set."
> Then somewhere in the doc it says
> "Balancing within a sched domain occurs between groups. That is, each group
> is treated as one entity." The doc talks in detail about the implementation
> of scheduling domains and mentions that CPUs should belong to one of the
> scheduling domains in such a way that the
> intersection of cpumasks is the empty set.
>
> What I want to know is:
> why is a scheduling domain actually needed?

CPU scheduling involves many configurations and factors.

https://www.cs.unm.edu/~eschulte/classes/cs587/data/10.1.1.59.6385.pdf

Go to page 18 for the definition of a scheduler domain; it says:

"Each node in a system has a scheduler domain that points to its parent scheduler domain. A node might be a uniprocessor system, an SMP system, or a node within a NUMA system."

These complex CPU hierarchies normally reflect the physical proximity of the CPUs (just one factor) or the speed of the bus that connects them. Not every CPU is connected to every other CPU; each may be connected to only two or four others. Because moving data between CPUs has a cost, this proximity information has to be built into the kernel so that the scheduler can minimize the cost of data transfer between CPUs.

90% (or more) of supercomputers (with thousands of CPUs) run the Linux kernel, and clearly each CPU can have only a few neighboring CPUs. Another factor is power management: when your processing load goes down, you want to shut down CPUs, leaving only the bare minimum running. Organizing CPUs into hierarchies makes such scheduling algorithms easier.

http://www.intel.com/technology/itj/2007/v11i4/9-process/6-linux-scheduler.htm
http://www.cs.stonybrook.edu/~porter/courses/cse506/f12/slides/scheduling.pdf
http://www.cs.stonybrook.edu/~porter/courses/cse506/f12/slides/scheduling2.pdf

--
Regards,
Peter Teoh
Linux Kernel Map
http://www.makelinux.net/kernel_map/

--
Regards,
Peter Teoh
Re: /usr/ld Not enough room for program headers
On Wed, Jan 9, 2013 at 6:36 AM, horseriver wrote:
> On Wed, Jan 09, 2013 at 01:28:12PM +0800, Peter Teoh wrote:
> > On Sun, Jan 6, 2013 at 11:17 AM, horseriver wrote:
> >
>
> VSYSCALL_BASE = 0xe000;
>
> SECTIONS
> {
>   . = VSYSCALL_BASE ;
>
>   .hash : { *(.hash) }:text
>   .dynsym : { *(.dynsym) }
>   .dynstr : { *(.dynstr) }
>   .gnu.version : { *(.gnu.version) }
>   .gnu.version_d : { *(.gnu.version_d) }
>   .gnu.version_r : { *(.gnu.version_r) }

I suspect something is wrong with the VSYSCALL_BASE + value here. Look at this:

http://marcbug.scc-dc.com/svn/repository/trunk/linuxkernel/linux-2.6.16-mcemu/arch/x86_64/ia32/vsyscall.lds

Diffing it against your ld script, there is not much difference, except for the VSYSCALL_BASE + SIZEOF_HEADERS portion. Read here to understand how SIZEOF_HEADERS is calculated:

http://www.math.utah.edu/docs/info/ld_3.html#SEC13

I am not sure why you would want to shift the whole section down by SIZEOF_HEADERS bytes?

--
Regards,
Peter Teoh
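One way to see what SIZEOF_HEADERS has to make room for: the ELF header plus the program header table, which in the usual layout sit at the very start of the file and of the first loadable segment. The sketch below (hypothetical helper name; it assumes an ELF file with the same byte order as the host) reads those sizes from an existing binary. My reading of "Not enough room for program headers" is that a script pinning the first section at a fixed address leaves ld no space before it for these headers; treat that as an assumption to check against the ld manual.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads the ELF header of `path`. On success stores the file offset of the
 * program header table (e_phoff), its entry count (e_phnum) and the offset
 * just past the table (e_phoff + e_phnum * e_phentsize), then returns 0.
 * Assumes the file has the same byte order as the host. */
int elf_header_span(const char *path, unsigned long *phoff,
                    unsigned *phnum, unsigned long *hdr_end)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    if (got < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                               /* not an ELF file */

    unsigned long entsize;
    if (buf[EI_CLASS] == ELFCLASS64 && got >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr eh;
        memcpy(&eh, buf, sizeof(eh));
        *phoff = (unsigned long)eh.e_phoff;
        *phnum = eh.e_phnum;
        entsize = eh.e_phentsize;
    } else if (buf[EI_CLASS] == ELFCLASS32 && got >= sizeof(Elf32_Ehdr)) {
        Elf32_Ehdr eh;
        memcpy(&eh, buf, sizeof(eh));
        *phoff = eh.e_phoff;
        *phnum = eh.e_phnum;
        entsize = eh.e_phentsize;
    } else {
        return -1;                               /* unknown ELF class */
    }
    *hdr_end = *phoff + (unsigned long)*phnum * entsize;
    return 0;
}
```

Running it on /proc/self/exe or any normal executable typically shows the program header table starting right after the ELF header.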
Re: /usr/ld Not enough room for program headers
On Sun, Jan 6, 2013 at 11:17 AM, horseriver wrote:
> On Fri, Jan 04, 2013 at 11:34:24AM +0400, Игорь Пашев wrote:
> > 2013/1/4 horseriver
> >
> > > Not enough room for program headers
> >
> > Try to search the Web for this. E.g.:
> > http://lists.gnu.org/archive/html/bug-gnu-utils/2002-08/msg00176.html
>
> thanks!
>
> In my compile options I have specified my ld-script file, and there is no
> SIZEOF_HEADER in that file.

Can you show us your ld script? According to msg00176.html above, the error can arise because you have placed some other section (e.g., .text) wrongly, and it may have nothing to do with SIZEOF_HEADERS.

> But where does this error come from?

--
Regards,
Peter Teoh
Re: calling system call in arm from user space
On Wed, Dec 26, 2012 at 1:04 PM, Niroj Pokhrel wrote:
> Hi,
> I have written a system call and built it into the kernel for the ARM
> architecture. However, I'm confused about how to call it from user
> space. On x86 we can simply call it using the syscall() function,
> and the return value is returned by syscall() itself.
> On ARM, I tried writing an assembly language program and was able to call
> the system call from the assembly code, but what I'm

Care to show us how you called the system call in assembly on ARM?

> confused about is how to call this function from a C program. I tried using
> inline assembly but it didn't work. Further, if I can implement it using
> inline assembly, then the return value will be in r0; how can I move this
> value into a user variable?
> Thanking you in advance.

arch/arm/kernel/entry-common.S and arch/arm/kernel/calls.S work together here: entry-common.S contains the syscall entry and exit path (the pre-syscall and post-syscall wrapper you are asking about), and calls.S holds the syscall table. Perhaps you can try to understand that code first?

> --
> Niroj Pokhrel
> Software Engineer,
> Samsung India Software Operations

--
Regards,
Peter Teoh
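A minimal sketch of calling a system call from C without hand-written assembly, assuming glibc: its syscall() wrapper exists on ARM too. As I understand the ARM EABI convention, it puts the syscall number in r7 and the arguments in r0 upwards, executes svc #0, and returns the value the kernel left in r0. SYS_getpid is only a stand-in; your own syscall would need its number defined, and the name __NR_my_call in the comment is hypothetical.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* glibc's syscall() puts the syscall number and arguments where the
 * architecture's ABI expects them, traps into the kernel, and returns the
 * kernel's result (the value left in r0 on ARM). On failure it returns -1
 * and sets errno. */
long call_getpid_raw(void)
{
    /* SYS_getpid stands in for your own syscall number, e.g. a hypothetical
     * __NR_my_call matching the entry you added to arch/arm/kernel/calls.S;
     * with arguments it would be syscall(__NR_my_call, a, b). */
    return syscall(SYS_getpid);
}
```

The return value lands in an ordinary C variable (long r = call_getpid_raw();), so no inline assembly is needed to move r0 into user data.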
Re: Locating the keyboard driver (and replacing it)
These articles give very in-depth coverage of keyboard processing in Linux:

http://www.phrack.com/issues.html?issue=59&id=14&mode=txt
http://www.gadgetweb.de/programming/39-how-to-building-your-own-kernel-space-keylogger.html

Not sure about your architecture, but on my Lenovo laptop, when I do a "cat /dev/input/by-path/platform-i8042-serio-0-event-kbd" and redirect it to a file, every single key I enter is captured into the file. Therefore, looking into the kernel source, we can infer that drivers/input/serio/i8042.c (the i8042 controller driver, with drivers/input/keyboard/atkbd.c handling the AT keyboard protocol on top of it) is responsible for the keyboard processing. In my case this driver is compiled into the kernel, not built as a kernel module.

So if you want to make any changes without recompiling the kernel and rebooting, one way to do it dynamically is called "inline hooking" (look elsewhere for this method). It is explained in the following article:

http://www.phrack.com/issues.html?issue=59&id=14&mode=txt

But note the difference between Phrack's interception and intercepting the API inside i8042.c: when you do a "cat /dev/input/by-path/platform-i8042-serio-0-event-kbd", the keyboard input is always captured, regardless of which window or terminal you are in. Phrack's method is cleaner: it intercepts at the tty level (e.g. receive_buf() in drivers/tty/n_tty.c in the kernel source), so if you switch over to another window, the input is switched away too; it is thus targeted at only that TTY.

And by the way, a USB keyboard's processing path is altogether different again. Another reference:

http://www.lrr.in.tum.de/Par/arch/usb/download/usbdoc/usbdoc-1.32.pdf

and perhaps you can read many good write-ups here:

http://stackoverflow.com/search?q=usb+keyboard+kernel

On Fri, Dec 14, 2012 at 3:46 PM, manty kuma wrote:
> Hi,
>
> I have written a small module that toggles the Caps Lock LED. To
> demonstrate it, I want to replace the existing keyboard module with mine. I
> tried lsmod | grep "key" without any success, and also checked /proc/modules.
> I could not find any clue regarding the name of the module I need to
> uninstall. So, how can I remove the existing keyboard module and insert
> mine?
>
> Regards,
> Manty

--
Regards,
Peter Teoh
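Reading /dev/input/eventN does not return plain characters but a stream of struct input_event records (from <linux/input.h>), which is why cat into a file captures every key. A minimal sketch of decoding them in user space; the helper name is hypothetical, and opening a real event device such as the by-path node above normally needs root. A pipe filled with synthetic events can stand in for the device when trying it out.

```c
#include <linux/input.h>
#include <unistd.h>

/* Reads input_event records from `fd` until an EV_KEY event arrives.
 * Stores the key code and value (1 = press, 0 = release, 2 = autorepeat)
 * and returns 0; returns -1 on EOF, short read or error. */
int next_key_event(int fd, unsigned short *code, int *value)
{
    struct input_event ev;
    for (;;) {
        ssize_t n = read(fd, &ev, sizeof(ev));
        if (n != (ssize_t)sizeof(ev))
            return -1;
        if (ev.type == EV_KEY) {
            *code = ev.code;
            *value = ev.value;
            return 0;
        }
        /* Skip EV_SYN, EV_MSC (raw scancode) and other event types. */
    }
}
```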
Re: keyboard driver question
If your keyboard is not USB-based, then perhaps articles like these have your answer:

http://www.computer-engineering.org/ps2keyboard/
http://eduunix.ccut.edu.cn/index2/html/linux/Sybex%20Linux%20Power%20Tools%202003/6222final/LiB0023.html
http://freeworld.thc.org/papers/writing-linux-kernel-keylogger.txt

Since your keyboard is USB-based, you can look here for internal info:

http://www.emntech.com/docs/USB_KeyBoard_Driver_eMNTech.pdf

It has a picture of the overall flow. The essential entry point is the usb_kbd_probe() function.

Your problem of linking/delinking the keyboard may also be answered by:

http://unix.stackexchange.com/questions/12005/how-to-use-linux-kernel-driver-bind-unbind-interface-for-usb-hid-devices

Another good reference is:

http://www.linux.it/~rubini/docs/usb/usb.html

as it simplifies the complex flow of USB processing in the kernel, in particular for the HID part.

A good analogy to your problem is the Apple keyboard:

http://www.cyberciti.biz/faq/linux-apple-usb-keyboard-driver-installation/

and looking into its implementation, drivers/hid/hid-apple.c (kernel source), can perhaps give you some insight.

Another thing is the non-kernel processing of scancodes:

http://eduunix.ccut.edu.cn/index2/html/linux/Sybex%20Linux%20Power%20Tools%202003/6222final/LiB0023.html

As described there, the X Window keymap may also be used to change the mapping:

http://www.in-ulm.de/~mascheck/X11/xmodmap.html
http://bochs.sourceforge.net/doc/docbook/user/keymap.html
http://madduck.net/docs/extending-xkb/
http://www.pixelbeat.org/docs/xkeyboard/

On Fri, Jan 4, 2013 at 2:17 AM, Racz Zoli wrote:
> Hi.
>
> I'm sorry if this isn't the right place to post my question, but first I
> tried posting it on forum.kernelnewbies.org and nobody answered. Here's
> my question:
>
>
> I have a Gembird kb-9140l keyboard with some multimedia keys which are not
> working on Linux.
> I thought about writing my own driver for it, so as a
> start, I wrote a small module which registers an interrupt handler on IRQ
> 1 with the IRQF_SHARED flag. In the handler function I put a simple printk
> with the scancode read from the keyboard. The problem is that the handler
> never gets executed. I searched on Google and found that, because the
> native driver doesn't share its interrupt with other modules, before I
> call request_irq I have to free the original interrupt handler of the
> native driver. This would make my computer practically unusable until I
> reboot, but at least I would see that it works. But it doesn't: the original
> driver works fine after I insert my module, and the interrupt handler still
> doesn't get called. The weird thing is, when I remove my module, my handler
> executes once, and the scancode is 0xFE.
>
> The code is the following:
>
> #include
> #include
> #include
> #include
> #include
>
>
> MODULE_LICENSE("Dual BSD/GPL");
>
> static int gembirdkb_init(void);
> static void gembirdkb_exit(void);
>
>
> irq_handler_t irq_handler (int irq, void *dev_id, struct pt_regs *regs)
> {
>     static unsigned char scancode;
>
>     scancode = inb (0x60);
>
>     printk("gembirdkb: irq handled... scancode: %d\n",scancode);
>
>     return (irq_handler_t) IRQ_HANDLED;
> }
>
>
> static int gembirdkb_init(void)
> {
>     int ret;
>
>     /* free original interrupt handler */
>     // free_irq(1, NULL);
>
>     ret = request_irq (1, (irq_handler_t) irq_handler, IRQF_SHARED,
>                        "gembirdkb", (void *)&irq_handler);
>
>     printk("gembirdkb: request_irq result: %d\n", ret);
>
>     return ret;
> }
>
> static void gembirdkb_exit(void)
> {
>     free_irq(1, (void *)&irq_handler);
> }
>
>
> module_init(gembirdkb_init);
> module_exit(gembirdkb_exit);
>
> Is there any way I can remove the native driver, or do I need to recompile
> the kernel without it and insert mine?
>
> P.S.: Why is every topic on the forum full of questions about Mac,
> iPhone, Samsung Galaxy etc.?

--
Regards,
Peter Teoh
Re: trace MKFS
When you execute "mkfs", then depending on the filesystem type you pass to it with "-t", one of the following command-line utilities will be executed:

mkfs.cramfs mkfs.ext4 mkfs.minix mkfs.reiserfs mkfs.bfs mkfs.ext2 mkfs.ext4dev mkfs.msdos mkfs.vfat mkfs.btrfs mkfs.ext3 mkfs.jfs mkfs.ntfs mkfs.xfs

and each of these commands comes from a filesystem utilities package. Look into its source for a good understanding. For ext2/ext3 the package is called e2fsprogs, so in Ubuntu (or another Debian-based distro) you can do an "apt-get source e2fsprogs" to get the source. Reading the main() function of mke2fs (misc/mke2fs.c):

http://pastebin.com/xcsB6GUC

you can see that, after a lot of code setting up structures in memory, it starts writing the inode tables etc.:

    write_inode_tables(fs, lazy_itable_init, itable_zeroed);
    create_root_dir(fs);
    create_lost_and_found(fs);
    reserve_inodes(fs);
    create_bad_block_inode(fs, bb_list);

Following through the source code is much more understandable than going through the output of "strace", which records all the interactions with the kernel.

Follow through the following slide:

http://www.geego.com/free-linux-lpic-training-material-study-guide/lpic1-modules/4-5/ext2-ext3.html

and forward a few slides, and you will understand that mkfs is just writing the header structures onto the hard disk that define the filesystem.

Similarly, you can find many university course materials on filesystem internals, e.g.:

http://scx010c06a.blogspot.sg/2012/03/second-extended-file-system-ext2.html

Generally, real-life analysis of the hard disk/filesystem is done in forensics, so if you google for filesystem forensics you can find lots of tools that walk the hard disk for the different components:

http://www.dfrws.org/2007/proceedings/p55-barik.pdf
http://www.cs.kau.se/~stefan/forensics/chapter14-15.pdf
http://www.blackhat.com/presentations/bh-asia-03/bh-asia-03-grugq/bh-asia-03-grugq.pdf
http://www.dfrws.org/2007/proceedings/p55-barik_pres.pdf

and this is forensics of the ext4 filesystem:

http://www.dfrws.org/2012/proceedings/DFRWS2012-13.pdf

Understanding "mkfs" is really as good as understanding filesystem internals.

On Fri, Jan 4, 2013 at 11:12 PM, KASHISH BHATIA <kashish.bhatia1...@gmail.com> wrote:
> Hi,
>
> I want to trace the overall flow of mkfs inside the Linux kernel. Specifically,
> I want to know which kernel fs data structures are affected when we run "mkfs".
> What does the "mkfs" command write on the block device when we run it?
> Are there any good documents which explain this?
>
> --
>
> Regards,
> Kashish Bhatia

--
Regards,
Peter Teoh
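As a small taste of what mkfs writes on the block device: for ext2/3/4, mke2fs places the primary superblock at byte offset 1024, and its s_magic field (offset 56 within the superblock, little-endian) holds 0xEF53. The sketch below (hypothetical helper name) checks a device or image file for that magic; reading a real block device usually needs root.

```c
#include <stdint.h>
#include <stdio.h>

/* On ext2/3/4 the primary superblock starts at byte 1024 of the device,
 * and s_magic is a little-endian 16-bit field at offset 56 within it. */
#define EXT2_SB_OFFSET   1024L
#define EXT2_MAGIC_OFF   56
#define EXT2_SUPER_MAGIC 0xEF53

/* Returns 1 if `path` (a block device or image file) carries the ext2
 * magic number, 0 if not, -1 on I/O error. */
int has_ext2_magic(const char *path)
{
    unsigned char m[2];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int ok = fseek(f, EXT2_SB_OFFSET + EXT2_MAGIC_OFF, SEEK_SET) == 0 &&
             fread(m, 1, 2, f) == 2;
    fclose(f);
    if (!ok)
        return -1;
    uint16_t magic = (uint16_t)(m[0] | (m[1] << 8));   /* little-endian */
    return magic == EXT2_SUPER_MAGIC;
}
```

Running it against a freshly created image (e.g. one made with mkfs.ext2 on a file) should give 1; a zero-filled file of the same size gives 0.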
Re: Why "lsusb" return nothing?
Thank you for your help. This is the result:

mount -t usbdevfs none /proc/bus/usb
mount: mount point /proc/bus/usb does not exist

mkdir /proc/bus/usb
mkdir: cannot create directory `/proc/bus/usb': No such file or directory

And if I try a directory that does exist:

mount -t usbdevfs none /proc/bus
mount: unknown filesystem type 'usbdevfs'

The exact mirror (I mirrored the system before the problem started) is still working today, and I have not found any difference between the two versions so far.

On Mon, Oct 1, 2012 at 12:51 PM, wrote:
> These steps helped me when I had the same problem on SUSE 9.
>
> The reason is that /proc/bus/usb/ doesn't have any entries, and that is where
> lsusb actually searches to show USB bus devices. To make that happen you have
> to manually mount the bus devices using the command below.
>
> mount -t usbdevfs none /proc/bus/usb/
>
> And you are done.
> Now lsusb should show all USB bus devices.
>
> Thanks
> Ashish Bunkar
>
> -----Original Message-----
> From: kernelnewbies-boun...@kernelnewbies.org [mailto:kernelnewbies-boun...@kernelnewbies.org] On Behalf Of Peter Teoh
> Sent: Saturday, September 29, 2012 7:12 AM
> To: kernelnewbies@kernelnewbies.org
> Subject: Why "lsusb" return nothing?
>
> I entered "lsusb" at the command line (as root) and nothing is returned, not
> even an error message.
>
> Doing an strace, the last few lines are:
>
> open("/dev/bus/usb", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> open("/proc/bus/usb", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
>
> What happened?
>
> This is Ubuntu 10.04 (it used NOT to be like that; not sure what I did
> wrong last time). But running a VirtualBox INSIDE this same OS, I
> was able to get a result from "lsusb" (after enabling the USB devices in the
> VirtualBox interface), and strace gives:
>
> open("/dev/bus/usb/001/002", O_RDWR) = 3
> ioctl(3, USBDEVFS_IOCTL, 0xbff6f75c) = -1 ENOTTY (Inappropriate ioctl for device)
> close(3) = 0
> open("/dev/bus/usb/001/001", O_RDWR) = 3
>
> Why the difference?
>
> --
> Regards,
> Peter Teoh

--
Regards,
Peter Teoh
Re: Facing trouble in creating a packet in kernel space
Yes, Michael has a point: a proxy is easier than the kernel. I used WebScarab for this. Alternatively, another tool I used is Scapy (no proxy setup is needed), and I must say it is a FANTASTIC tool for this purpose. First you capture with Wireshark, and then replay via Scapy, which has a function called "fuzz()" for this purpose.

http://www.secdev.org/conf/scapy_pacsec05.handout.pdf
http://media.packetlife.net/media/library/36/scapy.pdf
http://theitgeekchronicles.files.wordpress.com/2012/05/scapyguide1.pdf

It is low-level enough for you to fuzz the different protocols inside each packet.

On Wed, Sep 26, 2012 at 2:43 AM, wrote:
> Hi!
>
> On 16:28 Tue 25 Sep , Rifat Rahman wrote:
>> Hello there,
>>
>> I need to mangle RTP packets in kernel space. I am new to kernel
>> module programming. I am trying to implement a module for netfilter hooks.
>> As a first exercise, I am trying to write smaller modules. Let
>> me explain what I am actually doing now.
>>
>> I have an echo client and server. The server runs on port 6000. They are on
>> different machines (maybe VMs in bridge filter mode). The client sends a UDP
>> message and the server just echoes it back. Suppose the client sends
>> "some message" as data. Now I am trying to write a module for the
>> client machine that will append "12345" after the data, so that the server
>> will get "some message12345" and echo it back. I have run into various
>> things. I relied on the NF_IP_POST_ROUTING hook.
>
> I do not understand why you try to do this in the kernel at all. Why does the
> client app not just send "some message12345" itself? If you want to mangle the
> data in transit, why not use a transparent proxy instead?
> http://stackoverflow.com/questions/5615579/how-to-get-original-destination-port-of-redirected-udp-message
>
>> At first, I copied the data to temporary storage, and then appended 12345 to
>> it. Then I increased skb->tail using skb_put(). Then I memset() the
>> packet data to 0, and copied the temporary storage into it. Then, as per the
>> procedure, NF_ACCEPT is returned. There are certain checks like
>> udph->dest == 6000 etc. When I use skb_put(), my system hangs
>> after two or three minutes.
>
> What does it do exactly? If you do skb_put() and there is no space, you should
> get something like "skb_over_panic".
>
>> When I check dmesg to be certain that everything goes
>> right, I find it OK. But suppose I once send a message like "This is a
>> pretty big message" and another time I send "small message"; then I get just
>> "small message12345g message", which means the bigger message is stored
>> somewhere I don't understand. I tried skb_add_data() but that works
>> incorrectly here; I understand it's my fault. I just can't figure it out.
>
> Could it be that the small message happened to allocate the same memory the
> previous packet used and thus has some unallocated data at the end?
>
>> Now, one thing came to mind: if it's not possible, should I create new
>> packets for the appended data? I find skb->end - skb->tail is not very big.
>
> You might have to do so in some cases. But it might have some side effects
> nobody would think about. For example, take a look at this:
> http://lxr.linux.no/#linux+v3.5.4/net/sched/cls_cgroup.c#L117
> It essentially means that the packet queue at layer 2 accesses data all the
> way up to the socket layer. If you just copy the data, this will break. More
> things like this may exist.
>
> You might also be able to allocate a larger buffer and reuse the sk_buff. It
> might be less painful.
>
>> But ultimately I have to merge two or three packets into one packet, and
>> then skb_put() will not suffice. Then the question is: I can use
>> alloc_skb(), skb_reserve(), skb_header_pointer() and other skb manipulation
>> functions, but I don't understand how I can drop the packet I got (should I
>> return NF_DROP?)
>
> There should be a way to drop packets inside netfilter rules (maybe not in
> postrouting though). I did not look into the code right now. Why not try
> returning NF_DROP and see if it leaks?
>
>> and how can I route my created packets along the packet flow path?
>
> You could do it the dirty way and just call dev_queue_xmit(). The packet will
> be sent directly to the device without going through all the hooks (including
> yours) a second time. You have to be careful about the UDP checksum and
> fragmentation.
> Also, if ipsec is in use, it will
Re: Why "lsusb" return nothing?
Furthermore, if you look at the /sys/bus/usb interface:

    .
    ./uevent
    ./devices
    ./devices/usb1
    ./devices/1-0:1.0
    ./devices/usb2
    ./devices/2-0:1.0
    ./devices/1-1
    ./devices/1-1:1.0
    ./devices/2-1
    ./devices/2-1:1.0
    ./devices/1-1.1
    ./devices/1-1.1:1.0
    ./devices/1-1.2
    ./devices/1-1.2:1.0
    ./devices/1-1.3
    ./devices/1-1.3:1.0
    ./devices/1-1.4
    ./devices/1-1.4:1.0
    ./devices/1-1.5
    ./devices/1-1.5:1.0
    ./devices/1-1.5:1.1
    ./devices/2-1.2
    ./devices/2-1.2:1.0
    ./devices/2-1.3
    ./devices/2-1.3:1.0
    ./devices/2-1.6
    ./devices/2-1.6:1.0
    ./drivers
    ./drivers/usbfs
    ./drivers/usbfs/module
    ./drivers/usbfs/uevent
    ./drivers/usbfs/unbind
    ./drivers/usbfs/bind
    ./drivers/usbfs/new_id
    ./drivers/usbfs/remove_id
    ./drivers/usbfs/1-1.3:1.0
    ./drivers/hub
    ./drivers/hub/module
    ./drivers/hub/uevent
    ./drivers/hub/unbind
    ./drivers/hub/bind
    ./drivers/hub/new_id
    ./drivers/hub/remove_id
    ./drivers/hub/1-0:1.0
    ./drivers/hub/2-0:1.0
    ./drivers/hub/1-1:1.0
    ./drivers/hub/2-1:1.0
    ./drivers/usb
    ./drivers/usb/uevent
    ./drivers/usb/unbind
    ./drivers/usb/bind
    ./drivers/usb/usb1
    ./drivers/usb/usb2
    ./drivers/usb/1-1
    ./drivers/usb/2-1
    ./drivers/usb/1-1.1
    ./drivers/usb/1-1.2
    ./drivers/usb/1-1.3
    ./drivers/usb/1-1.4
    ./drivers/usb/1-1.5
    ./drivers/usb/2-1.2
    ./drivers/usb/2-1.3
    ./drivers/usb/2-1.6
    ./drivers/usb-storage
    ./drivers/usb-storage/1-1.1:1.0
    ./drivers/usb-storage/module
    ./drivers/usb-storage/uevent
    ./drivers/usb-storage/unbind
    ./drivers/usb-storage/bind
    ./drivers/usb-storage/remove_id
    ./drivers/usb-storage/2-1.2:1.0
    ./drivers/usbhid
    ./drivers/usbhid/1-1.2:1.0
    ./drivers/usbhid/module
    ./drivers/usbhid/uevent
    ./drivers/usbhid/unbind
    ./drivers/usbhid/bind
    ./drivers/usbhid/new_id
    ./drivers/usbhid/remove_id
    ./drivers/uvcvideo
    ./drivers/uvcvideo/1-1.5:1.0
    ./drivers/uvcvideo/1-1.5:1.1
    ./drivers/uvcvideo/module
    ./drivers/uvcvideo/uevent
    ./drivers/uvcvideo/unbind
    ./drivers/uvcvideo/bind
    ./drivers/uvcvideo/new_id
    ./drivers/uvcvideo/remove_id
    ./drivers_probe
    ./drivers_autoprobe

This, I guess, accounts for some of the hardware (like the USB mass storage device) still working, whereas anything that depends on the /dev/bus/usb interface (e.g., Android's adb) is not working.

Another symptom: if I create the directory with "mkdir -p /dev/bus/usb" and then insert a new USB hard disk, the directory is immediately deleted. The newly inserted hard disk itself is still accessible, and dmesg returns:

    [26949.222877] sd 12:0:0:0: [sdc] Mode Sense: 38 00 00 00
    [26949.225095] sd 12:0:0:0: [sdc] No Caching mode page present
    [26949.225099] sd 12:0:0:0: [sdc] Assuming drive cache: write through
    [26949.230715] sd 12:0:0:0: [sdc] No Caching mode page present
    [26949.230719] sd 12:0:0:0: [sdc] Assuming drive cache: write through
    [26949.282965]  sdc: sdc1 sdc2 sdc4
    [26949.288972] sd 12:0:0:0: [sdc] No Caching mode page present
    [26949.288977] sd 12:0:0:0: [sdc] Assuming drive cache: write through
    [26949.288980] sd 12:0:0:0: [sdc] Attached SCSI disk

Seemingly no problem, but "lsusb" returned NOTHING.

In fact, before the /dev/bus/usb non-availability happened, I had also mirrored the entire system onto different hardware, and that copy is still working fine. So I am quite sure it is a udev thing; I'm just trying my luck in case anyone knows the answer.

On Sat, Sep 29, 2012 at 9:42 AM, Peter Teoh wrote:
> I entered "lsusb" at the command line (as root) and nothing was returned,
> not even an error message.
>
> Running it under strace, the last few lines are:
>
> open("/dev/bus/usb", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> open("/proc/bus/usb", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
>
> What happened?
>
> This is Ubuntu 10.04 (it did NOT use to be like that; not sure what I did
> wrong last time).
> But running VirtualBox INSIDE this same OS, I was able to get a result
> from "lsusb" (after enabling the USB devices in the VirtualBox interface),
> and strace gives:
>
> open("/dev/bus/usb/001/002", O_RDWR) = 3
> ioctl(3, USBDEVFS_IOCTL, 0xbff6f75c) = -1 ENOTTY (Inappropriate ioctl for device)
> close(3) = 0
> open("/dev/bus/usb/001/001", O_RDWR) = 3
>
> Why the difference?
>
> --
> Regards,
> Peter Teoh

-- 
Regards,
Peter Teoh