Re: [Haifux] SSD and linux
Hi Doron,

The place where the current process goes to sleep and waits until the page is swapped in is indeed in generic_make_request() (called from submit_bio()). There is a call to block_wait_queue_running(q), which puts the process on a wait queue and calls schedule() [prepare_to_wait_exclusive() and then io_schedule()]. So this looks like the place for a busy loop.

Be careful with what you change, though, and make sure you do not break some other code path that relies on what this code path does. For example, if you do not put the process on the wait queue, you must handle what happens when the I/O operation finishes and the completion path tries to remove the process from the wait queue and wake it up.

Gabi

P.S. I was referring to version 2.6.11:
http://lxr.linux.no/linux+v2.6.11/drivers/block/ll_rw_blk.c#L2595

_
From: Doron Zuckerman [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 18, 2008 12:28 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; haifux@haifux.org
Subject: Re: [Haifux] SSD and linux

Hi Gabi and Muli,

I'm sorry about the mistake; you understood me correctly.

I'm not sure it will speed up the OS, but I'm doing academic research on the matter as part of a project, and I plan to check this point. The thinking was that since the SSD is not a mechanical drive, pages can be brought in faster, so there is no need to context switch and the overhead of the switch can be avoided.

Yes, I plan to use polling (busy-wait), and I'm looking for the part of the page-fault handling path in the kernel where the process is suspended, in order to prevent that.

So far I found the function __generic_make_request in ll_rw_blk.c. This function calls a function named might_sleep. I removed the call to it when handling a page fault, but I'm not sure whether this function causes the sleep or is just a debugging check for entering a suspended state.
My question is whether this is the function I should change to get the behavior I want, or whether the change should be made in q->make_request_fn, which, as far as I understand, belongs to the specific driver I'm using. Please help me find the specific place where the change should be made.

Thank you very much,
Doron.

___
Haifux mailing list
Haifux@haifux.org
http://hamakor.org.il/cgi-bin/mailman/listinfo/haifux
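A note on the "busy loop" Gabi mentions: the sketch below is a toy user-space model of what replacing the sleep with polling might look like. It is not kernel code. The names toy_device, toy_poll and toy_busy_wait are hypothetical, and the spin budget (max_polls) is an assumption added here so that the loop cannot spin forever if the completion never arrives.

```c
#include <stdbool.h>

/* Toy stand-in for a device: completes after a fixed number of polls.
 * Hypothetical, for illustration only. */
struct toy_device {
    long polls_until_done;  /* polls remaining before the page is ready */
    bool page_ready;        /* set once the "I/O" has completed */
};

/* One poll of the device. In a real kernel change this would check
 * whatever flag the completion path sets. */
static bool toy_poll(struct toy_device *dev)
{
    if (dev->polls_until_done > 0)
        dev->polls_until_done--;
    if (dev->polls_until_done == 0)
        dev->page_ready = true;
    return dev->page_ready;
}

/* Busy-wait with a spin budget.
 * Returns the number of polls used, or -1 if the budget ran out,
 * in which case the caller would fall back to sleeping. */
static long toy_busy_wait(struct toy_device *dev, long max_polls)
{
    for (long polls = 1; polls <= max_polls; polls++) {
        if (toy_poll(dev))
            return polls;
    }
    return -1;
}
```

The fallback return value is one way to address Gabi's warning: a task that gives up spinning can still take the normal wait-queue path, so the completion side finds the waiter it expects.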
Re: [Haifux] SSD and linux
Hi Muli,

It seems like a good idea to compare the time of a single block read against a single context switch; we'll look into it further.

> Try to find the place where the faulting process is put to sleep and convert that code to busy wait instead, terminating the busy-wait when the page has been brought in.

That's exactly what we are looking for, so far with no success. We followed the page-fault path all the way to the call to q->make_request_fn (in the __generic_make_request function in block/ll_rw_blk.c). Up to that point we couldn't find anything that puts the current process to sleep. Our guess is that it happens somewhere inside this function. Do you have any idea where we can find it?

Thanks,
Ronen & Doron

On Thu, Sep 18, 2008 at 4:49 PM, Muli Ben-Yehuda [EMAIL PROTECTED] wrote:
> On Thu, Sep 18, 2008 at 12:27:36PM +0300, Doron Zuckerman wrote:
> > I'm not sure it will speed up the OS, however I'm doing an academic research on the matter as part of a project I'm taking, and I plan to check this point.
>
> I'm pretty sure it won't.
>
> > The leading thought was that since the SSD is not a mechanical drive, pages can be brought faster in this way, and there is no need to context switch, thus, avoiding the overhead included.
>
> I suggest a much simpler exercise:
> (a) time how long it takes to read a block of data from the SSD
> (b) time how long a context switch takes
> See that (b) is orders of magnitude faster than (a).
>
> > So far I found the function __generic_make_request in file ll_rw_blk.c. This function calls a sub function named might_sleep. I have deleted the call to this function whenever I'm in a pagefault, however I'm not sure if this function causes the sleep, or is just used for debugging in order to check if we entered a suspend state.
>
> might_sleep() is a debugging aid, used by code that might sleep in order to check that it hasn't been called in a context where you can't sleep (non-process context such as an interrupt handler).
>
> > My question is if this is the function I should change in order to accept the change I'm willing to get, or if the change should be made in q->make_request_fn which, according to my understanding, belongs to the specific driver I'm using.
>
> Neither. Take a look at the page fault path for a major fault. What it does (from 10,000 feet) is initiate reading the page from disk and then go to sleep until the page is ready. Going to sleep in the page fault path is what causes the context switch you want to avoid. What you want to do instead of going to sleep is busy-wait for the data. Try to find the place where the faulting process is put to sleep and convert that code to busy wait instead, terminating the busy-wait when the page has been brought in.
>
> Cheers,
> Muli
> --
> Workshop on I/O Virtualization (WIOV '08)
> Co-located with OSDI '08, Dec 2008, San Diego, CA
> http://www.usenix.org/wiov08
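For step (b) of Muli's exercise, one common user-space approach is a pipe ping-pong between two processes. The sketch below is only a rough estimate under stated assumptions: the function name ctx_switch_ns is made up here, the result includes the cost of the read() and write() system calls, and it only approximates a context switch when both processes share one CPU (for example, when the program is pinned with taskset).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Estimate the cost of one context switch in nanoseconds by bouncing a
 * byte between parent and child over two pipes. Each round trip forces
 * at least two switches when both processes run on the same CPU.
 * Returns -1.0 on error. */
static double ctx_switch_ns(int rounds)
{
    int to_child[2], to_parent[2];
    struct timespec t0, t1;
    char byte = 'x';
    int ok = 1;

    if (rounds <= 0)
        return -1.0;
    if (pipe(to_child) != 0)
        return -1.0;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1) {
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        }
        _exit(0);
    }

    /* Parent: time `rounds` round trips. */
    close(to_child[0]);
    close(to_parent[1]);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds && ok; i++) {
        if (write(to_child[1], &byte, 1) != 1 ||
            read(to_parent[0], &byte, 1) != 1)
            ok = 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(to_child[1]);   /* child sees EOF and exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    if (!ok)
        return -1.0;
    double total_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
    return total_ns / (2.0 * rounds);
}
```

Calling ctx_switch_ns(100000) from a small main() and printing the result gives a number to set against the block-read time from step (a). Because the system-call cost is included, the figure is an upper bound on the switch itself.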
Re: [Haifux] SSD and linux
Doron,

You can work on 2.6.24 if you prefer. I just picked some version and checked on it. (For some reason there is no arch/i386 in 2.6.24. Maybe they have renamed it to x86?)

As for which function to use: what you want to change is not the place where the I/O request is made, but the place where the process puts itself on the wait queue, removes itself from the run queue and calls schedule(). In 2.6.11 this is done in block_wait_queue_running(q).

I have not checked what q->make_request_fn(q, bio) does exactly, but from your description it issues a request to the driver. This is probably done by adding a request struct, with instructions on what to do, to the driver's request data structure (and maybe signaling the driver in some way). After doing that, the process puts itself to sleep (via block_wait_queue_running). When the driver finishes handling the request (a long time later), the hardware raises an interrupt, and the interrupt handler wakes the waiting process and puts it back on the run queue. Some time later the process is scheduled to run and continues from the point right after the call to schedule().

Gabi

P.S. I think it would be wise to first check what Muli has suggested: compare the times.

_
From: Doron Zuckerman [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 18, 2008 6:29 PM
To: gabik; haifux@haifux.org
Cc: Ronen Gruengras
Subject: Re: [Haifux] SSD and linux

Hi Gabi,

First of all, thanks for your help. We are currently using kernel 2.6.24 and couldn't find any call to block_wait_queue_running(q) there. It seems to handle things a bit differently.

Moreover, I looked at the code of kernel 2.6.11, and from what I can understand, block_wait_queue_running(q) only waits for the I/O queue to be ready, not for the I/O request (reading the page from the disk) to complete (it is called before the I/O request is made).
Correct me if I'm wrong, but isn't the call to q->make_request_fn(q, bio) what makes the actual request to the device, and therefore the place responsible for waiting for the result of that request?

P.S. I don't mind switching to kernel 2.6.11 (or any other, for that matter) as long as I can make the changes I need.

Thanks,
Ronen & Doron
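The request lifecycle Gabi describes in this message (submit, sleep, interrupt, wake, reschedule) can be sketched as a small state machine. This is a toy user-space model with hypothetical names (toy_proc, toy_request, toy_submit and so on); it only mirrors the order of events in the description and is not how the kernel represents tasks or requests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical. */
enum toy_state { TOY_RUNNING, TOY_WAITING_FOR_IO, TOY_RUNNABLE };

struct toy_proc {
    enum toy_state state;
    int resumed;            /* how many times the process resumed after I/O */
};

struct toy_request {
    struct toy_proc *owner; /* process waiting for this request */
    bool done;
};

/* Step 1: hand the request to the "driver", then go to sleep. */
static void toy_submit(struct toy_request *rq, struct toy_proc *p)
{
    rq->owner = p;
    rq->done = false;
    p->state = TOY_WAITING_FOR_IO;  /* off the run queue, on the wait queue */
}

/* Step 2: the "hardware interrupt" marks completion and wakes the owner. */
static void toy_irq_complete(struct toy_request *rq)
{
    rq->done = true;
    if (rq->owner != NULL && rq->owner->state == TOY_WAITING_FOR_IO)
        rq->owner->state = TOY_RUNNABLE;  /* back on the run queue */
}

/* Step 3: some time later the scheduler picks the process again.
 * Returns true if the process actually resumed. */
static bool toy_schedule(struct toy_proc *p)
{
    if (p->state != TOY_RUNNABLE)
        return false;               /* still sleeping: nothing to resume */
    p->state = TOY_RUNNING;
    p->resumed++;
    return true;
}
```

The model makes one point visible: between toy_submit() and toy_irq_complete() the process cannot be scheduled at all, and that interval is what a busy-wait would spend spinning instead.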
Re: [Haifux] SSD and linux
q->make_request_fn seems to point to __make_request() [blk_init_queue_node() initializes the pointer to this function]. __make_request() calls get_request_wait(), which in turn calls prepare_to_wait_exclusive() and later io_schedule(). So the logic is exactly like in 2.6.11, just a bit more complex.

Gabi
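The call chain Gabi traces in this message (a function pointer set at queue init time, then __make_request(), then get_request_wait()) can be sketched with a toy queue that records which functions were entered. All toy_ names are hypothetical and none of this is kernel code. One assumption in the model deserves a flag: as far as I can tell from the 2.6.24 source, get_request_wait() sleeps only when no free request structure is available, not until the I/O itself completes, so the model takes the sleep branch only when free_requests is zero.

```c
#include <stddef.h>

#define TOY_TRACE_MAX 8

struct toy_queue;
typedef int (*toy_make_request_fn)(struct toy_queue *q, int sector);

/* Toy request queue: hypothetical, for illustration only. */
struct toy_queue {
    toy_make_request_fn make_request_fn; /* set once at init time */
    int free_requests;                   /* request slots still available */
    const char *trace[TOY_TRACE_MAX];    /* functions entered, in order */
    size_t trace_len;
};

static void toy_trace(struct toy_queue *q, const char *name)
{
    if (q->trace_len < TOY_TRACE_MAX)
        q->trace[q->trace_len++] = name;
}

/* Stand-in for get_request_wait(): sleeps only if no slot is free. */
static void toy_get_request_wait(struct toy_queue *q)
{
    toy_trace(q, "get_request_wait");
    if (q->free_requests == 0) {
        toy_trace(q, "prepare_to_wait_exclusive");
        toy_trace(q, "io_schedule");
        q->free_requests = 1;  /* pretend a slot was freed while we slept */
    }
    q->free_requests--;
}

/* Stand-in for __make_request(). */
static int toy_make_request(struct toy_queue *q, int sector)
{
    (void)sector;  /* unused in this model */
    toy_trace(q, "__make_request");
    toy_get_request_wait(q);
    return 0;
}

/* Stand-in for blk_init_queue_node(): installs the function pointer. */
static void toy_init_queue(struct toy_queue *q, int free_requests)
{
    q->make_request_fn = toy_make_request;
    q->free_requests = free_requests;
    q->trace_len = 0;
}
```

Calling q.make_request_fn(&q, 0) after toy_init_queue(&q, 0) records all four names, while starting with a free slot records only the first two. If the assumption above holds, this matches the doubt Doron raised earlier: the sleep on this path waits for queue resources, so the wait for the page itself may be elsewhere in the fault path.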
Re: [Haifux] SSD and linux
On Thu, Sep 18, 2008 at 06:56:09PM +0300, gabik wrote:
> Doron You can work on 2.6.24 if you prefer. I just picked some version and checked on it. (for some reason there is no arch/i386 in 2.6.24. Maybe they have renamed it into x86?)

Yes. arch/x86 is now for both 32 and 64 bits.

Cheers,
Muli
Re: [Haifux] SSD and linux
On Thu, Sep 18, 2008 at 06:49:58PM +0300, Doron Zuckerman wrote:
> Do you have any idea where we can find this?

I haven't looked at those bits recently, but it sounds like Gabi is pointing you down the right path.

In any case, to be honest, I think what you propose doesn't make sense, even as research. Look at it this way. When does busy waiting make sense? When the overhead of sleeping is not offset by the useful work that gets done while you sleep (or when you can't sleep). So let's say that the overhead of a context switch is T_c. Switching to some other task and back will cost 2*T_c. Assuming that any work done by the task you switch to is useful, busy waiting makes sense only if you can resume executing the faulting task within 2*T_c. So unless you can read the frame from the SSD within 2*T_c (which I highly doubt...), busy waiting does not make sense.

Another point to consider: if you are running on a UP (uniprocessor) machine, your kernel isn't preemptible, and the work to submit the I/O to disk happens in some context other than the one you run in, then if you busy wait the I/O may never get submitted, and you'll busy wait forever!

Cheers,
Muli
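Muli's break-even argument above reduces to a single comparison. The sketch below only restates it in code; the function names are made up here, both times must be in the same unit, and no measured values are assumed.

```c
#include <stdbool.h>

/* Busy waiting pays off only if the I/O completes in less time than it
 * takes to switch to another task and back (2 * t_c). */
static bool busy_wait_pays_off(double t_io, double t_c)
{
    return t_io < 2.0 * t_c;
}

/* CPU time that spinning burns beyond the cost of sleeping.
 * Zero when busy waiting is at least as cheap as two context switches. */
static double time_lost_by_spinning(double t_io, double t_c)
{
    double lost = t_io - 2.0 * t_c;
    return lost > 0.0 ? lost : 0.0;
}
```

Plugging in measured values for t_io (one block read from the SSD) and t_c (one context switch) from the timing exercise settles the question for a given machine.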
Re: [Haifux] SSD and linux
On Tue, Sep 16, 2008 at 12:31:09PM +0300, Doron Zuckerman wrote:
> Hi all, I have a question regarding the linux kernel (for those of you who are familiar with it). I'm looking for a way to add a change to the linux kernel in order to check if I can make it more compatible with my Asus EEE-PC. I would like to change the kernel in such way that it will not do a context switch every time there is a page fault and will wait for the required page to be brought from the SSD (Solid State Drive), then continue as usual.

We context switch because the task (thread) cannot continue working until the page is paged in from the disk. If we don't context switch, and the thread cannot continue running until the page fault is resolved, what will the OS do in the meantime? Note that even though the EEE has an SSD, reading a page from it still takes several orders of magnitude longer than a context switch.

Cheers,
Muli
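To check the claim above on a given machine, the read side can be timed with a sketch like the following. The function name time_read_ns is made up here, and there is an important caveat: a plain read() is usually served from the page cache after the first access, so a fair comparison needs the caches dropped first or the file opened with O_DIRECT (which has buffer alignment requirements that this sketch does not handle).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Time a single pread() of `len` bytes at `offset` from `path`.
 * Returns the elapsed time in nanoseconds, or -1.0 on failure.
 * Note: measures cached reads unless the page cache was dropped. */
static double time_read_ns(const char *path, off_t offset, size_t len)
{
    struct timespec t0, t1;

    char *buf = malloc(len);
    if (buf == NULL)
        return -1.0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t got = pread(fd, buf, len, offset);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    free(buf);
    if (got < 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}
```

A 4096-byte read at a page-aligned offset of a large file on the SSD is the closest user-space analogue of paging in one page; repeating it at several offsets and averaging smooths out noise.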
Re: [Haifux] SSD and linux
Hello Doron,

Why do you think it will speed up the OS? What do you plan to do until the page is swapped in? Busy loop?

About your solution: handle_mm_fault is called from within the page fault handler (do_page_fault(), http://lxr.linux.no/linux+v2.6.26.5/+code=do_page_fault). So what is the rationale behind calling handle_mm_fault from somewhere other than the page fault handler? Where would you call it from instead, and what do you plan to do while you are in the page fault?

Probably what you meant is that, in order to avoid a context switch due to a page fault, you would call handle_mm_fault as usual but not raise the need_resched flag, so as not to trigger a context switch in case of a major page fault.

Gabi

_
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doron Zuckerman
Sent: Tuesday, September 16, 2008 12:31 PM
To: haifux@haifux.org
Cc: Ronen Gruengras
Subject: [Haifux] SSD and linux

Hi all,

I have a question regarding the Linux kernel (for those of you who are familiar with it). I'm looking for a way to change the Linux kernel in order to check whether I can make it better suited to my Asus EEE-PC.

I would like to change the kernel so that it does not context switch every time there is a page fault, but instead waits for the required page to be brought in from the SSD (Solid State Drive) and then continues as usual. This way I plan to check whether I can speed up the operating system (Ubuntu for EEE).

I thought of adding a TIF flag to the process descriptor (thread_info_32.h) that tells me whether I'm currently in a page fault, and then changing fault_32.c so that it calls handle_mm_fault(mm, vma, address, write); only if there is no page fault in progress at the moment.

Can you suggest any other possible solution, or tell me what you think about this one? I would really appreciate any help with this.

Doron.