Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Fri, Mar 24, 2017 at 4:16 AM, Eric Dumazet wrote:
> On Thu, 2017-03-23 at 22:55 -0700, Alexander Duyck wrote:
>
>> Right, but time_after assumes roll over. When you are using a time
>> value based off of local_clock() >> 10, you don't ever roll over when
>> you do addition. Just the clock rolls over. At least on 64 bit
>> systems.
>>
>> So if local time approaches something like all 1's, and we have
>> shifted it by 10 then the max it can ever reach is
>> 0x003F_FFFF_FFFF_FFFF. I can add our loop time to that and it won't
>> roll over. In the meantime the busy_loop_us_ can never exceed whatever
>> I added to that, so we are now locked into a loop. I realize I am
>> probably being pedantic, and it will have an exceedingly small rate of
>> occurrence, but it is still an issue.
>
> Do you realize that a 64bit clock won't roll over before the host has
> reached 584 years of uptime?

Yeah, that is what I meant by "probably being pedantic". I was being too much of a perfectionist. So I can work with the ">> 10" approach.

The only thing I think I may still want to change is that on 32b systems I will still use proc_dointvec_minmax for busy_poll and busy_read to prevent us from inputting values less than 0. For 64b systems we can use proc_douintvec. It isn't so much that I don't trust root, it is just that we didn't really document the ranges anywhere for this, so I figure we should at least lock that down to the usable ranges since root may not be aware of the implementation details.
Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Thu, 2017-03-23 at 22:55 -0700, Alexander Duyck wrote:
> Right, but time_after assumes roll over. When you are using a time
> value based off of local_clock() >> 10, you don't ever roll over when
> you do addition. Just the clock rolls over. At least on 64 bit
> systems.
>
> So if local time approaches something like all 1's, and we have
> shifted it by 10 then the max it can ever reach is
> 0x003F_FFFF_FFFF_FFFF. I can add our loop time to that and it won't
> roll over. In the meantime the busy_loop_us_ can never exceed whatever
> I added to that, so we are now locked into a loop. I realize I am
> probably being pedantic, and it will have an exceedingly small rate of
> occurrence, but it is still an issue.

Do you realize that a 64bit clock won't roll over before the host has reached 584 years of uptime?
Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Thu, Mar 23, 2017 at 9:27 PM, Eric Dumazet wrote:
> On Thu, 2017-03-23 at 20:42 -0700, Alexander Duyck wrote:
>> On Thu, Mar 23, 2017 at 6:24 PM, Eric Dumazet wrote:
>>> On Thu, 2017-03-23 at 14:37 -0700, Alexander Duyck wrote:
>>>> From: Alexander Duyck
>>>>
>>>> The last bit I changed is to move from using a shift by 10 to just using
>>>> NSEC_PER_USEC and using multiplication for any run time calculations and
>>>> division for a few compile time ones. This should be more accurate and
>>>> perform about the same on most architectures since modern CPUs typically
>>>> handle multiplication without too much overhead.
>>>
>>> busy polling thread can be preempted for more than 2 seconds.
>>
>> If it is preempted is the timing value even valid anymore? I was
>> wondering about that. Also when preemption is enabled is there
>> anything to prevent us from being migrated to another CPU? If so what
>> do we do about architectures that allow drift between the clocks?
>>
>>> Using usec instead of nanoseconds gave us 3 orders of magnitude cushion.
>>
>> Yes, but the problem is we also opened up an issue where if the clock
>> was approaching a roll-over we could add a value to it that would put
>> us in a state where we would never time out.
>
> If you believe there is a bug, send a fix for net tree?
>
> I really do not see a bug, given we use time_after(now, end_time) which
> handles roll-over just fine.

Right, but time_after assumes roll over. When you are using a time value based off of local_clock() >> 10, you don't ever roll over when you do addition. Just the clock rolls over. At least on 64 bit systems.

So if local time approaches something like all 1's, and we have shifted it by 10 then the max it can ever reach is 0x003F_FFFF_FFFF_FFFF. I can add our loop time to that and it won't roll over. In the meantime the busy_loop_us_ can never exceed whatever I added to that, so we are now locked into a loop. I realize I am probably being pedantic, and it will have an exceedingly small rate of occurrence, but it is still an issue.

>>> We do not need nsec accuracy for busy polling users, if this restricts
>>> range and usability under stress.
>>
>> Yes and no. So the standard use cases suggest using values of 50 to
>> 100 microseconds. I suspect that for most people that is probably
>> what they are using. The addition of preemption kind of threw a
>> wrench in the works because now instead of spending that time busy
>> polling you can get preempted and then are off doing something else
>> for the entire period of time.
>
> That is fine. Busy polling heavy users are pinning threads to cpu, and
> the re-schedule check is basically a way to make sure ksoftirqd will get
> a chance to service BH.
>
>> What would you think of changing this so that instead of tracking the
>> total time this function is active, instead we tracked the total time
>> we spent with preemption disabled? What I would do is move the
>> start_time configuration to just after the preempt_disable() call.
>> Then if we end up yielding to another thread we would just reset the
>> start_time when we restarted instead of trying to deal with all the
>> extra clock nonsense that we would have to deal with otherwise, since
>> I don't know if we actually want to count time where we aren't
>> actually doing anything. In addition this would bring us closer to how
>> NAPI already works since it essentially will either find an event, or
>> if we time out we hand it off to the softirq which in turn can handle
>> it or hand it off to softirqd. The only item that might be a bit more
>> difficult to deal with then would be the way the times are used in
>> fs/select.c, but I don't know if that is really the right way to go
>> anyway. With the preemption changes and such it might just make sense
>> to drop those bits and rely on just the socket polling alone.
>>
>> The other option is to switch over everything from using unsigned long
>> to using uint64_t and time_after64. Then we can guarantee the range
>> needed and then some, but then we are playing with a u64 time value on
>> 32b architectures which might be a bit more expensive. Even with that
>> though I still need to clean up the sysctl since it doesn't make sense
>> to allow negative values for the busy_poll usec to be used, which is
>> currently the case.
>>
>> Anyway let me know what you think and I can probably spin out a new
>> set tomorrow.
>
> Leave usec resolution, I see no point switching to nsec, as long we
> support 32bit kernels.

I wonder how much it really matters in the grand scheme of things. The question I would have is which is more expensive. We are either having to do a set of double precision shifts, or a multiplication, double precision addition, and double precision compare. I need to take a look at some instruction tables to see which is more expensive. My preference would be to switch things over to a u64 at this point as it fixes
Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Thu, 2017-03-23 at 21:27 -0700, Eric Dumazet wrote:
> If you believe min/max values should be added to the sysctls, because we
> do not trust root anymore, please send patches only addressing that.

extern unsigned int sysctl_net_busy_read;
extern unsigned int sysctl_net_busy_poll;
...
unsigned int sk_ll_usec;

These are unsigned values, so no min/max would be required, only maybe use proc_douintvec() instead of proc_dointvec() as most sysctls use, because we were lazy and had to wait for linux-4.8 to actually get proc_douintvec() in the kernel ;)
Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Thu, 2017-03-23 at 20:42 -0700, Alexander Duyck wrote:
> On Thu, Mar 23, 2017 at 6:24 PM, Eric Dumazet wrote:
>> On Thu, 2017-03-23 at 14:37 -0700, Alexander Duyck wrote:
>>> From: Alexander Duyck
>>>
>>> The last bit I changed is to move from using a shift by 10 to just using
>>> NSEC_PER_USEC and using multiplication for any run time calculations and
>>> division for a few compile time ones. This should be more accurate and
>>> perform about the same on most architectures since modern CPUs typically
>>> handle multiplication without too much overhead.
>>
>> busy polling thread can be preempted for more than 2 seconds.
>
> If it is preempted is the timing value even valid anymore? I was
> wondering about that. Also when preemption is enabled is there
> anything to prevent us from being migrated to another CPU? If so what
> do we do about architectures that allow drift between the clocks?
>
>> Using usec instead of nanoseconds gave us 3 orders of magnitude cushion.
>
> Yes, but the problem is we also opened up an issue where if the clock
> was approaching a roll-over we could add a value to it that would put
> us in a state where we would never time out.

If you believe there is a bug, send a fix for net tree?

I really do not see a bug, given we use time_after(now, end_time) which handles roll-over just fine.

>> We do not need nsec accuracy for busy polling users, if this restricts
>> range and usability under stress.
>
> Yes and no. So the standard use cases suggest using values of 50 to
> 100 microseconds. I suspect that for most people that is probably
> what they are using. The addition of preemption kind of threw a
> wrench in the works because now instead of spending that time busy
> polling you can get preempted and then are off doing something else
> for the entire period of time.

That is fine. Busy polling heavy users are pinning threads to cpu, and the re-schedule check is basically a way to make sure ksoftirqd will get a chance to service BH.

> What would you think of changing this so that instead of tracking the
> total time this function is active, instead we tracked the total time
> we spent with preemption disabled? What I would do is move the
> start_time configuration to just after the preempt_disable() call.
> Then if we end up yielding to another thread we would just reset the
> start_time when we restarted instead of trying to deal with all the
> extra clock nonsense that we would have to deal with otherwise, since
> I don't know if we actually want to count time where we aren't
> actually doing anything. In addition this would bring us closer to how
> NAPI already works since it essentially will either find an event, or
> if we time out we hand it off to the softirq which in turn can handle
> it or hand it off to softirqd. The only item that might be a bit more
> difficult to deal with then would be the way the times are used in
> fs/select.c, but I don't know if that is really the right way to go
> anyway. With the preemption changes and such it might just make sense
> to drop those bits and rely on just the socket polling alone.
>
> The other option is to switch over everything from using unsigned long
> to using uint64_t and time_after64. Then we can guarantee the range
> needed and then some, but then we are playing with a u64 time value on
> 32b architectures which might be a bit more expensive. Even with that
> though I still need to clean up the sysctl since it doesn't make sense
> to allow negative values for the busy_poll usec to be used, which is
> currently the case.
>
> Anyway let me know what you think and I can probably spin out a new
> set tomorrow.

Leave usec resolution, I see no point switching to nsec, as long we support 32bit kernels.

If you believe min/max values should be added to the sysctls, because we do not trust root anymore, please send patches only addressing that.

Thanks.
Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Thu, Mar 23, 2017 at 6:24 PM, Eric Dumazet wrote:
> On Thu, 2017-03-23 at 14:37 -0700, Alexander Duyck wrote:
>> From: Alexander Duyck
>>
>> The last bit I changed is to move from using a shift by 10 to just using
>> NSEC_PER_USEC and using multiplication for any run time calculations and
>> division for a few compile time ones. This should be more accurate and
>> perform about the same on most architectures since modern CPUs typically
>> handle multiplication without too much overhead.
>
> busy polling thread can be preempted for more than 2 seconds.

If it is preempted is the timing value even valid anymore? I was wondering about that. Also when preemption is enabled is there anything to prevent us from being migrated to another CPU? If so what do we do about architectures that allow drift between the clocks?

> Using usec instead of nanoseconds gave us 3 orders of magnitude cushion.

Yes, but the problem is we also opened up an issue where if the clock was approaching a roll-over we could add a value to it that would put us in a state where we would never time out.

> We do not need nsec accuracy for busy polling users, if this restricts
> range and usability under stress.

Yes and no. So the standard use cases suggest using values of 50 to 100 microseconds. I suspect that for most people that is probably what they are using. The addition of preemption kind of threw a wrench in the works because now instead of spending that time busy polling you can get preempted and then are off doing something else for the entire period of time.

What would you think of changing this so that instead of tracking the total time this function is active, instead we tracked the total time we spent with preemption disabled? What I would do is move the start_time configuration to just after the preempt_disable() call. Then if we end up yielding to another thread we would just reset the start_time when we restarted instead of trying to deal with all the extra clock nonsense that we would have to deal with otherwise, since I don't know if we actually want to count time where we aren't actually doing anything. In addition this would bring us closer to how NAPI already works since it essentially will either find an event, or if we time out we hand it off to the softirq which in turn can handle it or hand it off to softirqd. The only item that might be a bit more difficult to deal with then would be the way the times are used in fs/select.c, but I don't know if that is really the right way to go anyway. With the preemption changes and such it might just make sense to drop those bits and rely on just the socket polling alone.

The other option is to switch over everything from using unsigned long to using uint64_t and time_after64. Then we can guarantee the range needed and then some, but then we are playing with a u64 time value on 32b architectures which might be a bit more expensive. Even with that though I still need to clean up the sysctl since it doesn't make sense to allow negative values for the busy_poll usec to be used, which is currently the case.

Anyway let me know what you think and I can probably spin out a new set tomorrow.

- Alex
Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
On Thu, 2017-03-23 at 14:37 -0700, Alexander Duyck wrote:
> From: Alexander Duyck
>
> The last bit I changed is to move from using a shift by 10 to just using
> NSEC_PER_USEC and using multiplication for any run time calculations and
> division for a few compile time ones. This should be more accurate and
> perform about the same on most architectures since modern CPUs typically
> handle multiplication without too much overhead.

busy polling thread can be preempted for more than 2 seconds.

Using usec instead of nanoseconds gave us 3 orders of magnitude cushion.

We do not need nsec accuracy for busy polling users, if this restricts range and usability under stress.
[net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end
From: Alexander Duyck

This patch flips the logic we were using to determine if the busy polling has timed out. The main motivation for this is that we will need to support two different possible timeout values in the future, and by recording the start time rather than when we would want to end we can focus on making the end_time specific to the task, be it epoll or socket based polling.

I am also flipping the logic a bit. The previous code was taking local_clock() and shifting it by 10 to get the time value in microseconds. That works for most values, but has a side effect of us potentially encountering time values that will never time out, as the current time will never exceed the recorded clock value plus the timeout usecs if the clock value was approaching a roll-over. To account for that I am leaving start_time as a nanoseconds value instead of shifting it down to a microseconds value. In addition I am limiting the busy_poll and busy_read values so that they cannot exceed about 2 seconds. By doing this I can add the timeout value into the nanosecond value and be guaranteed that we will not prematurely trigger timeouts, even on 32 bit architectures.

The last bit I changed is to move from using a shift by 10 to just using NSEC_PER_USEC, with multiplication for any run time calculations and division for a few compile time ones. This should be more accurate and perform about the same on most architectures, since modern CPUs typically handle multiplication without too much overhead.

Signed-off-by: Alexander Duyck
---
 fs/select.c                |   16 +
 include/net/busy_poll.h    |   78 +++-
 net/core/dev.c             |    6 ++-
 net/core/sysctl_net_core.c |   11 +-
 4 files changed, 68 insertions(+), 43 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index e2112270d75a..9287d3a96e35 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -409,7 +409,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
 	int retval, i, timed_out = 0;
 	u64 slack = 0;
 	unsigned int busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
-	unsigned long busy_end = 0;
+	unsigned long busy_start = 0;

 	rcu_read_lock();
 	retval = max_select_fd(n, fds);
@@ -512,11 +512,11 @@ int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)

 		/* only if found POLL_BUSY_LOOP sockets && not out of time */
 		if (can_busy_loop && !need_resched()) {
-			if (!busy_end) {
-				busy_end = busy_loop_end_time();
+			if (!busy_start) {
+				busy_start = busy_loop_current_time();
 				continue;
 			}
-			if (!busy_loop_timeout(busy_end))
+			if (!busy_loop_timeout(busy_start))
 				continue;
 		}
 		busy_flag = 0;
@@ -800,7 +800,7 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
 	int timed_out = 0, count = 0;
 	u64 slack = 0;
 	unsigned int busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
-	unsigned long busy_end = 0;
+	unsigned long busy_start = 0;

 	/* Optimise the no-wait case */
 	if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
@@ -853,11 +853,11 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,

 		/* only if found POLL_BUSY_LOOP sockets && not out of time */
 		if (can_busy_loop && !need_resched()) {
-			if (!busy_end) {
-				busy_end = busy_loop_end_time();
+			if (!busy_start) {
+				busy_start = busy_loop_current_time();
 				continue;
 			}
-			if (!busy_loop_timeout(busy_end))
+			if (!busy_loop_timeout(busy_start))
 				continue;
 		}
 		busy_flag = 0;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c55760f4820f..4626cb22f625 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -41,67 +41,85 @@
  */
 #define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))

+/* The timer checks are using time_after() to determine if we have
+ * exceeded our timeout period. In order to support that we have to limit
+ * ourselves to the smallest possible signed type and then in addition
+ * account for the fact that we are recording our value in microseconds
+ * instead of nanoseconds.
+ *
+ * This limit should be just a little over 2 seconds.
+ */
+#define MAX_BUSY_POLL (INT_MAX / NSEC_PER_USEC)
+
 static inline bool net_busy_loop_on(void)
 {
 	return sysctl_net_busy_poll;
 }

-static inline u64 busy_loop_us_clock(void)
+static inline bool sk_can_busy_loop(const