On Wednesday, November 19, 2014 11:23:10 PM UTC-8, Uli wrote:
>
> >>> The Lee-Man <leeman...@gmail.com <javascript:>> schrieb am 19.11.2014 
> um 19:35 in 
> Nachricht <16860f30-b55c-4106-a4e1-d7badfc36...@googlegroups.com 
> <javascript:>>: 
> > On Tuesday, November 18, 2014 10:55:31 PM UTC-8, Uli wrote: 
> >> 
> >> >>> Lee Duncan <leeman...@gmail.com <javascript:>> schrieb am 
> 18.11.2014 
> >> um 22:35 in 
> >> Nachricht <1416346536-18198-1-git-seemail-###-duncan@ 
> <javascript:>###>: 
> >> > The following patch fixes a problem where the CPU becomes compute 
> bound 
> >> > when rediscovering targets, when there are hundreds of sessions. 
> >> > 
> >> > When his occurs, most of the time is spent in the function 
> >> > iscsi_sysfs_for_each_session(). This function does a scandir(), 
> >> > sorted alphabetically, to get a list of sessions, then scans 
> >> > that list looking for a match. When there are hundreds of sesions 
> >> > this can take forever. 
> >> 
> >> 
> >> I wonder: What takes forever: reading hundreds of sysfs entries, 
> sorting 
> >> them, or looking for a match? I guess none of them should take forever 
> >> unless the algorithm is really very bad. 
> >> 
> > 
> > The problem is not the sort, since the patch still does a sort of the 
> > directory entries. 
> > 
> > I believe the problem is in processing the sorted data, in 
> > iscsi_sysfs_for_each_session(). This function does dozens or more 
> > open/read/close cycles on sysfs attributes. Multiply that times hundreds 
> of 
> > session, and you have tens of thousands I/O operations. This fix ensures 
> > that, much of time, this loop is only gone through once. 
>
> When the reason for the sort is just to find some extrema (min or max), 
> the sort is not needed.


Actually, in general, that is not true. In the case where we've cached an 
entry, that may or may
not be true, depending on the use case.
 

> Also when just needing some extreme entry (min or max) it makes little 
> sense to populate some array with all entries just to discard them all, but 
> one. I don't know the details, but if all the other stuff isn't needed, it 
> shouldn't be read.
>

It also does not hurt anything to sort the whole array. I am trying to fix 
an issue where the CPU become completely compute-bound with hundreds (or 
more) sessions coming and going.

It sounds to me like you would like to redesign some of the code to make it 
more efficient. While I believe in efficiency in general, I don't believe 
the trade-off is worth the time here, since this is not the issue that is 
causing problems.


> What I saw from your path is that you just cheat on the sort routine. So 
> if getting the entries takes all the time (I guess it happens before 
> sorting), I don't see how you save a lot of time that way, _unless_ the 
> compare routine actually does I/O to compare the values which is bad, 
> because every value is compared more than once in a typical sort. 
>

As I said, it is trying to populate hundreds of internal session objects to 
find the one we need that takes so much time.

I believe part of the problem making more optimizations here is the fact 
that iscsi_sysfs_for_each_session() is called with a compare function 
pointer, so the routine does not know if the first entry in the sort list 
is going to make the caller happy or not. So in order to make this code 
work in all current cases *and* fix the compute-bound problem, a small 
change that sorts the last-known session to the top of the list cannot 
break any code, and also solves my problem.

Here are some actual numbers from testing this patch:

For an idle system (no data or link bounce) with 120 targets the CPU time 
improvement is about 3:1 after running 72 hours

For a system with data and link bounce every 10 minutes the CPU time 
improvement is on the order to 100:1 after 24 hours.  
The CPU time with the patch is linear with time.  Without the patch, it's 
exponential and will break the system to its knees after 1 day of bouncing 
links.  
Memory use was over 40G additional on the system without the patch.

Here's sar data from the 2 systems (bumble1 has the patch; bumble2 does 
not).  This is with bouncing links after 24+ hours:

bumble1:~ # sar -u 1 1000
Linux 2.6.32.54-0.23.TDC.1.R.2-default (bumble1)        11/13/14       
 _x86_64_

10:52:16        CPU     %user     %nice   %system   %iowait    %steal     
%idle
10:52:17        all      0.00      0.00      0.02      0.00      0.00     
99.98
10:52:18        all      0.00      0.00      0.07      0.00      0.00     
99.93
10:52:19        all      0.00      0.00      0.04      0.00      0.00     
99.96
10:52:20        all      0.00      0.00      0.16      0.00      0.00     
99.84
10:52:21        all      0.00      0.00      0.04      0.00      0.00     
99.96
10:52:22        all      0.00      0.00      0.10      0.00      0.00     
99.90
10:52:23        all      0.05      0.00      0.07      0.00      0.00     
99.89


bumble2:~ # sar -u 1 10000
Linux 2.6.32.54-0.23.TDC.1.R.2-default (bumble2)        11/13/14       
 _x86_64_

10:52:20        CPU     %user     %nice   %system   %iowait    %steal     
%idle
10:52:21        all     99.53      0.00      0.47      0.00      0.00     
 0.00
10:52:22        all     99.56      0.00      0.44      0.00      0.00     
 0.00
10:52:23        all     99.56      0.00      0.44      0.00      0.00     
 0.00
10:52:24        all     99.28      0.00      0.72      0.00      0.00     
 0.00
10:52:25        all     99.59      0.00      0.41      0.00      0.00     
 0.00

>
>
> > 
> > 
> >> > 
> >> > This patch saves the current session and then ensures that this 
> >> > session sorted to the front of the list. Testing shows that 
> >> > CPU usage goes from near 100% to near 0% when running cable 
> >> > plug tests with hundreds of sessions. 
> >> > 
> >> > Signed-off-by: Lee Duncan <leeman...@gmail.com <javascript:>> 
> >> > 
> >> > -- 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to