Re: Please, put 64-bit counter per task and incr.by.one each ctxt switch.

2008-02-26 Thread J.C. Pizarro
On 2008/2/25, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Sun, 24 Feb 2008 14:12:47 +0100 "J.C. Pizarro" <[EMAIL PROTECTED]> wrote:
>
>  > It's a statistic, yes, but it's a very important parameter for the
> CPU-scheduler.
>  > The CPU-scheduler will know the number of context switches of each task
>  >  before taking a blind decision indefinitely!
>
>
> We already have these:
>
> unsigned long nvcsw, nivcsw; /* context switch counts */
>
>  in the task_struct.

1. They use "unsigned long" instead of "unsigned long long".
2. They are initialized with "= 0;" instead of "= 0ULL;".
3. They aren't simply incremented with ++ once per context switch.
4. I don't like the separation into voluntary and involuntary context switches,
and I don't understand the utility of this separation.

tsk->nvcsw and tsk->nivcsw mean something different from what I had proposed.

It's simple: do the ++ when kernel/sched.c:context_switch(..) is called,
but they don't do it there.

I propose:
1. unsigned long long tsk->ncsw = 0ULL;  and  tsk->ncsw++;
2. unsigned long long tsk->last_registered_ncsw = tsk->ncsw; sampled when polling.
3. long tsk->vcsw = ( tsk->ncsw - tsk->last_registered_ncsw ) / ( t2 - t1 );
   /* rate of the task in ctxt-switches per second (t1 != t2, in seconds,
      to avoid a division by zero) */
4. long tsk->last_registered_vcsw = tsk->vcsw;
5. long tsk->normalized_vcsw =
   (1 - alpha)*tsk->last_registered_vcsw + alpha*tsk->vcsw; /* 0 < alpha < 1 */


Re: Patch kernel: I have 8 Gbytes RAM, but why I can only allocate 2.8 Gbytes RAM for a single process?

2008-02-25 Thread J.C. Pizarro
2008/2/25, Ady Wicaksono <[EMAIL PROTECTED]>:
> I have 8 Gbytes RAM, but why can I only allocate 2.8 Gbytes RAM for a single
> process?
>  How do I patch the kernel to get past the 2.8 Gbytes limitation?
>
>  Kernel:
>  ---
>  Linux xxx.com 2.6.9-023stab046.2-enterprise #1 SMP Mon Dec 10 15:22:33
>  MSK 2007 i686 i686 i386 GNU/Linux
>
>  Mem:
>  ---
>  # cat /proc/meminfo
>  MemTotal:  8296484 kB
>  MemFree: 50416 kB
>  Buffers: 64412 kB
>  Cached:4927328 kB
>  SwapCached:  0 kB
>  Active:6710828 kB
>  Inactive:  1065384 kB
>  HighTotal: 4980736 kB
>  HighFree: 1024 kB
>  LowTotal:  3315748 kB
>  LowFree: 49392 kB
>  SwapTotal:10256376 kB
>  SwapFree: 10255732 kB
>  Dirty:  64 kB
>  Writeback:   0 kB
>  Mapped:3054960 kB
>  Slab:   393224 kB
>  CommitLimit:  14404616 kB
>  Committed_AS:  6318152 kB
>  PageTables:  34892 kB
>  VmallocTotal:   303096 kB
>  VmallocUsed: 22360 kB
>  VmallocChunk:   280496 kB
>
>
>  CPU (8 processor id from 0-7), one of them is:
>  ---
>  processor   : 0
>  vendor_id   : GenuineIntel
>  cpu family  : 15
>  model   : 6
>  model name  : Intel(R) Xeon(TM) CPU 3.00GHz
>  stepping: 4
>  cpu MHz : 2993.054
>  cache size  : 2048 KB
>  physical id : 0
>  siblings: 4
>  core id : 0
>  cpu cores   : 2
>  fdiv_bug: no
>  hlt_bug : no
>  f00f_bug: no
>  coma_bug: no
>  fpu : yes
>  fpu_exception   : yes
>  cpuid level : 6
>  wp  : yes
>  flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>  mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx
>  lm constant_tsc pni monitor ds_cpl est cid xtpr
>  bogomips: 5989.55
>
>  App to test memory limit:
>  ---
>  #include <stdio.h>
>  #include <stdlib.h>
>
>  int main(){
> size_t siz = 100 * 1024 * 1024 ;
> size_t idx = 1 ;
> void *ptr;
>
> for (;;){
> ptr = malloc ( siz * idx );
> if(!ptr)
> break ;
> free(ptr);
> idx++;
> }
> printf ("Max malloc %zu * 100 MB \n", idx - 1 );
> return (0);
>  }
>
>  App result: Max malloc 28 * 100 MB ==> 2.8 Gbytes

1. It's a 32-bit Xeon processor with 8 GiB of RAM, OK?
2. A 32-bit userspace process is always limited to < 3.0 GiB ( < 0xC0000000 ).
3. Enable PAE (the 64 GB option in the kernel) to address the 8 GiB of RAM;
you can then have many processes of ~3 GiB each, but still no single
process bigger than that.
I'm not sure how efficiently PAE's three-level paging works in Linux.

   ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Please, put 64-bit counter per task and incr.by.one each ctxt switch.

2008-02-24 Thread J.C. Pizarro
Good morning :)

On 2008/2/24, Rik van Riel <[EMAIL PROTECTED]> wrote:
> OK, one last reply on the (overly optimistic?) assumption that you are not a 
> troll.
>  > +++ linux-2.6_git-20080224/include/linux/sched.h  2008-02-24 04:50:18.000000000 +0100
>  > @@ -1007,6 +1007,12 @@
>  > struct hlist_head preempt_notifiers;
>  >  #endif
>  >
>  > +   unsigned long long ctxt_switch_counts; /* 64-bit switches' count */
>  > +   /* ToDo:
>  > +*  To implement a poller/clock for CPU-scheduler that only reads
>  > +*   these counts of context switches of the runqueue's tasks.
>  > +*  No problem if this poller/clock is not implemented. */
>
> So you're introducing a statistic, but have not yet written any code
>  that uses it?

It's a statistic, yes, but it's a very important parameter for the CPU-scheduler.
The CPU-scheduler will know the number of context switches of each task
before taking a blind decision indefinitely!

Statistically, there are tasks X that have higher context-switch counts and
tasks Y that have lower ones over the last sampling interval, using the
historical formula "(1-alpha)*prev + alpha*current" with 0 < alpha < 1.
(Measure this value V as a rate in ctxt-switches/second, too.)

  Give more weight to X than to Y, for the interactivity that X wants
  (X will have a higher V and Y a lower V),
  with one exception, to avoid starving the eternally humble: apply a
   sin(x)-like boost after a long period of humility (then modify the weights).

The missing code has to be implemented by everybody, because
1. Users don't want to lose interactivity on an overloaded CPU.
2. There is a lot of badly organized CPU-scheduler code that I don't want
 to touch.

>  > +   p->ctxt_switch_counts = 0ULL; /* task's 64-bit counter inited 0 */
>
> Because we can all read C, there is no need to tell people in comments
>  what the code does.  Comments are there to explain why the code does
>  things, if an explanation is needed.

OK.

>  > >  > I will explain your later why of it.
>  > >
>  > > ... and explain exactly why the kernel needs this extra code.
>  >
>  > One reason: for the objective of gain interactivity, it's an issue that
>  >  CFS fair scheduler lacks it.

> Your patch does not actually help interactivity, because all it does
>  is add an irq spinlock in a hot path (bad idea) and a counter which
>  nothing reads.

Then remove the lock/unlock of the task that I had put in;
I'm not sure whether it's safe, because I haven't read all the surrounding
control paths.

On 2008/2/24, Mike Galbraith <[EMAIL PROTECTED]> wrote:
>  > One reason: for the objective of gain interactivity, it's an issue that
>  >  CFS fair scheduler lacks it.
>
> A bug report would be a much better first step toward resolution of any
>  interactivity issues you're seeing than posts which do nothing but
>  suggest that there may be a problem.
>
>  First define the problem, _then_ fix it.

It's the eternal blind-decision problem of the overloaded-CPU scenario on desktops.


Re: Please, put 64-bit counter per task and incr.by.one each ctxt switch.

2008-02-23 Thread J.C. Pizarro
On 2008/2/24, Rik van Riel <[EMAIL PROTECTED]> wrote:
> On Sun, 24 Feb 2008 04:08:38 +0100
>  "J.C. Pizarro" <[EMAIL PROTECTED]> wrote:
>
>  > We will need 64 bit counters of the slow context switches,
>  >   one counter for each new created task (e.g. u64 ctxt_switch_counts;)
>
>
> Please send a patch ...

diff -ur linux-2.6_git-20080224.orig/include/linux/sched.h linux-2.6_git-20080224/include/linux/sched.h
--- linux-2.6_git-20080224.orig/include/linux/sched.h  2008-02-24 01:04:18.000000000 +0100
+++ linux-2.6_git-20080224/include/linux/sched.h  2008-02-24 04:50:18.000000000 +0100
@@ -1007,6 +1007,12 @@
 	struct hlist_head preempt_notifiers;
 #endif
 
+	unsigned long long ctxt_switch_counts; /* 64-bit switches' count */
+	/* ToDo:
+	 *  To implement a poller/clock for CPU-scheduler that only reads
+	 *   these counts of context switches of the runqueue's tasks.
+	 *  No problem if this poller/clock is not implemented. */
+
 	/*
 	 * fpu_counter contains the number of consecutive context switches
 	 * that the FPU is used. If this is over a threshold, the lazy fpu
diff -ur linux-2.6_git-20080224.orig/kernel/sched.c linux-2.6_git-20080224/kernel/sched.c
--- linux-2.6_git-20080224.orig/kernel/sched.c  2008-02-24 01:04:19.000000000 +0100
+++ linux-2.6_git-20080224/kernel/sched.c  2008-02-24 04:33:57.000000000 +0100
@@ -2008,6 +2008,8 @@
 	BUG_ON(p->state != TASK_RUNNING);
 	update_rq_clock(rq);
 
+	p->ctxt_switch_counts = 0ULL; /* task's 64-bit counter inited 0 */
+
 	p->prio = effective_prio(p);
 
 	if (!p->sched_class->task_new || !current->se.on_rq) {
@@ -2189,8 +2191,14 @@
 context_switch(struct rq *rq, struct task_struct *prev,
 	       struct task_struct *next)
 {
+	unsigned long flags;
+	struct rq *rq_prev;
 	struct mm_struct *mm, *oldmm;
 
+	rq_prev = task_rq_lock(prev, &flags); /* locking the prev task */
+	prev->ctxt_switch_counts++; /* incr.+1 the task's 64-bit counter */
+	task_rq_unlock(rq_prev, &flags); /* unlocking the prev task */
+
 	prepare_task_switch(rq, prev, next);
 	mm = next->mm;
 	oldmm = prev->active_mm;

>  > I will explain your later why of it.
>
>
> ... and explain exactly why the kernel needs this extra code.

One reason: the objective of gaining interactivity; it's something that
 the CFS fair scheduler lacks.

o:)


linux-2.6_git-20080224_ctxt_switch_counts.patch
Description: Binary data


Please, put 64-bit counter per task and incr.by.one each ctxt switch.

2008-02-23 Thread J.C. Pizarro
Hello,

We will need 64-bit counters of the slow context switches,
  one counter for each newly created task (e.g. u64 ctxt_switch_counts;).

We will only need them during the lifetime of the tasks.

Increment the task's 64-bit counter by +1 (it's fast)
  on each slow context switch.

*kernel/sched.c:
void context_switch(...) { ... } # incr. +1 here.
void wake_up_new_task(...) { ... } # ->ctxt_switch_counts = 0ULL;

*include/linux/sched.h:
struct task_struct { ... } # add 64-bit (u64 ctxt_switch_counts;) here.

Please do it, and we can do better than the CFS fair scheduler.

I will explain to you later the why of it.

   O:)


Re: Question about your git habits

2008-02-23 Thread J.C. Pizarro
Google's Gmail mangled my last message: it rewrapped a message of X lines
into (X+o) lines, breaking the original line layout of the message.

I don't see Google's motives for mangling the original lines
of the messages that I had sent.


Re: Question about your git habits

2008-02-23 Thread J.C. Pizarro
On 2008/2/23, Charles Bailey <[EMAIL PROTECTED]> wrote:
> On Sat, Feb 23, 2008 at 02:36:59PM +0100, J.C. Pizarro wrote:
>  > On 2008/2/23, Charles Bailey <[EMAIL PROTECTED]> wrote:
>  > >
>
> > > It shouldn't matter how aggressively the repositories are packed or what
>  > >  the binary differences are between the pack files are. git clone
>  > >  should (with the --reference option) generate a new pack for you with
>  > >  only the missing objects. If these objects are ~52 MiB then a lot has
>  > >  been committed to the repository, but you're not going to be able to
>  > >  get around a big download any other way.
>  >
>  > You're wrong, nothing has to be commited ~52 MiB to the repository.
>  >
>  > I'm not saying "commit", i'm saying
>  >
>  > "Assume A & B binary git repos and delta_B-A another binary file, i
>  > request built
>  > B' = A + delta_B-A where is verified SHA1(B') = SHA1(B) for avoiding
>  > corrupting".
>  >
>  > Assume B is the higher repacked version of "A + minor commits of the day"
>  > as if B was optimizing 24 hours more the minimum spanning tree. Wow!!!
>  >
>
>
> I'm not sure that I understand where you are going with this.
>  Originally, you stated that if you clone a 775 MiB repository on day
>  one, and then you clone it again on day two when it was 777 MiB, then
>  you currently have to download 775 + 777 MiB of data, whereas you
>  could download a 52 MiB binary diff. I have no idea where that value
>  of 52 MiB comes from, and I've no idea how many objects were committed
>  between day one and day two. If we're going to talk about details,
>  then you need to provide more details about your scenario.

I didn't say that the "A & B git repos" are binary files; I said that
delta_B-A is a binary file.

I said ~15 hours ago: "Suppose the size cost of this binary delta is e.g. around
52 MiB instead of 2 MiB due to numerous mismatching of binary parts ..."

A binary delta is different from a textual delta (between lines of text)
 as used in the git scheme (commits and changesets use textual deltas).
A textual delta can be compressed, resulting in a smaller binary object.
Collecting the binary objects, plus some more, gives the git repository.
You can't apply a textual delta to a whole git repository, only a binary delta.
You could apply a binary delta between two repacked repositories if there were
 a program that generates a binary delta between two directories, but that is
 not implemented yet.
The SHA1 verifier is useful to avoid corruption of the generated repository
 (if it's corrupted, then the delta, or the whole repository, has to be cloned
 again until it is not corrupted).
The "same SHA1" of two directories can be implemented as the SHA1 of the
 sorted SHA1s of their contents, filenames and properties. Anything altered,
 added or removed among them implies a different SHA1.

Don't you understand what I'm saying? I will give you a practical example.
1. zip -r -8 foo1.zip foo1  # foo1 holds tons of information, as in a git repo
2. mv foo1 foo2 ; cp bar.txt foo2/
3. zip -r -9 foo2.zip foo2  # still a little more optimized (= higher repacked)
4. Apply a binary delta between foo1.zip & foo2.zip with a supposed delta
 program and you get delta_foo1_foo2.bin. size(delta_foo1_foo2.bin) is
 not nearly ~( size(foo2.zip) - size(foo1.zip) ).
5. Apply a hexadecimal diff and you will understand why it gives the exemplar
 ~52 MiB instead of the ~2 MiB that I mentioned.
6. You will find some identical parts in both foo1.zip and foo2.zip.
 Identical parts are good for smaller binary deltas. It's possible to get
 still smaller binary deltas when the identical parts sit at random offsets
 or random locations, depending on how advanced the delta program is.
7. Same as above, but apply the binary delta to both directories instead of
 both files.

>  Having said that, here is my original point in some more detail. git
>  repositories are not binary blobs, they are object databases. Better
>  than this, they are databases of immutable objects. This means that to
>  get the difference between one database and another, you only need to
>  add the objects that are missing from the other database.

Databases of immutable objects <--- you're wrong, because you're confusing
things. There are mutable objects, such as the improved deltas of the
minimum spanning tree.

The missing objects are not only the missing sources that you're thinking of;
they can be anything (blob, tree, commit, tag, etc.). The deltas of the
minimum spanning tree are also objects of the database, and they can be erased
or added when the spanning tree is altered (because the altered spanning
tree is smaller than the previous one) for bett

Re: Question about your git habits

2008-02-23 Thread J.C. Pizarro
On 2008/2/23, Charles Bailey <[EMAIL PROTECTED]> wrote:
> On Sat, Feb 23, 2008 at 02:08:35PM +0100, J.C. Pizarro wrote:
>  >
>  > But if the repos are aggressively repacked then the bit to bit differences
>  > are not ~2 MiB.
>
>
> It shouldn't matter how aggressively the repositories are packed or what
>  the binary differences are between the pack files are. git clone
>  should (with the --reference option) generate a new pack for you with
>  only the missing objects. If these objects are ~52 MiB then a lot has
>  been committed to the repository, but you're not going to be able to
>  get around a big download any other way.

You're wrong; nothing ~52 MiB has to be committed to the repository.

I'm not saying "commit", I'm saying:

"Assume A & B git repos and delta_B-A another binary file; I request
building B' = A + delta_B-A, where SHA1(B') = SHA1(B) is verified to avoid
corruption."

Assume B is the more highly repacked version of "A + minor commits of the day",
as if B had spent 24 more hours optimizing the minimum spanning tree. Wow!!!


Re: Question about your git habits

2008-02-23 Thread J.C. Pizarro
On 2008/2/23, Charles Bailey <[EMAIL PROTECTED]> wrote:
> On Sat, Feb 23, 2008 at 03:47:07AM +0100, J.C. Pizarro wrote:
>  >
>  > Yesterday, i had git cloned git://foo.com/bar.git   ( 777 MiB )
>  >  Today, i've git cloned git://foo.com/bar.git   ( 779 MiB )
>  >
>  >  Both repos are different binaries , and i used 777 MiB + 779 MiB = 1556 
> MiB
>  >  of bandwidth in two days. It's much!
>  >
>  >  Why don't we implement "binary delta between old git repo and recent git 
> repo"
>  >  with "SHA1 built git repo verifier"?
>  >
>  >  Suppose the size cost of this binary delta is e.g. around 52 MiB instead 
> of
>  >  2 MiB due to numerous mismatching of binary parts, then the bandwidth
>  >  in two days will be 777 MiB + 52 MiB = 829 MiB instead of 1556 MiB.
>  >
>  >  Unfortunately, this "binary delta of repos" is not implemented yet :|
>
>
> It sounds like what concerns you is the bandwith to git://foo.bar. If
>  you are cloning the first repository to somewhere were the first
>  clone is accessible and bandwidth between the clones is not an issue,
>  then you should be able to use the --reference parameter to git clone
>  to just fetch the missing ~2 MiB from foo.bar.
>
>  A "binary delta of repos" should just be an 'incremental' pack file
>  and the git protocol should support generating an appropriate one. I'm
>  not quite sure what "not implemented yet" feature you are looking for.

But if the repos are aggressively repacked, then the bit-for-bit differences
are not ~2 MiB.


Re: Question about your git habits

2008-02-22 Thread J.C. Pizarro
On 2008/2/23, Al Viro <[EMAIL PROTECTED]> wrote:
> On Fri, Feb 22, 2008 at 05:51:04PM -0800, Junio C Hamano wrote:
>  > Al Viro <[EMAIL PROTECTED]> writes:
>  >
>  > > On Sat, Feb 23, 2008 at 02:37:00AM +0100, Jan Engelhardt wrote:
>  > >
>  > >> >do you tend to clone the entire repository repeatedly into a series
>  > >> >of separate working directories
>  > >>
>  > >> Too time consuming on consumer drives with projects the size of Linux.
>  > >
>  > > git clone -l -s
>  > >
>  > > is not particulary slow...
>  >
>  > How big is a checkout of a single revision of kernel these days,
>  > compared to a well-packed history since v2.6.12-rc2?
>  >
>  > The cost of writing out the work tree files isn't ignorable and
>  > probably more than writing out the repository data (which -s
>  > saves for you).
>
>
> Depends...  I'm using ext2 for that and noatime everywhere, so that might
>  change the picture, but IME it's fast enough...  As for the size, it gets
>  to ~320Mb on disk, which is comparable to the pack size (~240-odd Mb).

Yesterday, I git cloned git://foo.com/bar.git   ( 777 MiB )
Today, I've git cloned git://foo.com/bar.git   ( 779 MiB )

The two repos are bit-for-bit different binaries, and I used
777 MiB + 779 MiB = 1556 MiB of bandwidth in two days. That's a lot!

Why don't we implement a "binary delta between the old git repo and the recent
git repo" with a "SHA1 verifier of the rebuilt git repo"?

Suppose the size cost of this binary delta is e.g. around 52 MiB instead of
2 MiB, due to numerous mismatching binary parts; then the bandwidth
over the two days would be 777 MiB + 52 MiB = 829 MiB instead of 1556 MiB.

Unfortunately, this "binary delta of repos" is not implemented yet :|


Re: Question about your git habits

2008-02-22 Thread J.C. Pizarro
2008/2/23, Chase Venters <[EMAIL PROTECTED]> wrote:
>
> ... blablabla
>
>  My question is: If you're working on multiple things at once, do you tend to
>  clone the entire repository repeatedly into a series of separate working
>  directories and do your work there, then pull that work (possibly comprising
>  a series of "temporary" commits) back into a separate local master
>  respository with --squash, either into "master" or into a branch containing
>  the new feature?
>
> ... blablabla
>
>  I'm using git to manage my project and I'm trying to determine the most
>  optimal workflow I can. I figure that I'm going to have an "official" master
>  repository for the project, and I want to keep the revision history clean in
>  that repository (ie, no messy intermediate commits that don't compile or only
>  implement a feature half way).

I recommend using these complementary tools:

   1. google: gitk screenshots  ( e.g. http://lwn.net/Articles/140350/ )

   2. google: "git-gui" screenshots
 ( e.g. http://www.spearce.org/2007/01/git-gui-screenshots.html )

   3. google: gitweb color meld

   ;)


Re: Improved idea, to use NR_CPUS task_migrators for SMPs.

2008-02-22 Thread J.C. Pizarro
On 2008/2/22, J.C. Pizarro <[EMAIL PROTECTED]>, I wrote:
>  For 
> comprension, unlocking
>  some lockers of the task_migrators and inmediately switching CPU to
>  migrators is similar to quick awakening of migration_thread.

I'm sorry, that's wrong; the correct order is the reverse: lock when it wants
to enter and unlock when it has exited.


Improved idea, to use NR_CPUS task_migrators for SMPs.

2008-02-22 Thread J.C. Pizarro
In kernel/sched.c appears:

static void sched_migrate_task(struct task_struct *p, int dest_cpu)
{
	struct migration_req req;
	unsigned long flags;
	struct rq *rq;

	rq = task_rq_lock(p, &flags);
	if (!cpu_isset(dest_cpu, p->cpus_allowed)
	    || unlikely(cpu_is_offline(dest_cpu)))
		goto out;

	/* force the process onto the specified CPU */
	if (migrate_task(p, dest_cpu, &req)) {
		/* Need to wait for migration thread (might exit: take ref). */
		struct task_struct *mt = rq->migration_thread;

		get_task_struct(mt);
		task_rq_unlock(rq, &flags);     < comment #1
		wake_up_process(mt);            < comment #2
		put_task_struct(mt);
		wait_for_completion(&req.done); < comment #3
		                                < comment #4
		return;
	}
out:
	task_rq_unlock(rq, &flags);
}



* comment #1: why unlock this insecure task so soon, while it's still
incomplete and preemptible? IMHO, one of the reasons is that this task needs
to be unlocked so it can be manipulated by this code; otherwise it could
deadlock. But it's not a good silver bullet.

* comment #2: why wake its migration thread so slowly? What happens if the
picked task is SUSPENDED, BLOCKED, SIGSTOPped, KILLED or ZOMBIED?
"Not running as expected" can occur. It's not a good decision.

* comment #3: what happens if the wait is eternal, as comment #2 said,
or if it deadlocks preemptibly? Arggg!

I thought it would be better to implement NR_CPUS task_migrators as kernel
threads/daemons with locking mechanisms. For comprehension: unlocking
some locks of the task_migrators and immediately switching the CPU to the
migrators is similar to a quick awakening of migration_thread.

The task being migrated "must not need to know the #cpu it is running on",
and IMHO it gets a little complicated when the task has locks, signals, etc.
that depend on the #cpu (locks and other things need to be altered when they
are migrated, above all their identifiers).

* comment #4: when does the migration thread sleep?

   ;)