process hangs on do_exit when oom happens

2012-10-17 Thread gaoqiang

I found nothing useful on Google, so I'm asking here for help.

Here is what happens: I use memcg to limit the memory use of a process, and when the memcg cgroup runs out of memory, the process is OOM-killed. However, it never actually completes exiting. Here is some information.
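For reference, the limiting side is the usual cgroup v1 memcg interface. A minimal sketch of the setup (the group name "test", the 256 MB limit, and the /cgroup/memory mount point are illustrative assumptions, not the exact values from this report; /cgroup/memory is the CentOS 6 default mount):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/stat.h>

    static void write_file(const char *path, const char *val)
    {
            FILE *f = fopen(path, "w");
            if (!f) { perror(path); exit(1); }
            fputs(val, f);
            fclose(f);
    }

    int main(void)
    {
            char pid[32];

            /* create an illustrative memcg group and cap it at 256 MB */
            mkdir("/cgroup/memory/test", 0755);
            write_file("/cgroup/memory/test/memory.limit_in_bytes", "268435456");

            /* move the current process into the group */
            snprintf(pid, sizeof(pid), "%d", getpid());
            write_file("/cgroup/memory/test/tasks", pid);

            /* allocate past the limit; the memcg OOM killer should fire */
            for (;;) {
                    char *p = malloc(1 << 20);
                    if (p)
                            memset(p, 1, 1 << 20);
            }
    }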



OS version:  CentOS 6.2, kernel 2.6.32-220.7.1

/proc/pid/stack
---

[] __cond_resched+0x2a/0x40
[] unmap_vmas+0xb49/0xb70
[] exit_mmap+0x7e/0x140
[] mmput+0x58/0x110
[] exit_mm+0x11d/0x160
[] do_exit+0x1ad/0x860
[] do_group_exit+0x41/0xb0
[] get_signal_to_deliver+0x1e8/0x430
[] do_notify_resume+0xf4/0x8b0
[] int_signal+0x12/0x17
[] 0x

/proc/pid/stat
---

11337 (CF_user_based) R 1 11314 11314 0 -1 4203524 7753602 0 0 0 622 1806  
0 0 -2 0 1 0 324381340 0 0 18446744073709551615 0 0 0 0 0 0 0 0 66784 0 0  
0 17 3 1 1 0 0 0


/proc/pid/status
---

Name:   CF_user_based
State:  R (running)
Tgid:   11337
Pid:11337
PPid:   1
TracerPid:  0
Uid:32114   32114   32114   32114
Gid:32114   32114   32114   32114
Utrace: 0
FDSize: 128
Groups: 32114
Threads:1
SigQ:   2/2325005
SigPnd: 
ShdPnd: 4100
SigBlk: 
SigIgn: 
SigCgt: 0001800104e0
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 
Cpus_allowed:   
Cpus_allowed_list:  0-31
Mems_allowed:   ,0003
Mems_allowed_list:  0-1
voluntary_ctxt_switches:4300
nonvoluntary_ctxt_switches: 77

/var/log/messages
---

Oct 17 15:22:19 hpc16 kernel: CF_user_based invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Oct 17 15:22:19 hpc16 kernel: CF_user_based cpuset=/ mems_allowed=0-1
Oct 17 15:22:19 hpc16 kernel: Pid: 3909, comm: CF_user_based Not tainted 2.6.32-2.0.0.1 #4
Oct 17 15:22:19 hpc16 kernel: Call Trace:
Oct 17 15:22:19 hpc16 kernel: [] ? dump_header+0x85/0x1a0
Oct 17 15:22:19 hpc16 kernel: [] ? oom_kill_process+0x25e/0x2a0
Oct 17 15:22:19 hpc16 kernel: [] ? select_bad_process+0xce/0x110
Oct 17 15:22:19 hpc16 kernel: [] ? out_of_memory+0x1a8/0x390
Oct 17 15:22:19 hpc16 kernel: [] ? __alloc_pages_nodemask+0x73a/0x750
Oct 17 15:22:19 hpc16 kernel: [] ? __mem_cgroup_commit_charge+0x45/0x90
Oct 17 15:22:19 hpc16 kernel: [] ? alloc_pages_vma+0x9a/0x190
Oct 17 15:22:19 hpc16 kernel: [] ? handle_pte_fault+0x4cc/0xa90
Oct 17 15:22:19 hpc16 kernel: [] ? alloc_pages_current+0xab/0x110
Oct 17 15:22:19 hpc16 kernel: [] ? invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [] ? handle_mm_fault+0x12a/0x1b0
Oct 17 15:22:19 hpc16 kernel: [] ? do_page_fault+0x199/0x550
Oct 17 15:22:19 hpc16 kernel: [] ? call_rwsem_wake+0x18/0x30
Oct 17 15:22:19 hpc16 kernel: [] ? invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [] ? page_fault+0x25/0x30
Oct 17 15:22:19 hpc16 kernel: Mem-Info:
Oct 17 15:22:19 hpc16 kernel: Node 0 Normal per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU0: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU1: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU2: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU3: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU4: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU5: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU6: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU7: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU8: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU9: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   16: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   17: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   18: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   19: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   20: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   21: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   22: hi:  186, btch:  31 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU   23: hi:  186, btch:  31 usd:  18
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU0: hi:0, btch:   1 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU1: hi:0, btch:   1 usd:   0
Oct 17 15:22:19 hpc16 kernel: CPU2: hi

Question about page_size of x86_64

2012-08-21 Thread gaoqiang

I found that my machine reports the feature "page size extension = true" in the output of the cpuid command, but I don't know how to use it with Linux. Can anyone give some help?
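If it helps: on x86_64, Linux does not expose the PSE bit to applications directly. The kernel uses large pages internally, and userspace gets them through hugetlbfs or mmap with MAP_HUGETLB (available since 2.6.32). A minimal sketch, assuming 2 MB huge pages were reserved beforehand via /proc/sys/vm/nr_hugepages:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HUGE_SZ (2 * 1024 * 1024)   /* one 2 MB huge page */

    int main(void)
    {
            /* map a single anonymous huge page */
            void *p = mmap(NULL, HUGE_SZ, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");     /* fails if no huge pages reserved */
                    return 1;
            }
            memset(p, 0, HUGE_SZ);      /* touch the page */
            munmap(p, HUGE_SZ);
            return 0;
    }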





compile a kernel for kgdb with less "optimized out"

2012-08-09 Thread gaoqiang
I think those who use kgdb must hate the phrase "optimized out". I have tried many times to build a kernel with less optimization but failed. Today I found a crude trick: remove the -O2 and -Os flags at the top level of the kernel source and add -O2 back only for the arch/x86 directory. It works! I see far fewer "optimized out" values now.

I built the CentOS 6 kernel, based on 2.6.32. I don't know whether this works with other kernel versions, but it is worth a try.



Re: question about IO-sched

2012-07-19 Thread gaoqiang

Thanks very much.

On Wed, 18 Jul 2012 14:51:09 +0800, Corrado Zoccolo wrote:



On Sun, Jul 15, 2012 at 9:08 AM, gaoqiang wrote:


Many thanks. But why does the sys_read operation hang on sync_page? There is still plenty of free memory; I mean the actually free memory, excluding the various kinds of caches and buffers.

http://kerneltrap.org/node/4941 explains sync_page:


->sync_page() is an awful misnomer. Usually, when a page IO operation is requested by calling ->writepage() or ->readpage(), the file system queues an IO request (e.g., a disk-based file system may do this by calling submit_bio()), but the underlying device driver does not proceed with this IO immediately, because IO scheduling is more efficient when there are multiple requests in the queue.

Only when something really wants to wait for IO completion (wait_on_page_{locked,writeback}() are used to wait for read and write completion respectively) is the IO queue processed. To do this, wait_on_page_bit() calls ->sync_page() (see block_sync_page(), the standard implementation of ->sync_page() for disk-based file systems).

So the semantics of ->sync_page() are roughly "kick the underlying storage driver to actually perform all IO queued for this page, and maybe for other pages on this device too".


It is expected that sys_read will wait until the data is available to the process. If you don't want to wait (because you can do other stuff in the meantime, including queuing other I/O operations), you can use aio_read. The kernel will notify your process when the operation completes and the data is available in memory.
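For illustration, a minimal POSIX AIO read along those lines (a sketch: the file name is hypothetical and error handling is kept short; on older glibc, link with -lrt):

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            static char buf[4096];
            struct aiocb cb;

            int fd = open("data.bin", O_RDONLY);   /* illustrative file */
            if (fd < 0) { perror("open"); return 1; }

            memset(&cb, 0, sizeof(cb));
            cb.aio_fildes = fd;
            cb.aio_buf    = buf;
            cb.aio_nbytes = sizeof(buf);
            cb.aio_offset = 0;

            if (aio_read(&cb) < 0) { perror("aio_read"); return 1; }

            /* do other work here; poll for completion when convenient */
            while (aio_error(&cb) == EINPROGRESS)
                    usleep(1000);

            ssize_t n = aio_return(&cb);
            printf("read %zd bytes\n", n);
            close(fd);
            return 0;
    }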

Thanks,
Corrado






Re: question about IO-sched

2012-07-15 Thread gaoqiang
Many thanks. But why does the sys_read operation hang on sync_page? There is still plenty of free memory; I mean the actually free memory, excluding the various kinds of caches and buffers.


On Fri, 13 Jul 2012 22:15:31 +0800, Corrado Zoccolo wrote:



Hi,
the catch is that writes are "fire and forget", so they keep accumulating in the I/O sched, and there is always plenty of them to schedule (unless you explicitly make sync writes).

The reader, instead, waits for the result of each read operation before scheduling a new read, so there is at most one outstanding read, and sometimes nothing at all.

The deadline scheduler is work conserving, meaning that it never leaves the disk idle when there is work queued, and most of the time after an operation completes there is only write work queued, so you see many more writes being sent to the device.

Only schedulers that delay writes waiting for reads (as Anticipatory in old kernels, and now CFQ) can achieve higher read-to-write ratios.

Cheers
Corrado
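For anyone who wants to try CFQ as suggested, the elevator can be switched per disk through sysfs; a minimal sketch (the device name sda is a placeholder, and this needs root):

    #include <stdio.h>

    int main(void)
    {
            /* select the CFQ elevator for one disk */
            FILE *f = fopen("/sys/block/sda/queue/scheduler", "w");
            if (!f) { perror("fopen"); return 1; }
            fputs("cfq", f);
            fclose(f);
            return 0;
    }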



question about IO-sched

2012-07-12 Thread gaoqiang

Hi, all

	I have long known that deadline is read-preferred, but a simple test gives the opposite result.

	I ran two processes at the same time, one for read and one for write; they did nothing but I/O operations:

while (1) {
        read(fd, buf, sizeof(buf));     /* fd, buf as in the sketch below */
}

the other:

while (1) {
        write(fd, buf, sizeof(buf));
}
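A self-contained version of the test for anyone reproducing it (a sketch: the file names, the 64 KB buffer, and the wrap-at-EOF detail are illustrative choices, not necessarily the original test's):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BUF_SZ (64 * 1024)
    static char buf[BUF_SZ];

    int main(void)
    {
            if (fork() == 0) {
                    /* child: sequential reader; "readfile" must exist already
                       and should be much bigger than RAM so reads hit the disk */
                    int fd = open("readfile", O_RDONLY);
                    if (fd < 0) { perror("open readfile"); exit(1); }
                    for (;;)
                            if (read(fd, buf, BUF_SZ) <= 0)
                                    lseek(fd, 0, SEEK_SET);  /* wrap at EOF */
            }

            /* parent: sequential writer */
            int fd = open("writefile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open writefile"); exit(1); }
            memset(buf, 'x', BUF_SZ);
            for (;;)
                    write(fd, buf, BUF_SZ);
    }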

	With the deadline I/O scheduler and an ext4 filesystem, the read rate was below about 3 MB/s and the write rate about 100 MB/s. I have tested both kernel 2.6.18 and kernel 2.6.32, getting the same result.


	I added some debug information to the kernel and recompiled, and found that it has little to do with the IO-sched layer, because the read requests entering deadline were only about 5% of the write requests. From /proc/<pid>/stack, the read process hangs on sync_page most of the time.

What is the matter? Can anyone help me?