from:"Denys Fedoryshchenko"

Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-03 Thread Denys Fedoryshchenko


On 2017-04-03 15:09, Eric Dumazet wrote:

On Mon, 2017-04-03 at 11:10 +0300, Denys Fedoryshchenko wrote:


I modified the patch a little:
if (th->doff * 4 < sizeof(_tcph)) {
  par->hotdrop = true;
  WARN_ON_ONCE(!tcpinfo->option);
  return false;
}

And it did trigger the WARN once this morning, and didn't hit KASAN. I will run it for a while longer to see if it is OK, and then, if it is stable, I will try to enable SFQ again.


Excellent news !
We will post an official fix today, thanks a lot for this detective work, Denys.

I am not sure it is finally fixed; maybe we need to test more?
I'm doing extensive tests today with an identical configuration (I had to run pfifo, because the customer cannot afford any more outages). I've now added SFQ in a different way, and I will run the identical config in approximately 3 hours.

Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-03 Thread Denys Fedoryshchenko


On 2017-04-02 20:26, Eric Dumazet wrote:

On Sun, 2017-04-02 at 10:14 -0700, Eric Dumazet wrote:


Could it be that netfilter does not abort earlier if the TCP header is
completely wrong?



Yes, I wonder if this patch would be better, unless we replicate the
th->doff sanity check in all netfilter modules dissecting TCP frames.

diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index ade024c90f4f129a7c384e9e1cbfdb8ffe73065f..8cb4eadd5ba1c20e74bc27ee52a0bc36a5b26725 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -103,11 +103,11 @@ static bool tcp_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	if (!NF_INVF(tcpinfo, XT_TCP_INV_FLAGS,
 		     (((unsigned char *)th)[13] & tcpinfo->flg_mask) == tcpinfo->flg_cmp))
 		return false;
+	if (th->doff * 4 < sizeof(_tcph)) {
+		par->hotdrop = true;
+		return false;
+	}
 	if (tcpinfo->option) {
-		if (th->doff * 4 < sizeof(_tcph)) {
-			par->hotdrop = true;
-			return false;
-		}
 		if (!tcp_find_option(tcpinfo->option, skb, par->thoff,
 				     th->doff*4 - sizeof(_tcph),
 				     tcpinfo->invflags & XT_TCP_INV_OPTION,

I modified the patch a little:
if (th->doff * 4 < sizeof(_tcph)) {
 par->hotdrop = true;
 WARN_ON_ONCE(!tcpinfo->option);
 return false;
}

And it did trigger the WARN once this morning, and didn't hit KASAN. I will run it for a while longer to see if it is OK, and then, if it is stable, I will try to enable SFQ again.
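Why the sanity check has to run before any option parsing: `doff` counts 32-bit words, so a value below 5 describes a header shorter than the fixed 20-byte `struct tcphdr`, and an expression like `th->doff*4 - sizeof(_tcph)` is then computed in unsigned arithmetic and wraps around to a huge option length. A minimal userspace sketch of the check; `tcp_opts_len()` and `unchecked_opts_len()` are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of the fixed TCP header (struct tcphdr) in bytes. */
#define TCP_BASE_HDR_LEN 20u

/*
 * Hypothetical helper: validate the 4-bit data offset and return the number
 * of option bytes after the fixed header. Returning false corresponds to
 * "drop the packet" (par->hotdrop = true in tcp_mt()).
 */
static bool tcp_opts_len(unsigned int doff, size_t *optlen)
{
	size_t hdrlen = (size_t)doff * 4; /* doff counts 32-bit words */

	assert(optlen != NULL);
	if (hdrlen < TCP_BASE_HDR_LEN)
		return false;                     /* malformed: shorter than 20 bytes */
	*optlen = hdrlen - TCP_BASE_HDR_LEN;      /* cannot underflow any more */
	return true;
}

/* The same subtraction without the check wraps around for doff < 5. */
static size_t unchecked_opts_len(unsigned int doff)
{
	return (size_t)doff * 4 - TCP_BASE_HDR_LEN;
}
```

With `doff = 4`, the unchecked version yields `SIZE_MAX - 3` instead of a small option length, which is the kind of value a later option scan would then trust.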

Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko


On 2017-04-02 15:32, Eric Dumazet wrote:

On Sun, 2017-04-02 at 15:25 +0300, Denys Fedoryshchenko wrote:

I will also add WARN_ON_ONCE(tcp_hdrlen >= 15 * 4) before it, out of
curiosity, to see whether this condition is triggered. Is that fine?


Sure.


It didn't trigger the WARN_ON, and with both patches here is one more KASAN report.
What I also noticed is that after this KASAN report, many others start to trigger in TCPMSS and lock up the server with the flood.

There is heavy netlink activity; it is a PPPoE server with a lot of shapers.
I noticed that SFQ was left in place by mistake. Usually I remove it, because it may trigger kernel panics too (and the reason is hard to trace).

I will try with pfifo instead, in 6 hours.

Here is full log with others: https://nuclearcat.com/kasan.txt


[ 2033.914478] ==
[ 2033.914855] BUG: KASAN: slab-out-of-bounds in tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS] at addr 8802bfe18140
[ 2033.915218] Read of size 1 by task swapper/1/0
[ 2033.915437] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0136-debug #7
[ 2033.915787] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[ 2033.916010] Call Trace:
[ 2033.916229]  
[ 2033.916449]  dump_stack+0x99/0xd4
[ 2033.916662]  ? _atomic_dec_and_lock+0x15d/0x15d
[ 2033.916886]  ? tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS]
[ 2033.917110]  kasan_object_err+0x21/0x81
[ 2033.917335]  kasan_report+0x527/0x69d
[ 2033.917557]  ? tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS]
[ 2033.917772]  __asan_report_load1_noabort+0x19/0x1b
[ 2033.917995]  tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS]
[ 2033.918222]  ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS]
[ 2033.918451]  ? udp_mt+0x45a/0x45a [xt_tcpudp]
[ 2033.918669]  ? __fib_validate_source+0x46b/0xcd1
[ 2033.918895]  ipt_do_table+0x1432/0x1573 [ip_tables]
[ 2033.919114]  ? ip_tables_net_init+0x15/0x15 [ip_tables]
[ 2033.919338]  ? ip_route_input_slow+0xe9f/0x17e3
[ 2033.919562]  ? rt_set_nexthop+0x9a7/0x9a7
[ 2033.919790]  ? ip_tables_net_exit+0xe/0x15 [ip_tables]
[ 2033.920008]  ? tcf_action_exec+0x14a/0x18c
[ 2033.920227]  ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle]
[ 2033.920451]  ? iptable_filter_net_exit+0x92/0x92 [iptable_filter]
[ 2033.920667]  iptable_filter_hook+0xc0/0x1c8 [iptable_filter]
[ 2033.920882]  nf_hook_slow+0x7d/0x121
[ 2033.921105]  ip_forward+0x1183/0x11c6
[ 2033.921321]  ? ip_forward_finish+0x168/0x168
[ 2033.921542]  ? ip_frag_mem+0x43/0x43
[ 2033.921755]  ? iptable_nat_net_exit+0x92/0x92 [iptable_nat]
[ 2033.921981]  ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4]
[ 2033.922199]  ip_rcv_finish+0xf4c/0xf5b
[ 2033.922420]  ip_rcv+0xb41/0xb72
[ 2033.922635]  ? ip_local_deliver+0x282/0x282
[ 2033.922847]  ? ip_local_deliver_finish+0x6e6/0x6e6
[ 2033.923073]  ? ip_local_deliver+0x282/0x282
[ 2033.923291]  __netif_receive_skb_core+0x1b27/0x21bf
[ 2033.923510]  ? netdev_rx_handler_register+0x1a6/0x1a6
[ 2033.923736]  ? kasan_slab_free+0x137/0x154
[ 2033.923954]  ? save_stack_trace+0x1b/0x1d
[ 2033.924170]  ? kasan_slab_free+0xaa/0x154
[ 2033.924387]  ? net_rx_action+0x6ad/0x6dc
[ 2033.924611]  ? __do_softirq+0x22b/0x5df
[ 2033.924826]  ? irq_exit+0x8a/0xfe
[ 2033.925048]  ? do_IRQ+0x13d/0x155
[ 2033.925269]  ? common_interrupt+0x83/0x83
[ 2033.925483]  ? mwait_idle+0x15a/0x30d
[ 2033.925704]  ? napi_gro_flush+0x1d0/0x1d0
[ 2033.925928]  ? start_secondary+0x2cc/0x2d5
[ 2033.926142]  ? start_cpu+0x14/0x14
[ 2033.926354]  __netif_receive_skb+0x5e/0x191
[ 2033.926576]  process_backlog+0x295/0x573
[ 2033.926799]  ? __netif_receive_skb+0x191/0x191
[ 2033.927022]  napi_poll+0x311/0x745
[ 2033.927245]  ? napi_complete_done+0x3b4/0x3b4
[ 2033.927460]  ? igb_msix_ring+0x2d/0x35
[ 2033.927679]  net_rx_action+0x2e8/0x6dc
[ 2033.927903]  ? napi_poll+0x745/0x745
[ 2033.928133]  ? sched_clock_cpu+0x1f/0x18c
[ 2033.928360]  ? rps_trigger_softirq+0x181/0x1e4
[ 2033.928592]  ? __tick_nohz_idle_enter+0x465/0xa6d
[ 2033.928817]  ? rps_may_expire_flow+0x29b/0x29b
[ 2033.929038]  ? irq_work_run+0x2c/0x2e
[ 2033.929253]  __do_softirq+0x22b/0x5df
[ 2033.929464]  ? smp_call_function_single_async+0x17d/0x17d
[ 2033.929680]  irq_exit+0x8a/0xfe
[ 2033.929905]  smp_call_function_single_interrupt+0x8d/0x90
[ 2033.930136]  call_function_single_interrupt+0x83/0x90
[ 2033.930365] RIP: 0010:mwait_idle+0x15a/0x30d
[ 2033.930581] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: ff04
[ 2033.930934] RAX:  RBX: 8802d1000c80 RCX:
[ 2033.931160] RDX: 11005a200190 RSI:  RDI:
[ 2033.931383] RBP: 8802d1017e98 R08: ed00583c4fc1 R09: 0080
[ 2033.931596] R10: 8802d1017d80 R11: ed00583c4fc1 R12: 0001
[ 2033.931808] R13:  R14: 8802d1000c80 R15: dc00
[ 2033.932031]  
[ 2033.932247]  arch_cpu_idle+0xf/0x11
[ 2033.932472]  default_idle_call+0x59/0x5c
[ 2033.932686]  do_idle+0x11c/0x217
[ 2033.932906]  cpu_startup_entry+0x1

Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko


On 2017-04-02 15:19, Eric Dumazet wrote:

On Sun, 2017-04-02 at 04:54 -0700, Eric Dumazet wrote:

On Sun, 2017-04-02 at 13:45 +0200, Florian Westphal wrote:
> Eric Dumazet  wrote:
> > - for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> > + for (i = sizeof(struct tcphdr); i < tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> >   if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
> >   u_int16_t oldmss;
>
> maybe I am low on caffeine but this looks fine, for tcp header with
> only tcpmss this boils down to "20 <= 24 - 4" so we access offsets 20-23 which seems ok.

I am definitely low on caffeine ;)

An issue in this function is that we might add the missing MSS option,
without checking that TCP options are already full.

But this should not cause a KASAN splat, only some malformed TCP packet
(tcph->doff would wrap).


Something like that maybe.

diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 27241a767f17b4b27d24095a31e5e9a2d3e29ce4..1465aaf0e3a15d69d105d0a50b0429b11b6439d3 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -151,7 +151,9 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 	 */
 	if (len > tcp_hdrlen)
 		return 0;
-
+	/* tcph->doff is 4 bits wide, do not wrap its value to 0 */
+	if (tcp_hdrlen >= 15 * 4)
+		return 0;
 	/*
 	 * MSS Option not found ?! add it..
 	 */

I will also add WARN_ON_ONCE(tcp_hdrlen >= 15 * 4) before it, out of
curiosity, to see whether this condition is triggered. Is that fine?
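The wrap can be checked with plain arithmetic: `doff` is a 4-bit field, so the largest encodable header is 15 * 4 = 60 bytes. Appending the 4-byte MSS option (TCPOLEN_MSS) to a header that is already 60 bytes long gives 64 bytes, i.e. 16 words, which truncates to 0 in four bits. A minimal sketch that emulates the bitfield with a mask; `doff_after_store()` and `can_add_mss_option()` are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>

#define TCPOLEN_MSS 4u       /* length of the MSS option in bytes */
#define TCP_MAX_HDR_LEN 60u  /* 15 words: the largest value doff can encode */

/* Store a header length into a 4-bit field, as tcph->doff would. */
static unsigned int doff_after_store(unsigned int hdrlen)
{
	assert(hdrlen % 4 == 0);   /* TCP header length is word-aligned */
	return (hdrlen / 4) & 0xfu; /* only the low 4 bits survive */
}

/* Mirrors the guard in the patch: do not grow a header that is already full. */
static bool can_add_mss_option(unsigned int tcp_hdrlen)
{
	return tcp_hdrlen < TCP_MAX_HDR_LEN;
}
```

A 56-byte header can still take the option (56 + 4 = 60 fits), while a 60-byte one cannot.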

Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko


On 2017-04-02 14:45, Florian Westphal wrote:

Eric Dumazet  wrote:
-	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
+	for (i = sizeof(struct tcphdr); i < tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
 		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
 			u_int16_t oldmss;


maybe I am low on caffeine but this looks fine, for tcp header with
only tcpmss this boils down to "20 <= 24 - 4" so we access offsets
20-23 which seems ok.

It seems some non-standard (or corrupted) packets are passing, because even on a ~1G server this might cause corruption once every several days; KASAN seems to need less time to trigger it.
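Florian's bound can be reproduced by listing which offsets the MSS-search loop reads. A simplified sketch: the hypothetical `max_offset_read()` walks the same `for` condition but with a fixed step instead of the real `optlen()` (which advances by each option's own length), and uses plain `int` arithmetic for simplicity. At each position it assumes a full 4-byte MSS option is read (`opt[i]` to `opt[i + 3]`):

```c
#include <assert.h>

#define TCP_BASE_HDR_LEN 20
#define TCPOLEN_MSS 4

/*
 * Hypothetical helper: highest byte offset touched by
 *   for (i = 20; i <= tcp_hdrlen - TCPOLEN_MSS; i += step)
 * when a full MSS option (opt[i] .. opt[i + 3]) is read at each step.
 * Returns -1 if the loop body never runs.
 */
static int max_offset_read(int tcp_hdrlen, int step)
{
	int i, max = -1;

	assert(step > 0);
	for (i = TCP_BASE_HDR_LEN; i <= tcp_hdrlen - TCPOLEN_MSS; i += step)
		max = i + TCPOLEN_MSS - 1;
	return max;
}
```

For a 24-byte header the last byte read is offset 23, i.e. still inside the header, which matches the "20 <= 24 - 4" reasoning above.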


I am not sure how things work, but:
[25181.875696] Memory state around the buggy address:
[25181.875919]  8802975fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876275]  88029760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876628] >880297600080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876984] ^
[25181.877203]  880297600100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.877569]  880297600180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Why is all the data here zero? I would expect some packet data.
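For reference, the rows under "Memory state around the buggy address" are KASAN shadow bytes, not packet contents. In generic KASAN each shadow byte describes 8 bytes of real memory (00 means all 8 are addressable, 01 to 07 means only the first N are, and negative values mark poisoned memory), so a row of 16 shadow bytes covers 0x80 bytes, which is why the printed addresses step by 0x80. A minimal sketch of the mapping, assuming the x86_64 default KASAN_SHADOW_OFFSET of 0xdffffc0000000000; the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3                /* 1 shadow byte per 8 bytes */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL /* x86_64 default (assumption) */

/* Address of the shadow byte that describes addr (wraps mod 2^64 by design). */
static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * Decode one shadow byte: how many of the 8 covered bytes are accessible.
 * 0 -> all 8, 1..7 -> the first N, negative (0x80..0xff) -> none (poisoned).
 */
static int kasan_accessible_bytes(uint8_t shadow)
{
	if (shadow == 0)
		return 8;
	if ((int8_t)shadow < 0)
		return 0;
	assert(shadow < 8); /* values 8..0x7f are not valid shadow states */
	return shadow;
}
```

So a row of 00 says that, at the time of the dump, KASAN considered those 128 bytes fully addressable; it says nothing about what data they hold.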

KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko


Repost; being sleepy, I missed a few important points.

I am searching for the reasons of crashes on multiple conntrack-enabled servers. Usually they point to conntrack, but I suspect the use-after-free might be somewhere else, so I tried to enable KASAN.
And it seems I got something after a few hours, and it looks related to all the crashes, because all the servers that rebooted had MSS adjustment (--clamp-mss-to-pmtu or --set-mss).

Please let me know if any additional information is needed.

[25181.855611] ==
[25181.855985] BUG: KASAN: use-after-free in tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] at addr 8802976000ea
[25181.856344] Read of size 1 by task swapper/1/0
[25181.856555] page:ea000a5d8000 count:0 mapcount:0 mapping:  (null) index:0x0
[25181.856909] flags: 0x1000()
[25181.857123] raw: 1000
[25181.857630] raw: ea000b0444a0 ea000a0b1f60
[25181.857996] page dumped because: kasan: bad access detected
[25181.858214] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0133-debug #3
[25181.858571] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[25181.858786] Call Trace:
[25181.859000]  
[25181.859215]  dump_stack+0x99/0xd4
[25181.859423]  ? _atomic_dec_and_lock+0x15d/0x15d
[25181.859644]  ? __dump_page+0x447/0x4e3
[25181.859859]  ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.860080]  kasan_report+0x577/0x69d
[25181.860291]  ? __ip_route_output_key_hash+0x14ce/0x1503
[25181.860512]  ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.860736]  __asan_report_load1_noabort+0x19/0x1b
[25181.860956]  tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.861180]  ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS]
[25181.861407]  ? udp_mt+0x45a/0x45a [xt_tcpudp]
[25181.861634]  ? __fib_validate_source+0x46b/0xcd1
[25181.861860]  ipt_do_table+0x1432/0x1573 [ip_tables]
[25181.862088]  ? igb_msix_ring+0x2d/0x35
[25181.862318]  ? ip_tables_net_init+0x15/0x15 [ip_tables]
[25181.862537]  ? ip_route_input_slow+0xe9f/0x17e3
[25181.862759]  ? handle_irq_event_percpu+0x141/0x141
[25181.862985]  ? rt_set_nexthop+0x9a7/0x9a7
[25181.863203]  ? ip_tables_net_exit+0xe/0x15 [ip_tables]
[25181.863419]  ? tcf_action_exec+0xce/0x18c
[25181.863628]  ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle]
[25181.863856]  ? iptable_filter_net_exit+0x92/0x92 [iptable_filter]
[25181.864084]  iptable_filter_hook+0xc0/0x1c8 [iptable_filter]
[25181.864311]  nf_hook_slow+0x7d/0x121
[25181.864536]  ip_forward+0x1183/0x11c6
[25181.864752]  ? ip_forward_finish+0x168/0x168
[25181.864967]  ? ip_frag_mem+0x43/0x43
[25181.865194]  ? iptable_nat_net_exit+0x92/0x92 [iptable_nat]
[25181.865423]  ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4]
[25181.865648]  ip_rcv_finish+0xf4c/0xf5b
[25181.865861]  ip_rcv+0xb41/0xb72
[25181.866086]  ? ip_local_deliver+0x282/0x282
[25181.866308]  ? ip_local_deliver_finish+0x6e6/0x6e6
[25181.866524]  ? ip_local_deliver+0x282/0x282
[25181.866752]  __netif_receive_skb_core+0x1b27/0x21bf
[25181.866971]  ? netdev_rx_handler_register+0x1a6/0x1a6
[25181.867186]  ? enqueue_hrtimer+0x232/0x240
[25181.867401]  ? hrtimer_start_range_ns+0xd1c/0xd4b
[25181.867630]  ? __ppp_xmit_process+0x101f/0x104e [ppp_generic]
[25181.867852]  ? hrtimer_cancel+0x20/0x20
[25181.868081]  ? ppp_push+0x1402/0x1402 [ppp_generic]
[25181.868301]  ? __pskb_pull_tail+0xb0f/0xb25
[25181.868523]  ? ppp_xmit_process+0x47/0xaf [ppp_generic]
[25181.868749]  __netif_receive_skb+0x5e/0x191
[25181.868968]  process_backlog+0x295/0x573
[25181.869180]  ? __netif_receive_skb+0x191/0x191
[25181.869401]  napi_poll+0x311/0x745
[25181.869611]  ? napi_complete_done+0x3b4/0x3b4
[25181.869836]  ? __qdisc_run+0x4ec/0xb7f
[25181.870061]  ? sch_direct_xmit+0x60b/0x60b
[25181.870286]  net_rx_action+0x2e8/0x6dc
[25181.870512]  ? napi_poll+0x745/0x745
[25181.870732]  ? rps_trigger_softirq+0x181/0x1e4
[25181.870956]  ? rps_may_expire_flow+0x29b/0x29b
[25181.871184]  ? irq_work_run+0x2c/0x2e
[25181.871411]  __do_softirq+0x22b/0x5df
[25181.871629]  ? smp_call_function_single_async+0x17d/0x17d
[25181.871854]  irq_exit+0x8a/0xfe
[25181.872069]  smp_call_function_single_interrupt+0x8d/0x90
[25181.872297]  call_function_single_interrupt+0x83/0x90
[25181.872519] RIP: 0010:mwait_idle+0x15a/0x30d
[25181.872733] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: ff04
[25181.873091] RAX:  RBX: 8802d1000c80 RCX:
[25181.873311] RDX: 11005a200190 RSI:  RDI:
[25181.873532] RBP: 8802d1017e98 R08: 003f R09: 7f75f7fff700
[25181.873751] R10: 8802d1017d80 R11: 8802c9b0 R12: 0001
[25181.873971] R13:  R14: 8802d1000c80 R15: dc00
[25181.874182]  
[25181.874393]  arch_cpu_idle+0xf/0x11
[25181.874602]  default_idle_call+0x59/0x5c
[25181.874818]  do_idle+0x11c/0x217
[2

Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2017-03-31 Thread Denys Fedoryshchenko

I am not sure if it is the same issue, but panics still happen, though much less often. Same server, NAT.
I will upgrade to the latest 4.10.x build, because for this one I don't have the files anymore (for symbols etc.).


 [864288.511464] Modules linked in: nf_conntrack_netlink nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
 [864288.512740] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 4.10.1-build-0132 #2
 [864288.513005] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
 [864288.513454] task: 881038cb6000 task.stack: c9000c678000
 [864288.513719] RIP: 0010:nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat]
 [864288.513980] RSP: 0018:88103fc43ba0 EFLAGS: 00010206
 [864288.514237] RAX: 140504021ad8 RBX: 881004021ad8 RCX: 0100
 [864288.514677] RDX: 140504021ad8 RSI: 88103279628c RDI: 88103279628c
 [864288.515117] RBP: 88103fc43be0 R08: c9003b47b558 R09: 0004
 [864288.515558] R10: 8820083d00ce R11: 881038480b00 R12: 881004021a40
 [864288.515998] R13:  R14: a00d406e R15: c90036e11000
 [864288.516438] FS:  () GS:88103fc4() knlGS:
 [864288.516882] CS:  0010 DS:  ES:  CR0: 80050033
 [864288.517142] CR2: 7fbfc303f978 CR3: 00202267c000 CR4: 001406e0
 [864288.517580] Call Trace:
 [864288.517831]  
 [864288.518090]  __nf_ct_ext_destroy+0x3f/0x57 [nf_conntrack]
 [864288.518352]  nf_conntrack_free+0x25/0x55 [nf_conntrack]
 [864288.518615]  destroy_conntrack+0x80/0x8c [nf_conntrack]
 [864288.518880]  nf_conntrack_destroy+0x19/0x1b
 [864288.519137]  nf_ct_gc_expired+0x6e/0x71 [nf_conntrack]
 [864288.519400]  __nf_conntrack_find_get+0x89/0x2ab [nf_conntrack]
 [864288.519663]  nf_conntrack_in+0x1ec/0x877 [nf_conntrack]
 [864288.519925]  ipv4_conntrack_in+0x1c/0x1e [nf_conntrack_ipv4]
 [864288.520185]  nf_hook_slow+0x2a/0x9a
 [864288.520439]  ip_rcv+0x318/0x337
 [864288.520692]  ? ip_local_deliver_finish+0x1ba/0x1ba
 [864288.520953]  __netif_receive_skb_core+0x607/0x852
 [864288.521213]  ? kmem_cache_free_bulk+0x232/0x274
 [864288.521471]  __netif_receive_skb+0x18/0x5a
 [864288.521727]  process_backlog+0x90/0x113
 [864288.521981]  net_rx_action+0x114/0x2dc
 [864288.522238]  ? sched_clock_cpu+0x15/0x94
 [864288.522496]  __do_softirq+0xe7/0x259
 [864288.522753]  irq_exit+0x52/0x93
 [864288.523006]  smp_call_function_single_interrupt+0x33/0x35
 [864288.523267]  call_function_single_interrupt+0x83/0x90
 [864288.523531] RIP: 0010:mwait_idle+0x9e/0x125
 [864288.523786] RSP: 0018:c9000c67beb0 EFLAGS: 0246 ORIG_RAX: ff04
 [864288.524229] RAX:  RBX: 881038cb6000 RCX:
 [864288.524669] RDX:  RSI:  RDI:
 [864288.525110] RBP: c9000c67bec0 R08: 0001 R09:
 [864288.525551] R10: c9000c67be50 R11:  R12: 0011
 [864288.525991] R13:  R14: 881038cb6000 R15: 881038cb6000
 [864288.526429]  
 [864288.526682]  arch_cpu_idle+0xf/0x11
 [864288.526937]  default_idle_call+0x25/0x27
 [864288.527193]  do_idle+0xb6/0x15d
 [864288.527446]  cpu_startup_entry+0x1f/0x21
 [864288.527702]  start_secondary+0xe8/0xeb
 [864288.527961]  start_cpu+0x14/0x14
 [864288.528212] Code: 48 89 f7 48 89 75 c8 e8 6e e8 8f e1 8b 45 c4 48 8b 75 c8 48 83 c0 08 4d 8d 04 c7 49 8b 04 c7 a8 01 75 46 48 39 c3 74 1e 48 89 c2 <48> 8b 7a 08 48 85 ff 0f 84 b3 00 00 00 48 39 fb 0f 84 9e 00 00
 [864288.528905] RIP: nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat] RSP: 88103fc43ba0
 [864288.529362] ---[ end trace e3c40a5e4bf43e26 ]---
 [864288.567835] Kernel panic - not syncing: Fatal exception in interrupt

 [864288.568122] Kernel Offset: disabled
 [864288.587619] Rebooting in 5 seconds..

Re: kexec on panic

2017-02-18 Thread Denys Fedoryshchenko


On 2017-02-18 09:42, Jon Masters wrote:

Hi Denys,

On 02/10/2017 03:14 AM, Denys Fedoryshchenko wrote:

After years of using kexec, and a recent unpleasant experience with
modern (supposedly blazing fast to boot) hardware that needs 5-10
minutes just to pass POST tests, a question occurred to me:
Is it possible somehow to execute a regular kexec (not the special "panic" one used to
capture crash data) on panic, to reduce reboot time?


Generally, you don't want to do this, because various platform hardware
might be in non-quiescent states (still doing DMA to random memory, etc.)
and other nastiness that means you don't want to do more than the minimal
amount in a kexec on panic (crash). We've seen no end of fun and games
even with just regular crash dumps while hardware is busily writing to
memory that it shouldn't be. An IOMMU helps, but isn't a cure-all.

Jon.
Well, I have to try. Even with a regular kexec I sometimes face issues with hardware not booting, but a small customer has an HP server that needs almost 6 minutes to boot, there is no hot spare (hard to arrange for many reasons: no spare 10G ports, cost of hardware, etc.), and some nasty bugs are not resolved yet, which forces me to look for a way to reduce reboot time.
If I find a way to save the backtrace and reboot fast, it will help a lot to debug kernels with minimal downtime when a bug is reproducible only on a live system.


What I did now might be insanely wrong, but:
diff -Naur linux-4.9.9-vanilla/kernel/kexec_core.c linux-4.9.9/kernel/kexec_core.c
--- linux-4.9.9-vanilla/kernel/kexec_core.c	2017-02-09 07:08:40.0 +
+++ linux-4.9.9/kernel/kexec_core.c	2017-02-17 12:54:49.0 +
@@ -897,6 +897,10 @@
 			machine_crash_shutdown(&fixed_regs);
 			machine_kexec(kexec_crash_image);
 		}
+		if (kexec_image) {
+			machine_shutdown();
+			machine_kexec(kexec_image);
+		}
 		mutex_unlock(&kexec_mutex);
 	}
 }

Then:

kexec -l /mnt/flash/kernel --append="intel_idle.max_cstate=0 processor.max_cstate=1"

and

echo c >/proc/sysrq-trigger

worked even on a busy network router, but I'm not sure it will behave the same on a real networking stack crash.
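The intent of the patch above, reduced to its decision order: a loaded crash (kdump) image still wins, and only when none is loaded does the panic path fall back to the regular image loaded with `kexec -l`. Because `machine_kexec()` does not return on success, the first branch that matches is the one taken. The enum and function below are hypothetical stand-ins for the kernel's `kexec_crash_image` / `kexec_image` pointers; the model ignores `kexec_mutex` and all machine-specific shutdown work:

```c
#include <assert.h>
#include <stdbool.h>

/* Which path the patched __crash_kexec() would take (illustrative model). */
enum panic_action {
	PANIC_NO_KEXEC,      /* nothing loaded: normal panic/reboot handling */
	PANIC_CRASH_KEXEC,   /* machine_crash_shutdown() + kdump kernel */
	PANIC_REGULAR_KEXEC, /* machine_shutdown() + kernel from "kexec -l" */
};

static enum panic_action choose_panic_action(bool crash_image_loaded,
					     bool regular_image_loaded)
{
	/* machine_kexec() does not return, so the first match wins. */
	if (crash_image_loaded)
		return PANIC_CRASH_KEXEC;
	if (regular_image_loaded)
		return PANIC_REGULAR_KEXEC;
	return PANIC_NO_KEXEC;
}
```

Keeping the crash image first means a system that still has kdump configured behaves exactly as before; the regular-kexec path only changes behavior when no crash kernel was loaded.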

Mistake in include IS_ENABLED(CONFIG_LIVEPATCH)

2017-02-10 Thread Denys Fedoryshchenko


Hello,

I noticed that the livepatch sample does not work in 4.9.9, because the include
linux/livepatch.h
has:
#if IS_ENABLED(CONFIG_LIVEPATCH)

while the config option is:
CONFIG_HAVE_LIVEPATCH=y

After editing livepatch.h, the sample module compiles fine.

Probably that's just a typo?
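It may help to note how `IS_ENABLED()` behaves: it only looks at the exact symbol it is given and is true when `CONFIG_FOO` (built-in) or `CONFIG_FOO_MODULE` (module) is defined to 1, so `CONFIG_HAVE_LIVEPATCH=y` does not satisfy `IS_ENABLED(CONFIG_LIVEPATCH)`. In Kconfig, `HAVE_LIVEPATCH` advertises architecture support, while `LIVEPATCH` is the separate option that turns the feature on. The sketch below copies the preprocessor trick from the kernel's include/linux/kconfig.h (simplified) into a standalone file; the defines at the top simulate a .config where only `CONFIG_HAVE_LIVEPATCH` is set, and the two functions are illustrative:

```c
#include <assert.h>

/* Simulated autoconf.h (assumption): only the arch capability is set. */
#define CONFIG_HAVE_LIVEPATCH 1
/* CONFIG_LIVEPATCH is deliberately left undefined. */

/* Placeholder trick modeled on include/linux/kconfig.h (simplified). */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x) ___is_defined(x)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)

#define IS_BUILTIN(option) __is_defined(option)
#define IS_MODULE(option) __is_defined(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Evaluates to 0: CONFIG_LIVEPATCH and CONFIG_LIVEPATCH_MODULE are undefined. */
static int livepatch_enabled(void) { return IS_ENABLED(CONFIG_LIVEPATCH); }

/* Evaluates to 1: CONFIG_HAVE_LIVEPATCH is defined to 1. */
static int have_livepatch(void) { return IS_ENABLED(CONFIG_HAVE_LIVEPATCH); }
```

When the symbol is defined to 1, the token paste produces `__ARG_PLACEHOLDER_1`, which expands to `0,` and shifts the literal `1` into the second argument position; any other result leaves `0` there.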

kexec on panic

2017-02-10 Thread Denys Fedoryshchenko


Hello,

After years of using kexec, and a recent unpleasant experience with modern
(supposedly blazing fast to boot) hardware that needs 5-10 minutes
just to pass POST tests, a question occurred to me:
Is it possible somehow to execute a regular kexec (not the special "panic" one used to
capture crash data) on panic, to reduce reboot time?


Thanks!

Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2017-01-11 Thread Denys Fedoryshchenko


On 2017-01-11 19:22, Guillaume Nault wrote:

Cc: netfilter-de...@vger.kernel.org, I'm afraid I'll need some help
for this case.

On Sat, Dec 17, 2016 at 09:48:13PM +0200, Denys Fedoryshchenko wrote:

Hi,

I recently posted several netfilter-related crashes and didn't get any answers.
One of them started to happen quite often on a loaded NAT (17Gbps),
so after trying endless ways to make it stable, I noticed that I can
often see timers in the backtrace, and this bug probably appears on older
releases too; I've seen such backtraces with a timer fired for conntrack on them.
I disabled Intel Turbo for the CPUs on this loaded NAT, and voila, the panic
has been gone for the 2nd day!
* by wrmsr -a 0x1a0 0x4000850089
I am not sure timers are the reason, but turbo is probably creating some
condition for the bug.



Re-formatting the stack-trace for easier reference:

[28904.162607] BUG: unable to handle kernel NULL pointer dereference at 0008
[28904.163210] IP: [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.163745] PGD 0
[28904.164058] Oops: 0002 [#1] SMP
[28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
[28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 4.8.14-build-0124 #2
[28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
[28904.168853] task: 885fa42e8c40 task.stack: 885fa42f
[28904.169114] RIP: 0010:[] [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246
[28904.169901] RAX:  RBX: 885fbccc RCX: 885fbccc0010
[28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 885fbccc
[28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 0100
[28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 885f87a1c140
[28904.170971] R13: 0005d948 R14: 000ea942 R15: 885f87a1c160
[28904.171237] FS: () GS:885fbccc() knlGS:
[28904.171688] CS: 0010 DS:  ES:  CR0: 80050033
[28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 001406e0
[28904.172231] Stack:
[28904.172482] 885f87a1c140 820a1405 885fbccc3e28 a00abb30
[28904.173182] 0002820a1405 885f87a1c140 885f99a28201
[28904.173884]  820050c8 885fbccc3e58 a00abc62
[28904.174585] Call Trace:
[28904.174835] 
[28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2 [nf_conntrack]
[28904.175613] [] nf_ct_delete+0x109/0x12c [nf_conntrack]
[28904.175894] [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack]
[28904.176169] [] death_by_timeout+0xd/0xf [nf_conntrack]
[28904.176443] [] call_timer_fn.isra.5+0x17/0x6b
[28904.176714] [] expire_timers+0x6f/0x7e
[28904.176975] [] run_timer_softirq+0x69/0x8b
[28904.177238] [] ? clockevents_program_event+0xd0/0xe8
[28904.177504] [] __do_softirq+0xbd/0x1aa
[28904.177765] [] irq_exit+0x37/0x7c
[28904.178026] [] smp_trace_apic_timer_interrupt+0x7b/0x88
[28904.178300] [] smp_apic_timer_interrupt+0x9/0xb
[28904.178565] [] apic_timer_interrupt+0x7c/0x90
[28904.178835] 
[28904.178907] [] ? mwait_idle+0x64/0x7a
[28904.179436] [] ? atomic_notifier_call_chain+0x13/0x15
[28904.179712] [] arch_cpu_idle+0xa/0xc
[28904.179976] [] default_idle_call+0x27/0x29
[28904.180244] [] cpu_startup_entry+0x11d/0x1c7
[28904.180508] [] start_secondary+0xe8/0xeb
[28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10 49 8d 54 24 10 48 8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89 53 10 75 04 <89> 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89 f6
[28904.185546] RIP [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.186065] RSP 
[28904.186319] CR2: 0008
[28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]---
[28904.186860] Kernel panic - not syncing: Fatal exception in interrupt
[28904.187155] Kernel Offset: disabled
[28904.187419] Rebooting in 5 seconds..
[28909.193662] ACPI MEMORY or I/O RESET_REG.

And here's decodecode's output:

All code

   0:   80 2f 0b                subb   $0xb,(%rdi)
   3:   82                      (bad)
   4:   48 89 df                mov    %rbx,%rdi
   7:   e8 da 90 84 e1          callq  0xe18490e6
   c:   48 8b 43 10             mov    0x10(%rbx),%rax
  10:   49 8d 54 24 10          lea    0x10(%r12),%rdx
  15:   48 8d 4b 10             lea    0x10(%rbx),%rcx
  19:   49 89 4c 24 18          mov    %rcx,0x18(%r12)
  1e:   a8 01                   test   $0x1,%al
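One possible reading of these bytes, offered as context rather than a confirmed diagnosis: nf_ct_add_to_dying_list() links the conntrack into a per-CPU `hlist_nulls` list, and in include/linux/list_nulls.h hlist_nulls_add_head() treats a `first` pointer with bit 0 set as an end-of-list "nulls" marker, dereferencing `first` only otherwise. The `test $0x1,%al` above looks like that check. If the list head held a plain NULL instead of a nulls marker, the `first->pprev` store would hit address 0 + offsetof(pprev) = 0x8 on x86_64, which would match `CR2: 0008` in the oops. The sketch reproduces the list logic in userspace with simplified copies of the types; `add_head_store_addr()` is a hypothetical variant that reports the would-be faulting address instead of writing through NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copies of the kernel's hlist_nulls types. */
struct hlist_nulls_node {
	struct hlist_nulls_node *next, **pprev;
};
struct hlist_nulls_head {
	struct hlist_nulls_node *first;
};

/* A "nulls" end marker has bit 0 set; a real node pointer never does. */
static int is_a_nulls(const struct hlist_nulls_node *ptr)
{
	return ((uintptr_t)ptr & 1) != 0;
}

/* Like INIT_HLIST_NULLS_HEAD(): the empty list holds an odd marker value. */
static void init_nulls_head(struct hlist_nulls_head *h, unsigned long nulls)
{
	h->first = (struct hlist_nulls_node *)(1UL | (nulls << 1));
}

/*
 * Same steps as hlist_nulls_add_head(). Returns the address written by the
 * final "first->pprev = &n->next" store, or 0 when no store happens.
 * For first == NULL the store is skipped here; the kernel would fault.
 */
static uintptr_t add_head_store_addr(struct hlist_nulls_node *n,
				     struct hlist_nulls_head *h)
{
	struct hlist_nulls_node *first = h->first;

	n->next = first;
	n->pprev = &h->first;
	h->first = n;
	if (is_a_nulls(first))
		return 0;                 /* end marker: no store needed */
	if (first != NULL)
		first->pprev = &n->next;  /* normal case, as in the kernel */
	return (uintptr_t)first + offsetof(struct hlist_nulls_node, pprev);
}
```

On x86_64 `offsetof(struct hlist_nulls_node, pprev)` is 8, so a NULL `first` yields exactly the 0x8 address reported in CR2.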

probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2016-12-17 Thread Denys Fedoryshchenko


Hi,

I recently posted several netfilter-related crashes and didn't get any answers. One of them started to happen quite often on a loaded NAT (17Gbps),
so after trying endless ways to make it stable, I noticed that I can often see timers in the backtrace, and this bug probably appears on older releases too;

I've seen such backtraces with a timer fired for conntrack on them.
I disabled Intel Turbo for the CPUs on this loaded NAT, and voila, the panic has been gone for the 2nd day!

* by wrmsr -a 0x1a0 0x4000850089
I am not sure timers are the reason, but turbo is probably creating some condition for the bug.
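For readers reproducing this: MSR 0x1A0 is IA32_MISC_ENABLE, and per Intel's SDM bit 38 is the turbo mode (IDA) disable bit; 0x4000850089 has exactly that bit (0x4000000000) set, and the remaining bits are presumably whatever this server's MSR already held. Because `wrmsr -a` writes the whole 64-bit value to every CPU, it is probably safer to read the current value first (for example with `rdmsr 0x1a0` from msr-tools) and only flip bit 38. A sketch of the bit manipulation; the macro and function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_MISC_ENABLE 0x1a0
/* Bit 38 of IA32_MISC_ENABLE: turbo mode (IDA) disable. */
#define MISC_ENABLE_TURBO_DISABLE (1ULL << 38)

/* New MSR value with turbo allowed (turbo_on != 0) or disabled. */
static uint64_t misc_enable_with_turbo(uint64_t cur, int turbo_on)
{
	return turbo_on ? (cur & ~MISC_ENABLE_TURBO_DISABLE)
			: (cur | MISC_ENABLE_TURBO_DISABLE);
}

static int turbo_disabled(uint64_t misc_enable)
{
	return (misc_enable & MISC_ENABLE_TURBO_DISABLE) != 0;
}
```

Clearing bit 38 from the value used above gives back 0x850089, i.e. the same register with turbo allowed.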




Here are examples of backtraces from the last reboots (kernel 4.8.14); the same kernel worked perfectly without turbo.
The last one is a crash on 4.8.0 that looks painfully similar, on a totally different workload but with conntrack enabled. It happens there much less often,
so it is harder to crash and test by disabling turbo.

[28904.162607] BUG: unable to handle kernel NULL pointer dereference at 0008
[28904.163210] IP: [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.163745] PGD 0
[28904.164058] Oops: 0002 [#1] SMP
[28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
[28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 4.8.14-build-0124 #2
[28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
[28904.168853] task: 885fa42e8c40 task.stack: 885fa42f
[28904.169114] RIP: 0010:[] [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246
[28904.169901] RAX:  RBX: 885fbccc RCX: 885fbccc0010
[28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 885fbccc
[28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 0100
[28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 885f87a1c140
[28904.170971] R13: 0005d948 R14: 000ea942 R15: 885f87a1c160
[28904.171237] FS: () GS:885fbccc() knlGS:
[28904.171688] CS: 0010 DS:  ES:  CR0: 80050033
[28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 001406e0
[28904.172231] Stack:
[28904.172482] 885f87a1c140 820a1405 885fbccc3e28 a00abb30
[28904.173182] 0002820a1405 885f87a1c140 885f99a28201
[28904.173884]  820050c8 885fbccc3e58 a00abc62
[28904.174585] Call Trace:
[28904.174835] 
[28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2 [nf_conntrack]
[28904.175613] [] nf_ct_delete+0x109/0x12c [nf_conntrack]
[28904.175894] [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack]
[28904.176169] [] death_by_timeout+0xd/0xf [nf_conntrack]
[28904.176443] [] call_timer_fn.isra.5+0x17/0x6b
[28904.176714] [] expire_timers+0x6f/0x7e
[28904.176975] [] run_timer_softirq+0x69/0x8b
[28904.177238] [] ? clockevents_program_event+0xd0/0xe8
[28904.177504] [] __do_softirq+0xbd/0x1aa
[28904.177765] [] irq_exit+0x37/0x7c
[28904.178026] [] smp_trace_apic_timer_interrupt+0x7b/0x88
[28904.178300] [] smp_apic_timer_interrupt+0x9/0xb
[28904.178565] [] apic_timer_interrupt+0x7c/0x90
[28904.178835] 
[28904.178907] [] ? mwait_idle+0x64/0x7a
[28904.179436] [] ? atomic_notifier_call_chain+0x13/0x15
[28904.179712] [] arch_cpu_idle+0xa/0xc
[28904.179976] [] default_idle_call+0x27/0x29
[28904.180244] [] cpu_startup_entry+0x11d/0x1c7
[28904.180508] [] start_secondary+0xe8/0xeb
[28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10 49 8d 54 24 10 48 8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89 53 10 75 04 <89> 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89 f6
[28904.185546] RIP [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.186065] RSP 
[28904.186319] CR2: 0008
[28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]---
[28904.186860] Kernel panic - not syncing: Fatal exception in interrupt
[28904.187155] Kernel Offset: disabled
[28904.187419] Rebooting in 5 seconds..
[28909.193662] ACPI MEMORY or I/O RESET_REG.



[14125.227611] BUG: unable to handle kernel NULL pointer dereference at (null)
[14125.228215] IP: [] nf_nat_setup_info+0x6d8/0x755 [nf_nat]
[14125.228564] PGD 0
[14125.228882] Oops:  [#1] SMP
[14125.229146] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw ipt

regression, 4.8.10 -> 4.9.0 totally fail on NUMA machine, ACPI issue?

2016-12-12 Thread Denys Fedoryshchenko


Hi,

I just attempted to upgrade from 4.8.10 to 4.9.0 with minimal kernel config changes (oldconfig; I then tried adding a few options to solve the problem, such as NR_CPUS and PCI options, which didn't help).
My filesystem resides on a USB drive, and the USB where the flash drive is located is not working, so I can only get early kernel messages over netconsole. Also, this server is semi-production and remote, so I can't do many tests on it (only at night).

Hardware:
2x E5-2640 v3
Motherboard: S2600WTT
RAM: 384GB (not sure)
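Two sanity checks on the diff of kernel messages, based on the hardware listed: a Xeon E5-2640 v3 has 8 cores / 16 threads, so two sockets give 32 logical CPUs, which matches the 4.9.0 boot ("Allowing 32 CPUs, 0 hotplug CPUs"), while 4.8.10 sized everything for 64 possible CPUs. The printk log-buffer lines follow from that count. The sketch mirrors the calculation in log_buf_add_cpu() and log_buf_len_update() in kernel/printk/printk.c (simplified), with the constants taken from the log itself (4096 bytes per extra CPU, 32768-byte minimum):

```c
#include <assert.h>

#define LOG_CPU_MAX_BUF_LEN 4096u /* "individual max cpu contribution" */
#define LOG_BUF_LEN 32768u        /* "min size" (1 << CONFIG_LOG_BUF_SHIFT) */

static unsigned int roundup_pow_of_two(unsigned int x)
{
	unsigned int r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

/* Log buffer size chosen at boot for a given number of possible CPUs. */
static unsigned int log_buf_len_for(unsigned int possible_cpus)
{
	unsigned int cpu_extra;

	if (possible_cpus <= 1)
		return LOG_BUF_LEN;
	cpu_extra = (possible_cpus - 1) * LOG_CPU_MAX_BUF_LEN;
	if (cpu_extra <= LOG_BUF_LEN / 2)
		return LOG_BUF_LEN; /* small machines keep the default */
	return roundup_pow_of_two(cpu_extra + LOG_BUF_LEN);
}
```

With 64 possible CPUs: 63 * 4096 = 258048 bytes of extra space, and 258048 + 32768 = 290816 rounds up to 524288. With 32 CPUs: 31 * 4096 = 126976, and 126976 + 32768 = 159744 rounds up to 262144. Both match the two sides of the diff, so those lines only reflect the different possible-CPU count.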

Here is a diff of the kernel messages:

--- x0c 2016-12-13 04:37:04.245892429 +0200
+++ x1c 2016-12-13 04:45:23.976088816 +0200
@@ -1,5 +1,5 @@
-version 4.8.10-build-0121 (root@dev) (gcc version 5.4.0 (Gentoo 5.4.0 p1.0, pie-0.6.5) ) #10 SMP Thu Nov 24 01:05:28 UTC 2016
-line: BOOT_IMAGE=kernel panic=1 intel_idle.max_cstate=0 processor.max_cstate=1
+version 4.9.0-build-0123 (root@dev) (gcc version 5.4.0 (Gentoo 5.4.0 p1.0, pie-0.6.5) ) #3 SMP Tue Dec 13 02:07:57 UTC 2016
+line: BOOT_IMAGE=kernelup panic=1 intel_idle.max_cstate=0 processor.max_cstate=1
 Supporting XSAVE feature 0x001: 'x87 floating point registers'
 Supporting XSAVE feature 0x002: 'SSE registers'
 Supporting XSAVE feature 0x004: 'AVX registers'
@@ -249,36 +249,36 @@
 INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
 ACPI (MADT) for SMP configuration information
 HPET id: 0x8086a701 base: 0xfed0
-144 Processors exceeds NR_CPUS limit of 64
-Allowing 64 CPUs, 32 hotplug CPUs
+Allowing 32 CPUs, 0 hotplug CPUs
 [mem 0x9000-0xfed1bfff] available for PCI devices
 refined-jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1910969940391419 ns
-NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:64 nr_node_ids:2
-Embedded 33 pages/cpu @882fbf60 s95320 r8192 d31656 u262144
+NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:32 nr_node_ids:2
+Embedded 33 pages/cpu @882fbfa0 s95512 r8192 d31464 u262144
 2 zonelists in Node order, mobility grouping on. Total pages: 99066449
 zone: Normal
-command line: BOOT_IMAGE=kernel panic=1 intel_idle.max_cstate=0 processor.max_cstate=1
+command line: BOOT_IMAGE=kernelup panic=1 intel_idle.max_cstate=0 processor.max_cstate=1
 individual max cpu contribution: 4096 bytes
-total cpu_extra contributions: 258048 bytes
+total cpu_extra contributions: 126976 bytes
 min size: 32768 bytes
-524288 bytes
-log buf free: 11976(36%)
+262144 bytes
+log buf free: 12212(37%)
 hash table entries: 4096 (order: 3, 32768 bytes)
-396163212K/402555824K available (9082K kernel code, 730K rwdata, 4528K rodata, 8024K init, 416K bss, 6392612K reserved, 0K cma-reserved)
-HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=2
+396167976K/402555824K available (9238K kernel code, 746K rwdata, 4564K rodata, 8016K init, 416K bss, 6387848K reserved, 0K cma-reserved)
+HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=2
 RCU implementation.
 adjustment of leaf fanout to 64.
-nr_irqs:1752 16
+restricting CPUs from NR_CPUS=64 to nr_cpu_ids=32.
+Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=32
+nr_irqs:1496 16
 colour VGA+ 80x25
 [tty0] enabled
 hpet: mask: 0x max_cycles: 0x, max_idle_ns: 133484882848 ns
 Fast TSC calibration using PIT
-Detected 2593.948 MHz processor
-delay loop (skipped), value calculated using timer frequency.. 5187.89 BogoMIPS (lpj=2593948)
-default: 65536 minimum: 512
-Core revision 20160422
+Detected 2593.955 MHz processor
+delay loop (skipped), value calculated using timer frequency.. 5187.91 BogoMIPS (lpj=2593955)
+default: 32768 minimum: 301
+Core revision 20160831
 4 ACPI AML tables successfully acquired and loaded
-
 Framework initialized
 cache hash table entries: 67108864 (order: 17, 536870912 bytes)
 hash table entries: 33554432 (order: 16, 268435456 bytes)
@@ -293,246 +293,31 @@
 using mwait in idle threads
 level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
 level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
-SMP alternatives memory: 32K (8288e000 - 82896000)
+SMP alternatives memory: 32K (8289 - 82898000)
 APIC(0) Converting physical 0 to logical package 0
 APIC(10) Converting physical 1 to logical package 1
-fast init done
-Max logical packages: 18
-APIC routing to physical flat.
-vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
-CPU0: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (family: 0x6, model: 0x3f, stepping: 0x2)
-Events: PEBS fmt2+, Haswell events, 16-deep LBR, full-width counters, Intel PMU driver.
-version: 3
-bit width: 48
-generic registers: 4
-value mask: 
-max period: 
-fixed-purpose events: 3
-event mask: 0007000f
-watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
-Booting SMP configuration:
-node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
-node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
-node #0, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
-node #1, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
-Booted up 2 nodes, 32 CPUs
-Total of 32 processors activated (1662
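
One side effect visible in the diff: the printk log buffer halves (524288 to 262144 bytes) because the possible-CPU count drops from 64 (4.8.10: "Allowing 64 CPUs, 32 hotplug CPUs") to 32 (4.9.0: "Allowing 32 CPUs, 0 hotplug CPUs"). As far as I can tell from kernel/printk/printk.c (log_buf_add_cpu()), the kernel reserves one per-CPU contribution for every possible CPU beyond the first, and only grows the buffer if that extra exceeds half the minimum size, rounding the sum up to a power of two. The C sketch below is a simplified userspace re-derivation of that arithmetic, not the kernel code; the function names are hypothetical and the two constants are read off the log lines above. It explains those printk lines only, not the boot failure itself.

```c
#include <assert.h>

/* Constants read off the boot log ("log_buf_len min size" and
 * "individual max cpu contribution"). In the kernel they derive from
 * CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_MAX_BUF_SHIFT. */
#define LOG_BUF_MIN_BYTES     32768u
#define LOG_CPU_MAX_BUF_BYTES  4096u

/* Smallest power of two >= x (x > 0). */
static unsigned int round_up_pow2(unsigned int x)
{
	unsigned int p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Extra bytes reserved for every possible CPU except the first one
 * ("total cpu_extra contributions"). */
unsigned int log_cpu_extra(unsigned int possible_cpus)
{
	assert(possible_cpus >= 1);
	return (possible_cpus - 1) * LOG_CPU_MAX_BUF_BYTES;
}

/* Simplified sizing rule: only grow the buffer when the per-CPU extra
 * exceeds half the minimum size, then round the total up to a power of two. */
unsigned int log_buf_len_for(unsigned int possible_cpus)
{
	unsigned int extra = log_cpu_extra(possible_cpus);

	if (extra <= LOG_BUF_MIN_BYTES / 2)
		return LOG_BUF_MIN_BYTES;
	return round_up_pow2(LOG_BUF_MIN_BYTES + extra);
}
```

With 64 possible CPUs this gives 63 * 4096 = 258048 extra bytes and 32768 + 258048 = 290816, rounded up to 524288; with 32 possible CPUs it gives 126976 and 159744, rounded up to 262144, matching both sides of the diff.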

Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-13 Thread Denys Fedoryshchenko

I can confirm that after the patch this issue has not appeared again, so it
may be good to push it to stable etc. :) Thanks a lot, Eric, you saved me
again.



I still have some weird panics, maybe related to conntrack, but they are
rare even under high load, so I am slowly gathering data; I have also found
at least one more person with similar conntrack crashes on the latest
kernels.



On 2015-11-04 06:46, Eric Dumazet wrote:

On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:

On 2015-11-04 00:06, Cong Wang wrote:
> On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
>  wrote:
>> Hi!
>>
>> It seems I have been getting this panic for a while (about once per week) on a
>> loaded PPPoE server, but only now was I able to capture the full panic message.
>> After checking the commit logs for sch_fq.c I didn't see any fixes, so
>> upgrading to a newer kernel probably won't help?
>
>
> Can you share your `tc qdisc show dev ` with us? And how to reproduce
> it? I tried to set up htb+fq and then flip the interface back and forth
> but I don't see any crash.
My guess is that it won't be easy to reproduce: it is happening on a box with
4.5k interfaces that are constantly created and deleted, and even so this
problem may happen once per day, or may not happen for a week.

Here is the script that is fired after a new ppp interface is detected. But
the pppoe process is independent from the process that is "establishing"
shapers.



It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.

diff --git a/net/core/dev.c b/net/core/dev.c
index 8ce3f74cd6b9..bf136103bc7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2880,6 +2880,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
spin_lock(&q->busylock);

spin_lock(root_lock);
+   if (unlikely(q != rcu_dereference_bh(txq->qdisc))) {
+   pr_err_ratelimited("Arg, qdisc changed ! state %lx\n", q->state);
+   kfree_skb(skb);
+   rc = NET_XMIT_DROP;
+   goto end;
+   }
if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
kfree_skb(skb);
rc = NET_XMIT_DROP;
@@ -2913,6 +2919,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
__qdisc_run(q);
}
}
+end:
spin_unlock(root_lock);
if (unlikely(contended))
spin_unlock(&q->busylock);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Denys Fedoryshchenko


On 2015-11-04 06:58, Eric Dumazet wrote:

On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote:

On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> On 2015-11-04 00:06, Cong Wang wrote:
> > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
> >  wrote:
> >> Hi!
> >>
> >> It seems I have been getting this panic for a while (about once per week) on a
> >> loaded PPPoE server, but only now was I able to capture the full panic message.
> >> After checking the commit logs for sch_fq.c I didn't see any fixes, so
> >> upgrading to a newer kernel probably won't help?
> >
> >
> > Can you share your `tc qdisc show dev ` with us? And how to reproduce
> > it? I tried to set up htb+fq and then flip the interface back and forth
> > but I don't see any crash.
> My guess is that it won't be easy to reproduce: it is happening on a box with 4.5k
> interfaces that are constantly created and deleted, and even so this problem may
> happen once per day, or may not happen for a week.
>
> Here is the script that is fired after a new ppp interface is detected. But
> the pppoe process is independent from the process that is "establishing" shapers.


It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.


Following patch would be more appropriate.
Prior one was meant to 'show' the issue.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cb5d4ad32946..7f5f3e8a10f5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
spin_lock_bh(root_lock);

/* Prune old scheduler */
-   if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-   qdisc_reset(oqdisc);
-
+   if (oqdisc) {
+   if (atomic_read(&oqdisc->refcnt) <= 1)
+   qdisc_reset(oqdisc);
+   set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state);
+   }
/* ... and graft new one */
if (qdisc == NULL)
qdisc = &noop_qdisc;
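
My reading of this patch: before it, dev_graft_qdisc() never marked the old qdisc as deactivated, so a sender that had already looked up the old qdisc could still enqueue into it after the graft. __dev_xmit_skb() already drops packets when __QDISC_STATE_DEACTIVATED is set (visible in the context lines of the earlier dev.c diagnostic patch), so setting that bit unconditionally closes the window. The C sketch below is a hypothetical, single-threaded userspace model of that ordering only; the model_* names and struct are illustrative, not kernel API, and there is no locking or RCU.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for __QDISC_STATE_DEACTIVATED in the model. */
#define MODEL_QDISC_DEACTIVATED (1UL << 0)

struct model_qdisc {
	unsigned long state;
	int refcnt;
	size_t qlen;
};

enum model_xmit { MODEL_XMIT_OK, MODEL_XMIT_DROP };

/* Enqueue path: a sender still holding a pointer to a qdisc that has
 * since been replaced must drop the packet instead of queueing into it. */
enum model_xmit model_enqueue(struct model_qdisc *q)
{
	if (q->state & MODEL_QDISC_DEACTIVATED)
		return MODEL_XMIT_DROP;
	q->qlen++;
	return MODEL_XMIT_OK;
}

/* Graft, as in the patched code: reset the old qdisc only if we hold the
 * last reference, but mark it deactivated unconditionally, then install
 * the new one. Returns the old qdisc. */
struct model_qdisc *model_graft(struct model_qdisc **slot,
				struct model_qdisc *new_q)
{
	struct model_qdisc *old;

	assert(slot != NULL && new_q != NULL);
	old = *slot;
	if (old) {
		if (old->refcnt <= 1)
			old->qlen = 0; /* stands in for qdisc_reset() */
		old->state |= MODEL_QDISC_DEACTIVATED;
	}
	*slot = new_q;
	return old;
}
```

In the model, a "stale" pointer taken before model_graft() gets MODEL_XMIT_DROP afterwards, while senders that re-read the slot enqueue into the new qdisc.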


Applied; I will test it, but this bug is triggered only rarely.
I will try to deploy it on more pppoe servers in order to stress-test them
(and 4.3) more.


Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Denys Fedoryshchenko


On 2015-11-04 00:06, Cong Wang wrote:

On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
 wrote:

Hi!

It seems I have been getting this panic for a while (about once per week) on a
loaded PPPoE server, but only now was I able to capture the full panic message.
After checking the commit logs for sch_fq.c I didn't see any fixes, so
upgrading to a newer kernel probably won't help?

Can you share your `tc qdisc show dev ` with us? And how to reproduce
it? I tried to set up htb+fq and then flip the interface back and forth
but I don't see any crash.

My guess is that it won't be easy to reproduce: it is happening on a box with
4.5k interfaces that are constantly created and deleted, and even so this
problem may happen once per day, or may not happen for a week.

Here is the script that is fired after a new ppp interface is detected. But
the pppoe process is independent from the process that is "establishing"
shapers.

/sbin/tc qdisc del  root
/sbin/tc qdisc add  handle 1: root htb default 3

/sbin/tc filter add parent 1:0 protocol ip prio 4 handle 1 fw flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 3 u32 match ip protocol 6 0xff match ip src 10.0.252.8/32 flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 1 0xff flowid 1:0
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 80 0x flowid 1:4
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 443 0x flowid 1:5
/sbin/tc filter add parent 1:0 protocol ip prio 100 u32 match u32 0 0 flowid 1:2


/sbin/tc class add  classid 1:1 parent 1:0 htb rate 512Kbit ceil 512Kbit
/sbin/tc class add  classid 1:2 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add  classid 1:3 parent 1:0 htb rate 10Mbit ceil 10Mbit
/sbin/tc class add  classid 1:4 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add  classid 1:5 parent 1:1 htb rate 32Kbit ceil 512Kbit

/sbin/tc qdisc add parent 1:2 fq limit 300
/sbin/tc qdisc add parent 1:3 pfifo limit 300
/sbin/tc qdisc add parent 1:4 fq limit 300
/sbin/tc qdisc add parent 1:5 fq limit 300

Possible cases that come to my mind (but maybe I missed others):
- The script and tc are running while the interface is being deleted (e.g. the
  interface disappears)
- The script deletes the root qdisc while there is heavy traffic on the
  interface and a lot of packets are queued
- The ppp interface is destroyed while a lot of traffic is queued on it (this
  one is a rather rare situation)




Thanks.


Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-02 Thread Denys Fedoryshchenko

On 2015-11-02 18:12, Eric Dumazet wrote:

On Mon, 2015-11-02 at 17:58 +0200, Denys Fedoryshchenko wrote:

On 2015-11-02 17:24, Eric Dumazet wrote:
> On Mon, 2015-11-02 at 16:11 +0200, Denys Fedoryshchenko wrote:
>> Hi!
>>
>> It seems I have been getting this panic for a while (about once per week) on a
>> loaded PPPoE server, but only now was I able to capture the full panic message.
>> After checking the commit logs for sch_fq.c I didn't see any fixes, so
>> upgrading to a newer kernel probably won't help?
>
> I do not think we support sch_fq as a HTB leaf.
>
> If you want both HTB and sch_fq, you need to setup a bonding device.
>
> HTB on bond0
>
> sch_fq on the slaves
>
> Sure, the kernel should not crash, but HTB+sch_fq on same net device is
> certainly not something that will work anyway.
Strange, because apart from ppp, on static devices it works really well
in such a scheme. It is the only solution that can reliably throttle incoming
bandwidth when bandwidth is heavily overbooked, for my use cases such as
256k+ flows/2.5Gbps and several different classes of traffic; with DRR I
would simply not have enough classes.

On the latest kernels I had to patch tc to provide a parameter for the orphan
mask in fq, to increase the number of flows for transit traffic.
None of the other qdiscs is able to solve this problem: incoming bandwidth
simply flows 10-20% above the configured rate, but fq does magic.
The only device that worked with similar efficiency for such cases was the
proprietary PacketShaper, but it modifies the TCP window size, so it can't
be called transparent, and it also has stability issues above 1Gbps.

Ah, I was thinking you needed more like 10Gb traffic ;)

With HTB on bonding, we can use MQ+FQ on the slaves in order to use many
cpus to serve local traffic.

But yes, if you use HTB+FQ for forwarding, I guess the bonding setup is
not really needed.
Well, this country is very underdeveloped in matters of technology; 10G
interfaces appeared at some ISPs only this year.
On the ppp interfaces where the crash happens there is even less bandwidth:
each user gets at most 1-2Mbps (average usage 128kbps), on 4.5k interfaces.
But I have some heavier setups there too: around 9k pppoe users terminated
on a single server (meaning 9k interfaces), with about 2Gbps of traffic
passing through.
If I took a non-FOSS solution, I would have to pay $100k+ for software
licenses, which is unbearable for a local ISP. fq is not critical in this
specific use case; I can use fifo or similar for the ppp interfaces, but I
guess it is better to report a bug :)


Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-02 Thread Denys Fedoryshchenko


On 2015-11-02 17:24, Eric Dumazet wrote:

On Mon, 2015-11-02 at 16:11 +0200, Denys Fedoryshchenko wrote:

Hi!

It seems I have been getting this panic for a while (about once per week) on a
loaded PPPoE server, but only now was I able to capture the full panic message.
After checking the commit logs for sch_fq.c I didn't see any fixes, so
upgrading to a newer kernel probably won't help?


I do not think we support sch_fq as a HTB leaf.

If you want both HTB and sch_fq, you need to setup a bonding device.

HTB on bond0

sch_fq on the slaves

Sure, the kernel should not crash, but HTB+sch_fq on same net device is
certainly not something that will work anyway.
Strange, because apart from ppp, on static devices it works really well
in such a scheme. It is the only solution that can reliably throttle incoming
bandwidth when bandwidth is heavily overbooked, for my use cases such as
256k+ flows/2.5Gbps and several different classes of traffic; with DRR I
would simply not have enough classes.

On the latest kernels I had to patch tc to provide a parameter for the orphan
mask in fq, to increase the number of flows for transit traffic.
None of the other qdiscs is able to solve this problem: incoming bandwidth
simply flows 10-20% above the configured rate, but fq does magic.
The only device that worked with similar efficiency for such cases was the
proprietary PacketShaper, but it modifies the TCP window size, so it can't
be called transparent, and it also has stability issues above 1Gbps.


kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-02 Thread Denys Fedoryshchenko


Hi!

It seems I have been getting this panic for a while (about once per week) on a
loaded PPPoE server, but only now was I able to capture the full panic message.
After checking the commit logs for sch_fq.c I didn't see any fixes, so
upgrading to a newer kernel probably won't help?



[237470.633382] general protection fault:  [#1] SMP
[237470.633832] Modules linked in: netconsole configfs act_skbedit sch_fq cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc
[237470.637835] CPU: 1 PID: 14035 Comm: accel-pppd Not tainted 4.2.3-build-0087 #3
[237470.638342] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
[237470.638859] task: 8803ef8b5080 ti: 8803ed7e task.ti: 8803ed7e
[237470.639370] RIP: 0010:[] [] rb_erase+0x37/0x2c4
[237470.639960] RSP: 0018:8803ed7e3b88  EFLAGS: 00010286
[237470.644863] RAX:  RBX: 8804106ab000 RCX: 0001
[237470.645366] RDX: ffa2050402210218 RSI: 88040cfe2cf0 RDI: 8803f50d00e0
[237470.645872] RBP: 8803ed7e3b88 R08:  R09: 88042ee37d50
[237470.646376] R10: ea000fe7a9c0 R11: 94f1b850 R12: 019e
[237470.646881] R13: 88040cfe2cf0 R14: 8803f50d00d0 R15:
[237470.647381] FS:  7fcd5d384700() GS:88042ee2() knlGS:
[237470.647889] CS:  0010 DS:  ES:  CR0: 80050033
[237470.648209] CR2: 7fcd003efa90 CR3: 000424b6e000 CR4: 000406e0
[237470.648707] Stack:
[237470.648990]  8803ed7e3bb8 a00ef38b 8804106ab000 880416079000
[237470.649791]  0002 8804160790d8 8803ed7e3bd8 8183785c
[237470.650589]  0002 8800b021d000 8803ed7e3c18 a00d247a
[237470.651387] Call Trace:
[237470.651716]  [] fq_reset+0x7a/0xf2 [sch_fq]
[237470.652084]  [] qdisc_reset+0x18/0x42
[237470.652444]  [] htb_reset+0x96/0x14d [sch_htb]
[237470.652780]  [] qdisc_reset+0x18/0x42
[237470.653146]  [] dev_deactivate_queue.constprop.34+0x43/0x53
[237470.653726]  [] dev_deactivate_many+0x53/0x206
[237470.654088]  [] __dev_close_many+0x73/0xbf
[237470.654436]  [] __dev_close+0x2c/0x41
[237470.654784]  [] ? _raw_spin_unlock_bh+0x15/0x17
[237470.655106]  [] __dev_change_flags+0xa5/0x13c
[237470.655427]  [] dev_change_flags+0x23/0x59
[237470.655777]  [] ? mutex_lock+0x13/0x24
[237470.656073]  [] devinet_ioctl+0x246/0x533
[237470.656372]  [] inet_ioctl+0x8c/0xa6
[237470.656667]  [] sock_do_ioctl+0x22/0x40
[237470.656960]  [] sock_ioctl+0x1f2/0x200
[237470.657253]  [] do_vfs_ioctl+0x360/0x41a
[237470.657549]  [] ? vfs_write+0x105/0x164
[237470.657841]  [] SyS_ioctl+0x39/0x61
[237470.658134]  [] entry_SYSCALL_64_fastpath+0x16/0x6e
[237470.658431] Code: 48 85 c0 75 36 48 8b 0f 48 89 c8 48 83 e0 fc 74 12 48 39 78 10 75 06 48 89 50 10 eb 09 48 89 50 08 eb 03 48 89 16 48 85 d2 74 08
89 0a e9 83 02 00 00 80 e1 01 e9 c3 00 00 00 48 85 d2 75 2c
[237470.663930] RIP  [] rb_erase+0x37/0x2c4
[237470.664296]  RSP
[237470.664598] ---[ end trace 32ea40a7de450892 ]---
[237470.673272] Kernel panic - not syncing: Fatal exception in interrupt
[237470.673577] Kernel Offset: disabled
[237470.704654] Rebooting in 5 seconds..

Re: 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting

2013-11-12 Thread Denys Fedoryshchenko


Hi

On 2013-11-12 23:46, Jan Kara wrote:

Hello,

On Tue 12-11-13 16:34:07, Denys Fedoryshchenko wrote:

I just did some fault testing on a test nbd setup and found that if
I reboot the nbd server, I immediately get a BUG() message on the nbd
client and a filesystem that I cannot unmount; any operation on
it freezes and locks up the processes trying to access it.
  So how exactly did you do the fault testing? Because it seems something
has discarded the block device under filesystem's toes and the superblock
buffer_head got unmapped. Didn't something call NBD_CLEAR_SOCK ioctl?
Because that calls kill_bdev() which would do exactly that...


Client side:
modprobe nbd
nbd-client 2.2.2.29 /dev/nbd0 -name export1
nbd-client 2.2.2.29 /dev/nbd1 -name export2
nbd-client 2.2.2.29 /dev/nbd2 -name export3
mount /dev/nbd0 /mnt/disk1
mount /dev/nbd1 /mnt/disk2
mount /dev/nbd2 /mnt/disk3

On the server I have this config:
[generic]
[export1]
exportname = /dev/sda1
[export2]
exportname = /dev/sdb1
[export3]
exportname = /dev/sdc1

Steps to reproduce:
1) Start a large file copy on the client side to /mnt/disk1/
2) Reboot the server. It reboots quite fast, in just a few seconds; the
server system gets its IP before the nbd-server process starts listening,
so nbd-client probably sees "connection refused".
3) It seems that when the client gets "connection refused", it goes mad.

I can try to capture a traffic dump, or run any other debug operation;
please let me know what I should run :)
P.S. I noticed that maybe I should run in persist mode, but anyway I think
it should not crash like this.




Honza


3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting

2013-11-12 Thread Denys Fedoryshchenko


Hi

I just did some fault testing on a test nbd setup and found that if I
reboot the nbd server, I immediately get a BUG() message on the nbd client
and a filesystem that I cannot unmount; any operation on it freezes and
locks up the processes trying to access it.


Kernel 3.12, x86_64

Please let me know if you need more information

Here are the dmesg contents I got:
[  102.269270] block nbd1: Receive control failed (result -32)
[  102.269443] block nbd1: shutting down socket
[  102.269461] block nbd1: queue cleared
[  102.269859] block nbd2: Receive control failed (result -32)
[  102.269873] block nbd2: shutting down socket
[  102.269883] block nbd2: queue cleared
[  102.271353] block nbd0: Receive control failed (result -32)
[  102.271518] block nbd0: shutting down socket
[  102.271536] block nbd0: queue cleared
[  106.297217] block nbd0: Attempted send on closed socket
[  106.297219] end_request: I/O error, dev nbd0, sector 73992
[  106.297226] EXT4-fs warning (device nbd0): __ext4_read_dirblock:908: error reading directory block (ino 2, block 0)
[  106.297233] block nbd0: Attempted send on closed socket
[  106.297235] end_request: I/O error, dev nbd0, sector 8456
[  106.297245] [ cut here ]
[  106.297343] kernel BUG at fs/buffer.c:3015!
[  106.297438] invalid opcode:  [#1] SMP
[  106.297716] Modules linked in: nbd act_mirred cls_u32 sch_ingress sch_htb iptable_filter i2c_i801
[  106.298568] CPU: 0 PID: 2587 Comm: ls Not tainted 3.12.0noc-02 #1
[  106.298665] Hardware name:  /DH55TC, BIOS TCIBX10H.86A.0037.2010.0614.1712 06/14/2010
[  106.298772] task: 880231da9770 ti: 880231cd4000 task.ti: 880231cd4000
[  106.298879] RIP: 0010:[]  [] _submit_bh+0x26/0x1d3
[  106.299078] RSP: 0018:880231cd5b48  EFLAGS: 00010246
[  106.299182] RAX: 0005 RBX: 8800b7456b60 RCX: 0008
[  106.299285] RDX:  RSI: 8800b7456b60 RDI: 0411
[  106.299388] RBP: 880231cd5b68 R08: 0040 R09: 81a9a370
[  106.299487] R10: 810c0d61 R11:  R12: 0411
[  106.299590] R13: 880231b21400 R14:  R15: 0aea9ff5
[  106.299697] FS:  7f4f0d755700() GS:88023fc0() knlGS:
[  106.299800] CS:  0010 DS:  ES:  CR0: 8005003b
[  106.300114] CR2: 022275c8 CR3: 000235538000 CR4: 07f0
[  106.300438] Stack:
[  106.300750]  8800b7456b60 0411 880231b21400 0001
[  106.301652]  880231cd5b78 81125598 880231cd5ba8 8112761a
[  106.307886]  880231cd5bb8 81293a72 8800b7456b60 8802358d4800
[  106.308794] Call Trace:
[  106.309105]  [] submit_bh+0xb/0xd
[  106.309419]  [] __sync_dirty_buffer+0x53/0x86
[  106.309736]  [] ? __percpu_counter_sum+0x4d/0x63
[  106.310058]  [] sync_dirty_buffer+0xe/0x10
[  106.310368]  [] ext4_commit_super+0x19e/0x1e7
[  106.310687]  [] save_error_info+0x1e/0x22
[  106.311002]  [] __ext4_error_inode+0x52/0x10b
[  106.311326]  [] ? __cond_resched+0x25/0x30
[  106.311634]  [] __ext4_get_inode_loc+0x310/0x336
[  106.311954]  [] ? ext4_dirty_inode+0x3b/0x54
[  106.312277]  [] ext4_get_inode_loc+0x17/0x19
[  106.312596]  [] ext4_reserve_inode_write+0x21/0x7e
[  106.312916]  [] ? jbd2__journal_start+0xe0/0x199
[  106.313229]  [] ext4_mark_inode_dirty+0x67/0x1e4
[  106.313549]  [] ? ext4_dirty_inode+0x25/0x54
[  106.313861]  [] ext4_dirty_inode+0x3b/0x54
[  106.314177]  [] __mark_inode_dirty+0x60/0x224
[  106.314493]  [] update_time+0x99/0xa2
[  106.314807]  [] touch_atime+0xf1/0x126
[  106.315117]  [] iterate_dir+0x87/0xaa
[  106.315439]  [] SyS_getdents+0x85/0xd0
[  106.315757]  [] ? SyS_ioctl+0x80/0x80
[  106.316081]  [] system_call_fastpath+0x16/0x1b
[  106.316399] Code: 00 5f 5b c9 c3 55 48 89 e5 41 56 49 89 d6 41 55 41 54 41 89 fc 53 48 89 f3 48 8b 06 a8 04 75 04 0f 0b eb fe 48 8b 06 a8 20 75 04 <0f> 0b eb fe 48 83 7e 38 00 75 04 0f 0b eb fe 48 8b 06 f6 c4 02
[  106.323170] RIP  [] _submit_bh+0x26/0x1d3
[  106.323579]  RSP
[  106.323983] ---[ end trace 205f692f3e0cfed7 ]---
[  111.834648] [ cut here ]
[  111.834975] kernel BUG at fs/buffer.c:3015!
[  111.835283] invalid opcode:  [#2] SMP
[  111.835837] Modules linked in: nbd act_mirred cls_u32 sch_ingress sch_htb iptable_filter i2c_i801
[  111.837121] CPU: 3 PID: 2578 Comm: jbd2/nbd0-8 Tainted: G  D  3.12.0noc-02 #1
[  111.837656] Hardware name:  /DH55TC, BIOS TCIBX10H.86A.0037.2010.0614.1712 06/14/2010
[  111.838193] task: 88023574f530 ti: 8800b727a000 task.ti: 8800b727a000
[  111.838741] RIP: 0010:[]  [] _submit_bh+0x26/0x1d3
[  111.839372] RSP: 0018:8800b727bbb8  EFLAGS: 00010246
[  111.839688] RAX: 0405 RBX: 8800b7452750 RCX: 0411
[  111.840005] RDX:  RSI: 8800b7452750 RDI: 0411
[  111.840325

Re: netlink, RTM_NEWTCLASS, nested attributes

2013-02-20 Thread Denys Fedoryshchenko


On 2013-02-21 01:21, Stephen Hemminger wrote:

On Tue, 19 Feb 2013 23:45:25 +0200
Denys Fedoryshchenko  wrote:


Hi

I recently tried to write my own tool to manage QoS in Linux, based on the
amazing libmnl by Pablo Neira Ayuso (which makes understanding netlink
easy), and ran into a problem that I think is probably a bug in the
kernel's handling of netlink messages.

For example, if I send an RTM_NEWTCLASS message, after the attribute
TCA_OPTIONS I have nested attributes, for example in HTB: TCA_HTB_PARMS,
TCA_HTB_RTAB, TCA_HTB_CTAB. When I use a nested attribute, libmnl ORs a
bit into its type: NLA_F_NESTED (1 << 15).
If I remove this flag, everything works fine. And here is the thing: the
iproute2 tools just update the length of TCA_OPTIONS without setting the
flag, and that is why they work fine too.

So there are basically 3 solutions:
1) A new function in libmnl to create nested attributes without ORing in
the flag
2) AND-ing the attribute type in the kernel to ignore the nested flag
3) Keeping it as is; who cares?



Several legacy netlink interfaces don't use NESTED flag. These are by
now enshrined in ABI and can't change. In code, that uses libmnl, I just
manually clear the flag as needed and document why. This could
be added to libmnl.


Thank you for the clarification!

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

kernel BUG at mm/slub.c:3409, 3.8.0-rc7

2013-02-15 Thread Denys Fedoryshchenko

17 localhost kernel: [23260.079648] CR2: ffa8
Feb 16 00:40:17 localhost kernel: [23260.079650] ---[ end trace bae1313833245123 ]---
Feb 16 00:40:17 localhost kernel: [23260.079652] Fixing recursive fault but reboot is needed!



3.8.0-rc7, nouveau, possible recursive locking, nouveau_instobj_create_ and nv50_disp_data_ctor

2013-02-15 Thread Denys Fedoryshchenko

[] do_one_initcall+0x7a/0x130
[   16.678006]  [] load_module+0x168b/0x19c0
[   16.678006]  [] ? free_notes_attrs+0x46/0x46
[   16.678006]  [] sys_init_module+0xa9/0xab
[   16.678006]  [] system_call_fastpath+0x1a/0x1f
[   16.776320] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   16.777184] [drm] No driver support for vblank timestamp query.
[   16.778029] nouveau  [ DRM] ACPI backlight interface available, not registering our own




---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

kernel 3.7.6, l2tp, qdisc_tx circular locking

2013-02-11 Thread Denys Fedoryshchenko

789808]  [] dst_output+0x18/0x1c
[ 7575.790271]  [] ip_local_out+0x1b/0x1f
[ 7575.790735]  [] ip_queue_xmit+0x2d3/0x337
[ 7575.791200]  [] l2tp_xmit_skb+0x404/0x453 [l2tp_core]
[ 7575.791668]  [] pppol2tp_xmit+0x122/0x15d [l2tp_ppp]
[ 7575.792135]  [] ppp_push+0x7f/0x507
[ 7575.792600]  [] ? _raw_spin_unlock_irqrestore+0x3a/0x41
[ 7575.793069]  [] ? trace_hardirqs_on_caller+0x107/0x158
[ 7575.793536]  [] ? trace_hardirqs_on+0xd/0xf
[ 7575.794002]  [] ppp_xmit_process+0x44a/0x4ff
[ 7575.794467]  [] ppp_start_xmit+0x128/0x143
[ 7575.794933]  [] dev_hard_start_xmit+0x2ef/0x371
[ 7575.795400]  [] sch_direct_xmit+0x70/0x14f
[ 7575.795864]  [] dev_queue_xmit+0x152/0x34d
[ 7575.796328]  [] neigh_direct_output+0xc/0xe
[ 7575.796795]  [] ip_finish_output2+0x268/0x2e5
[ 7575.797261]  [] ip_finish_output+0x46/0x4b
[ 7575.797726]  [] ip_output+0x63/0x67
[ 7575.798190]  [] ip_forward_finish+0x6b/0x70
[ 7575.798656]  [] ip_forward+0x205/0x285
[ 7575.799120]  [] ip_rcv_finish+0x2b3/0x2cb
[ 7575.799586]  [] ? skb_dst.isra.7+0x58/0x58
[ 7575.800051]  [] NF_HOOK.constprop.8+0x4c/0x55
[ 7575.800517]  [] ip_rcv+0x22b/0x259
[ 7575.800981]  [] __netif_receive_skb+0x458/0x4c5
[ 7575.801448]  [] netif_receive_skb+0x56/0x8b
[ 7575.801913]  [] napi_gro_complete+0xd1/0xdc
[ 7575.802378]  [] napi_gro_flush+0x4c/0x68
[ 7575.802843]  [] ? rcu_read_unlock+0x1c/0x1e
[ 7575.803395]  [] napi_complete+0x19/0x4e
[ 7575.803856]  [] igb_poll+0x6c5/0x909
[ 7575.804317]  [] ? __lock_acquire+0x5b9/0xdce
[ 7575.804783]  [] net_rx_action+0xa3/0x1b9
[ 7575.805247]  [] ? __do_softirq+0x70/0x157
[ 7575.805712]  [] __do_softirq+0xa8/0x157
[ 7575.806176]  [] call_softirq+0x1c/0x26
[ 7575.806641]  [] do_softirq+0x38/0x83
[ 7575.807106]  [] irq_exit+0x4e/0xad
[ 7575.807569]  [] do_IRQ+0x89/0xa0
[ 7575.808032]  [] common_interrupt+0x6f/0x6f
[ 7575.808495][] ? retint_restore_args+0xe/0xe
[ 7575.808974]  [] ? intel_idle+0xeb/0x111
[ 7575.809443]  [] ? intel_idle+0xe4/0x111
[ 7575.809909]  [] cpuidle_enter+0x12/0x14
[ 7575.810373]  [] cpuidle_enter_state+0x10/0x39
[ 7575.810840]  [] cpuidle_idle_call+0x7e/0xa4
[ 7575.811306]  [] cpu_idle+0x58/0xa2
[ 7575.811770]  [] start_secondary+0x188/0x18d

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

Re: Latest 3.6.6 are not compiling due tg3 network driver, hwmon_device_unregister

2012-11-18 Thread Denys Fedoryshchenko


On 2012-11-19 00:42, David Rientjes wrote:

On Wed, 14 Nov 2012, Nithin Nayak Sujir wrote:


On 11/14/2012 07:30 PM, David Rientjes wrote:
> On Wed, 14 Nov 2012, Nithin Nayak Sujir wrote:
>
> > This was fixed by
> >
> > commit de0a41484c47d783dd4d442914815076aa2caac2
> > Author: Paul Gortmaker 
> > Date:   Mon Oct 1 11:43:49 2012 -0400
> >
> >  tg3: unconditionally select HWMON support when tg3 is enabled.
> >
> Would you mind submitting this for stable by following the procedure
> described in Documentation/stable_kernel_rules.txt?
>

Will do. Thank you for bringing this to our attention.



Thanks for submitting the patch to stable, Greg has queued it for the
kernels he maintains.  Denys, expect to see this fix in 3.6.8.

Thank you!

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

RE: e1000e on DH55HC stalling and kernel panic in 3.6.6

2012-11-13 Thread Denys Fedoryshchenko


On 2012-11-13 21:41, Dave, Tushar N wrote:

-----Original Message-----
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
On Behalf Of Denys Fedoryshchenko
Sent: Tuesday, November 13, 2012 5:59 AM
To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Wyborny,
Carolyn; Skidmore, Donald C; Rose, Gregory V; Waskiewicz Jr, Peter P;
Duyck, Alexander H; Ronciak, John; e1000-de...@lists.sourceforge.net;
net...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: e1000e on DH55HC stalling and kernel panic in 3.6.6

Hi

I just tried running the latest kernel, 3.6.6, on my DH55HC motherboard and
got various network problems: network traffic stops, and sometimes I get a
kernel panic.
When traffic stops, ethtool -r eth0 sometimes helps.
When I do ethtool -G eth0 rx NNN, it sometimes gives a kernel panic,
but it is hard to reproduce.

I tried to capture the panic in pictures, so I will try to decode what I got
from the photo. It is a nightmare, but sadly I don't have a serial console at
hand to get the data over it.

skbuff: skb_over_panic: text:f86fc769 len:25807 put:25807 head:c1da1800
data:c1da1840 tail:0xc1da7d0f end:c1da1f40 dev:eth0
kernel BUG at net/core/skbuff.c:127
opcode:  [#1] SMP
Pid: 0 comm: swapper/6 Not tainted 3.6.6-build-0063 #23
EIP: 0060:[] EFLAGS: 00010296 CPU:6
EIP is at skb_put+0x83/0x8e


There are registers and a stack dump as well; let me know if you need specific fields.

Call trace:
f86fc769 ? e1000_clean_rx_irq+0x1e1/0x2af [e1000e]
f86fc769 e1000_clean_rx_irq+0x1e1/0x2af [e1000e]
f86fcc73 e1000e_poll+0x6a/0x209 [e1000e]
c02f1630 net_rx_action+0x90/0x15d
c01302d5 __do_softirq+0x8a/0x13b
c013024b ? local_bh_enable+0xd/0xd
c0130504 irq_exit+0x41/0x91
c0102c37 do_IRQ+0x79/0x8d

There is also more data; let me know if you need it.

Yes, please send us the full dmesg log with error.
Have you tried out-of-tree e1000e driver?

-Tushar

It is a kernel panic, and the screen is so bad that I am unable to get a
proper photo.

Should I try to take photos and send them as pictures?
I will also try to get a USB serial adapter tomorrow, and then I can probably
output the panic message there.

I will also try the out-of-tree e1000e tomorrow first.


---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

Intel management, circular locking warning

2012-11-13 Thread Denys Fedoryshchenko

 [] watchdog_start+0x37/0x53
[4.361960]  [] watchdog_open+0x5c/0xa1
[4.361962]  [] misc_open+0xf5/0x14f
[4.361963]  [] chrdev_open+0x106/0x124
[4.361964]  [] ? cdev_put+0x1a/0x1a
[4.361966]  [] do_dentry_open.clone.16+0x12a/0x1c6
[4.361967]  [] finish_open+0x18/0x22
[4.361969]  [] do_last.clone.35+0x6fb/0x865
[4.361970]  [] ? inode_permission+0x3f/0x41
[4.361972]  [] path_openat+0x99/0x2c3
[4.361974]  [] do_filp_open+0x26/0x67
[4.361977]  [] ? alloc_fd+0xb7/0xc2
[4.361979]  [] do_sys_open+0x5b/0xe6
[4.361980]  [] sys_open+0x26/0x2c
[4.361981]  [] syscall_call+0x7/0xb



---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

Re: BUG/ spinlock lockup, 2.6.24

2008-02-15 Thread Denys Fedoryshchenko

This server was working fine under load with FreeBSD, and it worked fine before
with other tasks under Linux. I don't think it is RAM.
Additionally, it is server hardware (Dell PowerEdge) with ECC, MCE and other
layers that will most probably report any hardware issue, and I think even
better than memtest.
Also, it is very difficult to run tests on it, because it is in another
country and I have limited access to it (I don't have a network KVM).

I have similar crashes on completely different hardware doing the same job
(QoS), so I think it is actually some nasty bug in networking.


On Fri, 15 Feb 2008 16:24:56 +0100, Bart Van Assche wrote
> 2008/2/15 Denys Fedoryshchenko <[EMAIL PROTECTED]>:
> >  I have random crashes, at least once per week. It is very difficult to catch
> >  error message, and only recently i setup netconsole. Now i got crash, but
> >  there is no traceback and only single line came over netconsole, mentioned
> >  before.
> 
> Did you already run memtest ? You can run memtest by booting from the
> Knoppix CD-ROM or DVD. Most Linux distributions also have included
> memtest on their bootable distribution CD's/DVD's.
> 
> Bart Van Assche.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


BUG/ spinlock lockup, 2.6.24

2008-02-15 Thread Denys Fedoryshchenko

 : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.76
clflush size: 64


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine

2008-02-01 Thread Denys Fedoryshchenko

33.916459] iTCO_wdt: Remaining time 11
[ 2534.416688] iTCO_wdt: Remaining time 10
[ 2534.916916] iTCO_wdt: Remaining time 10
[ 2535.417144] iTCO_wdt: Remaining time 10
[ 2535.917373] iTCO_wdt: Remaining time 9
[ 2536.417602] iTCO_wdt: Remaining time 9
[ 2536.917830] iTCO_wdt: Remaining time 8
[ 2537.418059] iTCO_wdt: Remaining time 7
[ 2537.918287] iTCO_wdt: Remaining time 7
[ 2538.418516] iTCO_wdt: Remaining time 7
[ 2538.918744] iTCO_wdt: Remaining time 6
[ 2539.418973] iTCO_wdt: Remaining time 6
[ 2539.919201] iTCO_wdt: Remaining time 5
[ 2540.419431] iTCO_wdt: Remaining time 4
[ 2540.919658] iTCO_wdt: Remaining time 4
[ 2541.419888] iTCO_wdt: Remaining time 4
[ 2541.920116] iTCO_wdt: Remaining time 3
[ 2542.420345] iTCO_wdt: Remaining time 3
[ 2542.920573] iTCO_wdt: Remaining time 2
[ 2543.420802] iTCO_wdt: Remaining time 1
[ 2543.921030] iTCO_wdt: Remaining time 1
[ 2544.421259] iTCO_wdt: Remaining time 0
[ 2544.921487] iTCO_wdt: Remaining time 0
[ 2545.421716] iTCO_wdt: Remaining time 2
[ 2545.921945] iTCO_wdt: Remaining time 1
[ 2546.422173] iTCO_wdt: Remaining time 1
[ 2546.922402] iTCO_wdt: Remaining time 0
[ 2547.422631] iTCO_wdt: Remaining time 2
[ 2547.922859] iTCO_wdt: Remaining time 1
[ 2548.423088] iTCO_wdt: Remaining time 1

I tried reading the register every 100 ms:

[ 3525.608533] iTCO_wdt: Remaining ticks 3
[ 3525.709376] iTCO_wdt: Remaining ticks 3
[ 3525.810220] iTCO_wdt: Remaining ticks 3
[ 3525.911065] iTCO_wdt: Remaining ticks 3
[ 3526.011909] iTCO_wdt: Remaining ticks 2
[ 3526.112753] iTCO_wdt: Remaining ticks 2
[ 3526.213598] iTCO_wdt: Remaining ticks 2
[ 3526.314443] iTCO_wdt: Remaining ticks 2
[ 3526.415287] iTCO_wdt: Remaining ticks 2
[ 3526.516135] iTCO_wdt: Remaining ticks 2
[ 3526.616977] iTCO_wdt: Remaining ticks 1
[ 3526.717820] iTCO_wdt: Remaining ticks 1
[ 3526.818665] iTCO_wdt: Remaining ticks 1
[ 3526.919510] iTCO_wdt: Remaining ticks 1
[ 3527.020354] iTCO_wdt: Remaining ticks 1
[ 3527.121199] iTCO_wdt: Remaining ticks 4
[ 3527.222043] iTCO_wdt: Remaining ticks 4
[ 3527.322890] iTCO_wdt: Remaining ticks 4
[ 3527.423732] iTCO_wdt: Remaining ticks 4
[ 3527.524577] iTCO_wdt: Remaining ticks 4
[ 3527.625422] iTCO_wdt: Remaining ticks 4


This means the timer reaches 0... and nothing happens! It goes back to 2 and
then to 0 again. I even checked the STS registers; they are still zero! The
register is just set back to its default value, 0004h.

Can someone help me with this? Or is it a hardware bug in the chipset?
I will try to look through more docs; maybe I will be able to find what's wrong there.

On Fri, 1 Feb 2008 15:39:08 -0500, Len Brown wrote
> On Friday 01 February 2008 14:15, Denys Fedoryshchenko wrote:
> > 
> > On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
> > > 
> > > What do you see if you build with CONFIG_HIGH_RES_TIMERS=n
> > > 
> > > Does it work better if you boot with "acpi=off"?
> > > if yes, how about with just pnpacpi=off?
> > > 
> > > thanks,
> > > -Len
> > 
> > It is not very easy to test. About bug - most probably it is related to third
> > party ESFQ patch, i will drop it and then test more properly when i will be
> > able to make watchdog work fine. But more important i notice - that iTCO_wdt
> > is not working at all. I think hrtimers doesn't change anything on that.
> > About testing, i cannot take even small risk now(and near 3-5 days) by
> > changing kernel options, i set now maximum available set of watchdogs, cause
> > there is noone to maintain server, area is unreachable because of snow and
> > bad weather.
> >
> > Do you think reasonable to try acpi / pnpacpi with iTCO_wdt to make it work?
> > Maybe just registers addresses or way how TCO watchdog activated changed on
> > this chipset?
> 
> yes, i'm wondering if the changes in IO resource reservations
> in the PNPACPI layer are interfering with the native driver.
> 
> unfortunately, if you boot with acpi=off or pnpacpi=off, you may
> run into other, unrelated, issues (or not).
> 
> one way to isolate the problem is if you revert these two lines
> from their 2.6.24 values to their 2.6.23 values by applying this patch:
> ---
> diff --git a/include/linux/pnp.h b/include/linux/pnp.h
> index 2a6d62c..16b46aa 100644
> --- a/include/linux/pnp.h
> +++ b/include/linux/pnp.h
> @@ -13,8 +13,8 @@
>  #include 
>  #include 
> 
> -#define PNP_MAX_PORT 40
> -#define PNP_MAX_MEM  12
> +#define PNP_MAX_PORT 8
> +#define PNP_MAX_MEM  4
>  #define PNP_MAX_IRQ  2
>  #define PNP_MAX_DMA  2
>  #define PNP_NAME_LEN 50


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine

2008-02-01 Thread Denys Fedoryshchenko

I checked: the watchdog still doesn't work with acpi=off, nor with pnpacpi=off.
I will try to check the chipset's technical documents to find any reference
to the watchdog registers; maybe I can see something useful there.

On Fri, 1 Feb 2008 15:39:08 -0500, Len Brown wrote
> On Friday 01 February 2008 14:15, Denys Fedoryshchenko wrote:
> > 
> > On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
> > > 
> > > What do you see if you build with CONFIG_HIGH_RES_TIMERS=n
> > > 
> > > Does it work better if you boot with "acpi=off"?
> > > if yes, how about with just pnpacpi=off?
> > > 
> > > thanks,
> > > -Len
> > 
> > It is not very easy to test. About bug - most probably it is related to third
> > party ESFQ patch, i will drop it and then test more properly when i will be
> > able to make watchdog work fine. But more important i notice - that iTCO_wdt
> > is not working at all. I think hrtimers doesn't change anything on that.
> > About testing, i cannot take even small risk now(and near 3-5 days) by
> > changing kernel options, i set now maximum available set of watchdogs, cause
> > there is noone to maintain server, area is unreachable because of snow and
> > bad weather.
> >
> > Do you think reasonable to try acpi / pnpacpi with iTCO_wdt to make it work?
> > Maybe just registers addresses or way how TCO watchdog activated changed on
> > this chipset?
> 
> yes, i'm wondering if the changes in IO resource reservations
> in the PNPACPI layer are interfering with the native driver.
> 
> unfortunately, if you boot with acpi=off or pnpacpi=off, you may
> run into other, unrelated, issues (or not).
> 
> one way to isolate the problem is if you revert these two lines
> from their 2.6.24 values to their 2.6.23 values by applying this patch:
> ---
> diff --git a/include/linux/pnp.h b/include/linux/pnp.h
> index 2a6d62c..16b46aa 100644
> --- a/include/linux/pnp.h
> +++ b/include/linux/pnp.h
> @@ -13,8 +13,8 @@
>  #include 
>  #include 
> 
> -#define PNP_MAX_PORT 40
> -#define PNP_MAX_MEM  12
> +#define PNP_MAX_PORT 8
> +#define PNP_MAX_MEM  4
>  #define PNP_MAX_IRQ  2
>  #define PNP_MAX_DMA  2
>  #define PNP_NAME_LEN 50


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: kernel panic on 2.6.24/iTCO_wdt not rebooting machine

2008-02-01 Thread Denys Fedoryshchenko


On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
> 
> What do you see if you build with CONFIG_HIGH_RES_TIMERS=n
> 
> Does it work better if you boot with "acpi=off"?
> if yes, how about with just pnpacpi=off?
> 
> thanks,
> -Len

It is not very easy to test. About the bug: most probably it is related to the
third-party ESFQ patch; I will drop it and then test more properly once I am
able to make the watchdog work. But more importantly, I noticed that iTCO_wdt
is not working at all. I think hrtimers don't change anything there.
About testing: I cannot take even a small risk now (or for the next 3-5 days)
by changing kernel options. I have now enabled the maximum available set of
watchdogs, because there is no one to maintain the server; the area is
unreachable because of snow and bad weather.

Do you think it is reasonable to try acpi / pnpacpi with iTCO_wdt to make it work?
Maybe just the register addresses, or the way the TCO watchdog is activated,
changed on this chipset?


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


kernel panic on 2.6.24/iTCO_wdt not rebooting machine

2008-02-01 Thread Denys Fedoryshchenko

Feb  1 09:08:50 SERVER 04
Feb  1 07:08:49 SERVER unparseable log message: "<8b> "
Feb  1 09:08:50 SERVER 59
Feb  1 09:08:50 SERVER 08
Feb  1 09:08:50 SERVER 85
Feb  1 09:08:50 SERVER db
Feb  1 09:08:50 SERVER 74
Feb  1 09:08:50 SERVER 06
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 03
Feb  1 09:08:50 SERVER a8
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER 74
Feb  1 09:08:50 SERVER 15
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 41
Feb  1 09:08:50 SERVER 04
Feb  1 09:08:50 SERVER 85
Feb  1 09:08:50 SERVER c0
Feb  1 09:08:50 SERVER 0f
Feb  1 09:08:50 SERVER 84
Feb  1 09:08:50 SERVER c6
Feb  1 09:08:50 SERVER
Feb  1 09:08:50 SERVER [12380.068753] EIP: []
Feb  1 09:08:50 SERVER rb_erase+0x110/0x22f
Feb  1 09:08:50 SERVER SS:ESP 0068:c037fda8
Feb  1 09:08:50 SERVER [12380.068978] Kernel panic - not syncing: Fatal exception in interrupt


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: 2.6.24-rc7 to 2.6.24-rc8 possible regression

2008-01-22 Thread Denys Fedoryshchenko

No, I am using a vanilla kernel. It is one of the production machines, and as
far as I know, screen is not using epoll.

I will try to apply this patch on all my production machines. Sorry if it is
related.

On Mon, 21 Jan 2008 23:45:40 +0100, Stefan Richter wrote
> Denys Fedoryshchenko wrote:
> > After running screen found in dmesg. It was not happening before.
> > 
> > [625138.248257]
> > [625138.248260] =
> > [625138.248542] [ INFO: possible recursive locking detected ]
> > [625138.248686] 2.6.24-rc8-devel #2
> > [625138.248821] -
> > [625138.248963] screen/18164 is trying to acquire lock:
> > [625138.249101]  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
> > [625138.249454]
> > [625138.249456] but task is already holding lock:
> > [625138.249724]  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
> > [625138.250073]
> > [625138.250075] other info that might help us debug this:
> > [625138.250343] 2 locks held by screen/18164:
> > [625138.250477]  #0:  (&tty->atomic_read_lock){--..}, at: [] read_chan+0x18f/0x50b
> > [625138.250960]  #1:  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
> > [625138.251356]
> > [625138.251357] stack backtrace:
> > [625138.251623] Pid: 18164, comm: screen Not tainted 2.6.24-rc8-devel #2
> > [625138.251764]  [] show_trace_log_lvl+0x1a/0x2f
> > [625138.251959]  [] show_trace+0x12/0x14
> > [625138.252150]  [] dump_stack+0x6c/0x72
> > [625138.252338]  [] __lock_acquire+0x172/0xb8c
> > [625138.252533]  [] lock_acquire+0x5f/0x78
> > [625138.252725]  [] _spin_lock_irqsave+0x34/0x44
> > [625138.252920]  [] __wake_up+0x15/0x42
> > [625138.253108]  [] ep_poll_safewake+0x8e/0xbf
> > [625138.253300]  [] ep_poll_callback+0x9f/0xac
> > [625138.253491]  [] __wake_up_common+0x32/0x5c
> > [625138.253688]  [] __wake_up+0x31/0x42
> > [625138.253878]  [] tty_wakeup+0x4f/0x54
> > [625138.254070]  [] pty_unthrottle+0x15/0x21
> > [625138.254258]  [] check_unthrottle+0x2e/0x30
> > [625138.254445]  [] read_chan+0x417/0x50b
> > [625138.254633]  [] tty_read+0x66/0xac
> > [625138.254819]  [] vfs_read+0x8e/0x117
> > [625138.255004]  [] sys_read+0x3d/0x61
> > [625138.255190]  [] sysenter_past_esp+0x5f/0xa5
> > [625138.255376]  ===
> 
> Do you have Peter's lockdep annotation patch for epoll applied?
> http://lkml.org/lkml/2008/1/13/84
> -- 
> Stefan Richter
> -=-==--- ---= =-=-=
> http://arcgraph.de/sr/


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


2.6.24-rc7 to 2.6.24-rc8 possible regression

2008-01-21 Thread Denys Fedoryshchenko

After running screen, I found this in dmesg. It was not happening before.

[625138.248257]
[625138.248260] =
[625138.248542] [ INFO: possible recursive locking detected ]
[625138.248686] 2.6.24-rc8-devel #2
[625138.248821] -
[625138.248963] screen/18164 is trying to acquire lock:
[625138.249101]  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
[625138.249454]
[625138.249456] but task is already holding lock:
[625138.249724]  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
[625138.250073]
[625138.250075] other info that might help us debug this:
[625138.250343] 2 locks held by screen/18164:
[625138.250477]  #0:  (&tty->atomic_read_lock){--..}, at: [] read_chan+0x18f/0x50b
[625138.250960]  #1:  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
[625138.251356]
[625138.251357] stack backtrace:
[625138.251623] Pid: 18164, comm: screen Not tainted 2.6.24-rc8-devel #2
[625138.251764]  [] show_trace_log_lvl+0x1a/0x2f
[625138.251959]  [] show_trace+0x12/0x14
[625138.252150]  [] dump_stack+0x6c/0x72
[625138.252338]  [] __lock_acquire+0x172/0xb8c
[625138.252533]  [] lock_acquire+0x5f/0x78
[625138.252725]  [] _spin_lock_irqsave+0x34/0x44
[625138.252920]  [] __wake_up+0x15/0x42
[625138.253108]  [] ep_poll_safewake+0x8e/0xbf
[625138.253300]  [] ep_poll_callback+0x9f/0xac
[625138.253491]  [] __wake_up_common+0x32/0x5c
[625138.253688]  [] __wake_up+0x31/0x42
[625138.253878]  [] tty_wakeup+0x4f/0x54
[625138.254070]  [] pty_unthrottle+0x15/0x21
[625138.254258]  [] check_unthrottle+0x2e/0x30
[625138.254445]  [] read_chan+0x417/0x50b
[625138.254633]  [] tty_read+0x66/0xac
[625138.254819]  [] vfs_read+0x8e/0x117
[625138.255004]  [] sys_read+0x3d/0x61
[625138.255190]  [] sysenter_past_esp+0x5f/0xa5
[625138.255376]  =======



--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-17 Thread Denys Fedoryshchenko

vice :00:1c.2 to 64
[   24.273079] ACPI: PCI Interrupt :00:1c.3[D] -> GSI 19 (level, low) -> IRQ 19
[   24.273204] PCI: Setting latency timer of device :00:1c.3 to 64
[   24.273217] ACPI: PCI Interrupt :00:1c.4[A] -> GSI 17 (level, low) -> IRQ 17
[   24.273343] PCI: Setting latency timer of device :00:1c.4 to 64
[   24.273351] PCI: Setting latency timer of device :00:1e.0 to 64
[   24.273370] NET: Registered protocol family 2
[   24.283667] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[   24.283894] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[   24.284300] TCP bind hash table entries: 65536 (order: 7, 786432 bytes)
[   24.284563] TCP: Hash tables configured (established 131072 bind 65536)
[   24.284634] TCP reno registered
[   24.288032] Machine check exception polling timer started.
[   24.288271] IA-32 Microcode Update Driver: v1.14a <[EMAIL PROTECTED]>
[   24.288744] highmem bounce pool size: 64 pages
[   24.288809] Total HugeTLB memory allocated, 0
[   24.289015] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[   24.289115] io scheduler noop registered
[   24.289188] io scheduler cfq registered (default)
[   24.289542] Boot video device is :01:00.0
[   24.289613] PCI: Setting latency timer of device :00:01.0 to 64



On Tue, 15 Jan 2008 12:39:47 +0100, Ingo Molnar wrote
> * Denys Fedoryshchenko <[EMAIL PROTECTED]> wrote:
> 
> > Hi
> > 
> > After physical memory upgrade from 3GB to 4GB (also it happens on 5GB) 
> > got kernel panic.
> > 
> > Because it is happening on early stage and my machine doesn't contain 
> > serial port, i had to take photo. Kernel boots fine with 64GB highmem, 
> > no highmem, or highmem4G with limited memory by mem=3G. All dmesg 
> > attached. Also i attach dmidecode and lspci -vvv output, probably it 
> > will be useful.
> 
> thanks for the detailed report, i think i know what's going on. 
> Could you try the patch below, does it fix your problem?
> 
> this seems to be a SPARSEMEM bug which is present in v2.6.23 as well
> and has probably been present ever since SPARSEMEM was added to 32-bit
> x86.
> 
> There's a ~256MB hole in your e820 memory map (the pci aperture),
>  which causes the last 4 sparsemem sections (each covering 64MB of 
> RAM) to be not present - and they are thus missing from the 
> sparsemem mem_map[] too. The highmem init code on the other hand 
> assumes that all pages are in the mem_map[]:
> 
>  static void __init set_highmem_pages_init(int bad_ppro)
>  {
> 	int pfn;
> 	for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
> 		add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
> 
> the pfn_to_page() is unconditional and dereferences to a NULL-ish 
> pointer which crashes your box. highend_pfn is what got 
> miscalculated by 256 MB, so set_highmem_pages_init() tried to 
> reference a non-existing struct page - but it should still be robust 
> enough against non-existent pages.
> 
> The patch below fixes this bug. Please also send a dmesg if you 
> manage to boot the box up fine, i've added a few debug printouts to 
> confirm this theory. (i'll figure out whether we need to clip 
> highend_pfn as well - but this patch alone should be good enough to 
> fix the crash on your box.)
> 
>   Ingo
> 
> ->
> Subject: x86: fix CONFIG_SPARSEMEM highmem init bug
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> fix CONFIG_SPARSEMEM highmem init bug.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  arch/x86/mm/init_32.c |   43
>  mm/sparse.c           |    8
>  2 files changed, 47 insertions(+), 4 deletions(-)
> 
> Index: linux/arch/x86/mm/init_32.c
> ===
> --- linux.orig/arch/x86/mm/init_32.c
> +++ linux/arch/x86/mm/init_32.c
> @@ -321,11 +321,48 @@ extern void set_highmem_pages_init(int);
>  static void __init set_highmem_pages_init(int bad_ppro)
>  {
>   int pfn;
> - for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
> - add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
> +
> + printk("set_highmem_pages_init(bad_ppro:%d)\n", bad_ppro);
> + printk("sizeof(struct page):%d\n", sizeof(struct page));
> + printk("sizeof(struct mem_section): %d\n", sizeof(struct mem_section));
> + printk("PFN_SECTION_SHIFT:  %d\n", PFN_SECTION_SHIFT);
> +
> + printk("mem_map: %p\n", mem_map);
> + printk("  highstart_pfn: %9ld [page: %p]\n",
> +

Re: TSC && HPET calibration

2008-01-15 Thread Denys Fedoryshchenko

The latest, 2.6.24-rc7, and 2.6.23 behave the same.

If more information is required, tell me. By the way, it is not the latest
(not Core2-based) Xeon.


On Tue, 15 Jan 2008 02:17:20 -0800, Andrew Morton wrote
> On Thu, 10 Jan 2008 14:36:12 +0200 "Denys Fedoryshchenko" 
> <[EMAIL PROTECTED]> wrote:
> 
> > Hi
> > 
> > I have same issue, but it's never passed synchronization.
> > 
> > Jan 10 12:59:44 visp-1 Time: tsc clocksource has been installed.
> > Jan 10 13:41:51 visp-1 ACPI: HPET 000F29CD, 0038 (r1 DELL   PE_SC3  1 DELL1)
> > Jan 10 13:41:51 visp-1 ACPI: HPET id: 0x8086a201 base: 0xfed0
> > Jan 10 13:41:51 visp-1 hpet clockevent registered
> > Jan 10 13:41:51 visp-1 checking TSC synchronization [CPU#0 -> CPU#1]: passed.
> > Jan 10 13:41:51 visp-1 checking TSC synchronization [CPU#0 -> CPU#2]:
> > Jan 10 13:41:51 visp-1 Measured 1020 cycles TSC warp between CPUs, turning off TSC clock.
> > Jan 10 13:41:51 visp-1 Marking TSC unstable due to: check_tsc_sync_source failed.
> > Jan 10 13:41:51 visp-1 hpet0: at MMIO 0xfed0, IRQs 2, 8, 0
> > Jan 10 13:41:51 visp-1 hpet0: 3 64-bit timers, 14318180 Hz
> > Jan 10 13:41:51 visp-1 Time: hpet clocksource has been installed.
> > 
> > grep Measured
> > Sep 19 14:31:28 visp-1 Measured 1044 cycles TSC warp between CPUs, turning off TSC clock.
> > Sep 19 18:22:46 visp-1 Measured 996 cycles TSC warp between CPUs, turning off TSC clock.
> > Sep 19 18:35:44 visp-1 Measured 1080 cycles TSC warp between CPUs, turning off TSC clock.
> 
> Which kernel version?


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


native_flush_tlb_others very nasty crash

2008-01-14 Thread Denys Fedoryshchenko

Hi

Correction: it has been appearing since 2.6.22; the oldest kernel I found on the server is 2.6.22. I didn't try older kernels, and it will probably be difficult to try them.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


native_flush_tlb_others very nasty crash

2008-01-14 Thread Denys Fedoryshchenko

.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
06:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
06:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
07:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c2)
08:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
0b:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09)
10:0d.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

visp-1 ~ # cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.259
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6390.26
clflush size: 64

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.259
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.68
clflush size: 64

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.259
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.75
clflush size: 64

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.259
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.77
clflush size: 64

Two 512MB FB-DIMMs. First:
Manufacturer: 0198808980C1
Serial Number: 982D0D6A
Asset Tag: 000621
Part Number: KD7538-IFA-INTC0S
Second:
Manufacturer: 0198808980C1
Serial Number: 9C2EA25B
Asset Tag: 000621
Part Number: KD7538-IFA-INTC0S


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: possible recursive locking, 2.6.24-rc7

2008-01-13 Thread Denys Fedoryshchenko

I cannot reproduce it; it happened randomly while running rtorrent. But I will
apply the patch and keep watching.

On Sun, 13 Jan 2008 19:44:26 +0100, Peter Zijlstra wrote
> On Sun, 2008-01-13 at 17:22 +0100, Peter Zijlstra wrote:
> > On Sun, 2008-01-13 at 17:51 +0200, Denys Fedoryshchenko wrote:
> > > Hi, got in dmesg
> > > Not sure where to send (there is TCP), so sending netdev@ and kernel@
> > 
> > It's epoll, this is a known issue and will be fixed soon. Thanks for
> > reporting.
> 
> If its easy for you to reproduce, would you mind giving the following
> patch a spin?
> 
> ---
> 
> Subject: lockdep: annotate epoll
> 
> On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:
> 
> > I remember I talked with Arjan about this some time ago. Basically, since 1) 
> > you can drop an epoll fd inside another epoll fd 2) callback-based wakeups 
> > are used, you can see a wake_up() from inside another wake_up(), but they 
> > will never refer to the same lock instance.
> > Think about:
> > 
> > dfd = socket(...);
> > efd1 = epoll_create();
> > efd2 = epoll_create();
> > epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
> > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
> > 
> > When a packet arrives to the device underneath "dfd", the net code will 
> > issue a wake_up() on its poll wake list. Epoll (efd1) has installed a 
> > callback wakeup entry on that queue, and the wake_up() performed by the 
> > "dfd" net code will end up in ep_poll_callback(). At this point epoll 
> > (efd1) notices that it may have some event ready, so it needs to wake up 
> > the waiters on its poll wait list (efd2). So it calls ep_poll_safewake() 
> > that ends up in another wake_up(), after having checked the 
> > recursion constraints. These are: no more than EP_MAX_POLLWAKE_NESTS, to 
> > avoid stack blasting, and never hit the same queue, to avoid loops like:
> > 
> > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
> > epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
> > epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
> > epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
> > 
> > The code "if (tncur->wq == wq || ..." prevents re-entering the same 
> > queue/lock.
> 
> Since the epoll code is very careful not to nest same-instance locks,
> allow the recursion.
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  fs/eventpoll.c   |2 +-
>  include/linux/wait.h |   16 
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/fs/eventpoll.c
> ===
> --- linux-2.6.orig/fs/eventpoll.c
> +++ linux-2.6/fs/eventpoll.c
> @@ -353,7 +353,7 @@ static void ep_poll_safewake(struct poll
>   spin_unlock_irqrestore(&psw->lock, flags);
> 
>   /* Do really wake up now */
> - wake_up(wq);
> + wake_up_nested(wq, 1 + wake_nests);
> 
>   /* Remove the current task from the list */
>   spin_lock_irqsave(&psw->lock, flags);
> Index: linux-2.6/include/linux/wait.h
> ===
> --- linux-2.6.orig/include/linux/wait.h
> +++ linux-2.6/include/linux/wait.h
> @@ -161,6 +161,22 @@ wait_queue_head_t *FASTCALL(bit_waitqueu
>  #define wake_up_locked(x)   __wake_up_locked((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE)
>  #define wake_up_interruptible_sync(x)   __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)
> 
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +/*
> + * macro to avoid include hell
> + */
> +#define wake_up_nested(x, s) \
> +do { \
> + unsigned long flags;\
> + \
> + spin_lock_irqsave_nested(&(x)->lock, flags, (s));   \
> + wake_up_locked(x);      \
> + spin_unlock_irqrestore(&(x)->lock, flags);  \
> +} while (0)
> +#else
> +#define wake_up_nested(x, s) wake_up(x)
> +#endif
> +
>  #define __wait_event(wq, condition)  \
>  do { \
>   DEFINE_WAIT(__wait);\


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


possible recursive locking, 2.6.24-rc7

2008-01-13 Thread Denys Fedoryshchenko

Hi, I got this in dmesg.
Not sure where to send it (TCP is involved), so I am sending it to netdev@ and kernel@.


[159859.491752]
[159859.491755] =
[159859.492021] [ INFO: possible recursive locking detected ]
[159859.492156] 2.6.24-rc7-devel #2
[159859.492284] -
[159859.492418] swapper/0 is trying to acquire lock:
[159859.492550]  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
[159859.492883]
[159859.492884] but task is already holding lock:
[159859.493140]  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
[159859.493466]
[159859.493467] other info that might help us debug this:
[159859.493726] 5 locks held by swapper/0:
[159859.495687]  #0:  (rcu_read_lock){..--}, at: [] netif_receive_skb+0x9c/0x3a7
[159859.496141]  #1:  (rcu_read_lock){..--}, at: [] ip_local_deliver_finish+0x30/0x18d
[159859.496604]  #2:  (slock-AF_INET/1){-+..}, at: [] tcp_v4_rcv+0x426/0x812
[159859.497104]  #3:  (clock-AF_INET){-.-?}, at: [] sock_def_readable+0x18/0x6e
[159859.497555]  #4:  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
[159859.497931]
[159859.497932] stack backtrace:
[159859.498185] Pid: 0, comm: swapper Not tainted 2.6.24-rc7-devel #2
[159859.498320]  [] show_trace_log_lvl+0x1a/0x2f
[159859.498505]  [] show_trace+0x12/0x14
[159859.498690]  [] dump_stack+0x6c/0x72
[159859.498872]  [] __lock_acquire+0x172/0xb8c
[159859.499057]  [] lock_acquire+0x5f/0x78
[159859.499239]  [] _spin_lock_irqsave+0x34/0x44
[159859.499423]  [] __wake_up+0x15/0x42
[159859.499604]  [] ep_poll_safewake+0x8e/0xbf
[159859.499787]  [] ep_poll_callback+0x9f/0xac
[159859.499970]  [] __wake_up_common+0x32/0x5c
[159859.500154]  [] __wake_up+0x31/0x42
[159859.500335]  [] sock_def_readable+0x42/0x6e
[159859.500518]  [] tcp_rcv_established+0x3bc/0x643
[159859.500704]  [] tcp_v4_do_rcv+0x2f/0x325
[159859.500887]  [] tcp_v4_rcv+0x7c9/0x812
[159859.501069]  [] ip_local_deliver_finish+0x107/0x18d
[159859.501255]  [] ip_local_deliver+0x72/0x7c
[159859.501438]  [] ip_rcv_finish+0x2cf/0x2ee
[159859.501623]  [] ip_rcv+0x211/0x23b
[159859.501805]  [] netif_receive_skb+0x350/0x3a7
[159859.501989]  [] bnx2_poll+0x975/0xb45 [bnx2]
[159859.502177]  [] net_rx_action+0x6c/0x116
[159859.502360]  [] __do_softirq+0x6f/0xe9
[159859.502543]  [] do_softirq+0x3a/0x52
[159859.502728]  [] irq_exit+0x47/0x7b
[159859.502911]  [] do_IRQ+0x81/0x96
[159859.503098]  [] common_interrupt+0x2e/0x34
[159859.503288]  [] mwait_idle+0x12/0x14
[159859.503476]  [] cpu_idle+0x7b/0x95
[159859.503662]  [] rest_init+0x49/0x4b
[159859.503844]  [] start_kernel+0x2f9/0x301
[159859.504030]  [<>] 0x0
[159859.504210]  =======


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


TSC && HPET calibration

2008-01-10 Thread Denys Fedoryshchenko

ce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6388.02
clflush size: 64

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.070
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.74
clflush size: 64

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.070
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.75
clflush size: 64

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Xeon(TM) CPU 3.20GHz
stepping: 4
cpu MHz : 3192.070
cache size  : 2048 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.74
clflush size    : 64


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-06 Thread Denys Fedoryshchenko

Hi

After a physical memory upgrade from 3GB to 4GB (it also happens with 5GB), I
got a kernel panic.

Because it happens at an early stage and my machine has no serial port, I had
to take a photo.
The kernel boots fine with HIGHMEM64G, with no highmem, or with HIGHMEM4G when
memory is limited by mem=3G. All dmesg logs are linked below.
I also include the dmidecode and lspci -vvv output, in case it is useful.


Photo (2.8MB, sorry, it is the original size from the camera):
http://www.nuclearcat.com/files/panic-07012008/img_1232.jpg

dmesg without highmem
http://www.nuclearcat.com/files/panic-07012008/dmesg-nohighmem.txt

with highmem64G
http://www.nuclearcat.com/files/panic-07012008/dmesg-highmem64G.txt

with highmem4G limited by mem=3G
http://www.nuclearcat.com/files/panic-07012008/dmesg-highmem4G-memlim3G.txt
Kernel config for this specific boot:
http://www.nuclearcat.com/files/panic-07012008/config.txt

dmidecode output
http://www.nuclearcat.com/files/panic-07012008/dmidecode.txt

lspci output
http://www.nuclearcat.com/files/panic-07012008/lspci.txt

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


nmi_watchdog killing tickless feature

2007-12-24 Thread Denys Fedoryshchenko

Hi

Please CC me on replies; I am not subscribed to the list.

I did a small test and noticed the following. With nmi_watchdog enabled:
mpstat 1

06:11:00  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
06:11:01  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00   993.07
06:11:02  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00  1007.00
06:11:03  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00  1005.00

With it disabled:
06:13:52  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
06:13:53  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00     1.00
06:13:54  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00     9.00
06:13:55  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00     4.00
06:13:56  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00     2.00
06:13:57  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00 100.00     2.00

The difference is huge, and it probably shows up in power consumption too. The
kernel is relatively standard (2.6.24-rc6-git1); if required, I can attach the
.config. If this is not a bug, wouldn't it be good to document that it kills
the power-saving features?

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-08 Thread Denys Fedoryshchenko

Thanks, it works that way.

It seems libata has no fallback to non-DMA mode if DMA doesn't work.

On Thu, 8 Nov 2007 12:31:39 -0500, Jeff Garzik wrote
> On Thu, Nov 08, 2007 at 06:44:31PM +0200, Denys Fedoryshchenko wrote:
> > Doesn't help
> > 
> > WRAP ~ #cat /proc/cmdline
> > console=ttyS0,38400n8 libata.dma_mask=3
> 
> It's "libata.dma" if its built into the kernel, or 'dma' module 
> option if built as a kernel module.
> 
>   Jeff


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-08 Thread Denys Fedoryshchenko

1195]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   11.627411] ata1.00: status: { DRDY }
[   11.638503] ata1: soft resetting link
[   11.652281] ata1.00: configured for MWDMA1
[   11.664726] ata1: EH complete
[   11.864037] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[   11.885338] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
[   11.885382]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   11.931623] ata1.00: status: { DRDY }
[   11.942712] ata1: soft resetting link
[   11.956488] ata1.00: configured for MWDMA1
[   11.968934] ata1: EH complete
[   12.168151] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[   12.189442] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
[   12.189485]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   12.235701] ata1.00: status: { DRDY }
[   12.246790] ata1: soft resetting link
[   12.260575] ata1.00: configured for MWDMA1
[   12.273015] ata1: EH complete
[   12.472358] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[   12.493668] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
[   12.493712]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   12.539929] ata1.00: status: { DRDY }
[   12.551019] ata1: soft resetting link
[   12.564802] ata1.00: configured for MWDMA1
[   12.577247] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
[   12.596225] sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
[   12.615694] Descriptor sense data with sense descriptors (in hex):
[   12.634324] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[   12.654472] 00 00 00 00
[   12.664516] sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
[   12.678039] end_request: I/O error, dev sda, sector 0
[   12.693273] Buffer I/O error on device sda, logical block 0
[   12.710123] ata1: EH complete
[   12.719187]  unable to read partition table
[   12.732602] sd 0:0:0:0: [sda] Attached SCSI removable disk



On Thu, 8 Nov 2007 10:48:13 +, Alan Cox wrote
> On Thu, 8 Nov 2007 09:16:35 +0200
> "Denys Fedoryshchenko" <[EMAIL PROTECTED]> wrote:
> 
> > Does it work as kernel parameter?
> > 
> > I tried libata_dma_mask=0x4 and to set 0xf or 0xff - doesn't help. How to 
> > disable DMA in libata, if it is compiled in kernel?
> 
> libata.dma_mask=3
> 
> will leave you with CD and disk DMA but not CF DMA
> 
> (Note libata[DOT] not underscore)


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

Does it work as a kernel parameter?

I tried libata_dma_mask=0x4, and also setting 0xf or 0xff; it doesn't help. How
do I disable DMA in libata if it is compiled into the kernel?

On Thu, 8 Nov 2007 01:30:53 +0100, Bartlomiej Zolnierkiewicz wrote
> On Thursday 08 November 2007, Denys Fedoryshchenko wrote:
> > 2.6.24-rc2 not working very well
> > 
> > 
> > dmesg
> > [   12.386395] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> > [   12.405579] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> > [   12.430441] SC1200: IDE controller (0x100b:0x0502 rev 0x01) at  PCI slot :00:12.2
> > [   12.454070] SC1200: not 100% native mode: will probe irqs later
> > [   12.471947] ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:pio, hdb:pio
> > [   12.493873] ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
> > [   12.515810] Probing IDE interface ide0...
> > [   12.528810] Clocksource tsc unstable (delta = -497423729 ns)
> > [   12.545888] Time: pit clocksource has been installed.
> > [   12.563379] hda: SanDisk SDCFH-1024, CFA DISK drive
> > [   12.578340] hda: applying conservative PIO "downgrade"
> > [   12.593869] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO1
> > [   12.594006] hda: MW DMA 2 mode selected
> > [   12.594297] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> > [   12.608778] Probing IDE interface ide1...
> > [   12.623192] hda: max request size: 128KiB
> > [   12.635322] hda: 2001888 sectors (1024 MB) w/1KiB Cache, CHS=1986/16/63, DMA
> > [   12.657134]  hda:<4>hda: dma_timer_expiry: dma status == 0x21
> > [   12.865846] hda: DMA timeout error
> > [   12.876092]  ide_dma_end dma_stat=21 err=1 newerr=0
> > [   12.890753] hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > [   12.914977] ide: failed opcode was: unknown
> > [   12.927743] hda: DMA disabled
> > [   12.937035] ide0: reset: success
> > [   12.948324]  hda1
> > 
> > Mounting taking long time on 1GB card cause of DMA issues. In dmesg i am not 
> > sure about timestamp showing few seconds, in real life it took about 2 
> > minutes.
> 
> Please try booting with "hda=nodma".
> 
> It could be a hardware problem (CF adapter without DMA lines).
> 
> Thanks,
> Bart


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

You are right, it seems there are no DMA lines in the adapter. hda=nodma helped;
no more errors. I will now also try libata_dma_mask and will mail the result. By
the way, there are no notes about it in Documentation/kernel-parameters.txt.

In any case, it is a complete board, the WRAP.2C made by PCEngines in 2003. It is
fairly popular and mass-produced, and was previously widely used by StarOS,
probably a known GPL violator, who didn't bother to supply patches while using
it in his projects.

If this applies to all boards of this revision, maybe it is better to put it in
some kind of fixup/quirk/blacklist, or whatever it is called?

On Wed, 07 Nov 2007 19:41:15 -0600, Robert Hancock wrote
> Denys wrote:
> > Finally i got full DMESG with 1GB card till end. Seems not readable too.
> >
> 
> ...
> 
> > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> > ata1.00: status: { DRDY }
> > ata1: soft resetting link
> > ata1.00: configured for MWDMA1
> > sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
> > sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
> > Descriptor sense data with sense descriptors (in hex):
> > 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > 00 00 00 00
> > sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> > end_request: I/O error, dev sda, sector 0
> > Buffer I/O error on device sda, logical block 0
> > ata1: EH complete
> 
> I'm guessing that your CF-to-IDE adapter doesn't have the correct 
> lines wired up for DMA to work properly, and the card indicates DMA 
> support, which libata tries to use but which doesn't work. It looks 
> like it never tried falling back to PIO after DMA failed. Seems like 
> a deficiency in the speed-down logic?
> 
> -- 
> Robert Hancock  Saskatoon, SK, Canada
> To email, remove "nospam" from [EMAIL PROTECTED]
> Home Page: http://www.roberthancock.com/


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

2.6.24-rc2 not working very well


dmesg
[   12.386395] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[   12.405579] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[   12.430441] SC1200: IDE controller (0x100b:0x0502 rev 0x01) at  PCI slot :00:12.2
[   12.454070] SC1200: not 100% native mode: will probe irqs later
[   12.471947] ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:pio, hdb:pio
[   12.493873] ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
[   12.515810] Probing IDE interface ide0...
[   12.528810] Clocksource tsc unstable (delta = -497423729 ns)
[   12.545888] Time: pit clocksource has been installed.
[   12.563379] hda: SanDisk SDCFH-1024, CFA DISK drive
[   12.578340] hda: applying conservative PIO "downgrade"
[   12.593869] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO1
[   12.594006] hda: MW DMA 2 mode selected
[   12.594297] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[   12.608778] Probing IDE interface ide1...
[   12.623192] hda: max request size: 128KiB
[   12.635322] hda: 2001888 sectors (1024 MB) w/1KiB Cache, CHS=1986/16/63, DMA
[   12.657134]  hda:<4>hda: dma_timer_expiry: dma status == 0x21
[   12.865846] hda: DMA timeout error
[   12.876092]  ide_dma_end dma_stat=21 err=1 newerr=0
[   12.890753] hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
[   12.914977] ide: failed opcode was: unknown
[   12.927743] hda: DMA disabled
[   12.937035] ide0: reset: success
[   12.948324]  hda1

Mounting takes a long time on the 1GB card because of the DMA issues. I am not
sure about the dmesg timestamps showing only a few seconds; in real life it took
about 2 minutes.

After that, dmesg shows:
[   14.965070] hda: dma_timer_expiry: dma status == 0x21
[   15.107909] hda: DMA timeout error
[   15.118149]  ide_dma_end dma_stat=21 err=1 newerr=0
[   15.132809] hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
[   15.157035] ide: failed opcode was: unknown
[   15.169799] hda: DMA disabled
[   15.178797] ide0: reset: success
[   15.312698] hda: dma_timer_expiry: dma status == 0x21
[   15.650705] hda: DMA timeout error
[   15.660952]  ide_dma_end dma_stat=21 err=1 newerr=0
[   15.675614] hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
[   15.699836] ide: failed opcode was: unknown
[   15.712601] hda: DMA disabled
[   15.721603] ide0: reset: success
[   16.325999] hda: dma_timer_expiry: dma status == 0x21
[   16.565756] hda: DMA timeout error
[   16.576001]  ide_dma_end dma_stat=21 err=1 newerr=0
[   16.590661] hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
[   16.614886] ide: failed opcode was: unknown
[   16.627651] hda: DMA disabled
[   16.636659] ide0: reset: success
[   16.650061] EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended


On Wed, 7 Nov 2007 18:20:45 -0500, Jeff Garzik wrote
> On Wed, Nov 07, 2007 at 02:12:55PM -0500, Mark Lord wrote:
> > That cannot be correct (??).  Is this with hdparm-7.7 (latest sourceforge) ??
> > Can you show us the "hdparm --Istdout" output as well, please.
> 
> If this is applicable...  FWIW hdparm was only recently (in past <72
> hours) updated from 6.9 to 7.7 in Fedora...
> 
>   Jeff


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

I am using Gentoo (a custom Linux build, actually only busybox + kernel + uclibc
and a few other tools); hdparm is vanilla 7.7.

I will now compile -rc2 to see if there are any changes.

With the 16MB card, 2.6.24-rc1 works fine; the 1GB card also works, with some
errors in dmesg.

As for whether all this is important: it is relatively old hardware, and if this
is only a hardware-specific bug, a workaround that makes it usable is probably
enough. I don't think many people use these boards now, but IMHO things must work
in the kernel if they are there.

On Wed, 7 Nov 2007 18:20:45 -0500, Jeff Garzik wrote
> On Wed, Nov 07, 2007 at 02:12:55PM -0500, Mark Lord wrote:
> > That cannot be correct (??).  Is this with hdparm-7.7 (latest sourceforge) ??
> > Can you show us the "hdparm --Istdout" output as well, please.
> 
> If this is applicable...  FWIW hdparm was only recently (in past <72
> hours) updated from 6.9 to 7.7 in Fedora...
> 
>   Jeff

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

4
4346 482d 3130 3234 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 0004
 0300  0200  0003 07c2 0010
003f 8be0 001e 0100 8be0 001e  0007
0003 0078 0078 0078 0078   
       
0010   4004 4000 0020 0004 4000
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       



On Wed, 07 Nov 2007 14:12:55 -0500, Mark Lord wrote
> > WRAP ~ #./hdparm -I /dev/hda
> > 
> > /dev/hda:
> > 
> > ATAPI Write-once device, with non-removable media
> > Model Number:   SanDisk SDP3B-16
> > Serial Number:  24313671615
> > Firmware Revision:  vdd 1.00
> > Standards:
> > Likely used: 3
> > Configuration:
> > DRQ response: 50us.
> > Packet size: Unknown
> > Capabilities:
> > LBA, IORDY(may be)(cannot be disabled)
> > Buffer size: 1.0kB  bytes avail on r/w long: 4
> > DMA: not supported
> > PIO: pio0 pio1
> 
> "ATAPI Write-once device"  ???
> 
> That cannot be correct (??).  Is this with hdparm-7.7 (latest 
> sourceforge) ?? Can you show us the "hdparm --Istdout" output as 
> well, please.
> 
> thanks.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

On Wed, 07 Nov 2007 14:12:55 -0500, Mark Lord wrote
> > WRAP ~ #./hdparm -I /dev/hda
> > 
> > /dev/hda:
> > 
> > ATAPI Write-once device, with non-removable media
> > Model Number:   SanDisk SDP3B-16
> > Serial Number:  24313671615
> > Firmware Revision:  vdd 1.00
> > Standards:
> > Likely used: 3
> > Configuration:
> > DRQ response: 50us.
> > Packet size: Unknown
> > Capabilities:
> > LBA, IORDY(may be)(cannot be disabled)
> > Buffer size: 1.0kB  bytes avail on r/w long: 4
> > DMA: not supported
> > PIO: pio0 pio1
> 
> "ATAPI Write-once device"  ???
> 
> That cannot be correct (??).  Is this with hdparm-7.7 (latest 
> sourceforge) ?? Can you show us the "hdparm --Istdout" output as 
> well, please.
> 
> thanks.
Yes, it is the latest hdparm-7.7. But maybe it is related to the git kernel? These lines
[8.658485] hda: applying conservative PIO "downgrade"
[8.674009] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO0
[8.674264] hda: set_drive_speed_status: status=0x51 { DriveReady SeekComplete Error }
[8.698231] hda: set_drive_speed_status: error=0x04 { DriveStatusError }
[8.718590] hda: applying conservative PIO "downgrade"
[8.734085] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO0
[8.734337] hda: set_drive_speed_status: status=0x51 { DriveReady SeekComplete Error }
[8.758315] hda: set_drive_speed_status: error=0x04 { DriveStatusError }

appeared only in those kernels (2.6.24-git/rc).


Anyway, here is the output:

WRAP ~ #./hdparm --Istdout /dev/hda

/dev/hda:
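844a 01ea 0000 0002 0000 0240 0020 0000
7a80 0000 2020 2020 2020 2020 2032 3433
3133 3637 3136 3135 0002 0002 0004 7664
6420 312e 3030 5361 6e44 6973 6b20 5344
5033 422d 3136 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 0001
0000 0200 0000 0100 0000 0001 01ea 0002
0020 7a80 0000 0100 7a80 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 2020
2020 2020 2020 2020 2020 2032 3433 3133
3637 3136 3135 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000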




--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

On Thu, 08 Nov 2007 00:23:10 +0900, James Andrewartha wrote
> Denys Fedoryshchenko wrote:
> > On Tue, 6 Nov 2007 22:15:21 -0800, Andrew Morton wrote
> >>> On Thu, 1 Nov 2007 23:30:13 +0200 "Denys" <[EMAIL PROTECTED]> wrote:
> >>> Finally i got full DMESG with 1GB card till end. Seems not readable too.
> >>> scsi0 : sc1200
> >>> scsi1 : sc1200
> >>> ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0xfc00 irq 14
> >>> ata2: DUMMY
> >>> ata1.00: CFA: SanDisk SDCFH-1024, HDX 3.07, max MWDMA2
> >>> ata1.00: 2001888 sectors, multi 0: LBA
> >>> ata1.00: configured for MWDMA2
> >>> scsi 0:0:0:0: Direct-Access ATA  SanDisk SDCFH-10 HDX  PQ: 0 ANSI: 5
> >>> sd 0:0:0:0: [sda] 2001888 512-byte hardware sectors (1025 MB)
> >>> sd 0:0:0:0: [sda] Write Protect is off
> >>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >>> sd 0:0:0:0: [sda] 2001888 512-byte hardware sectors (1025 MB)
> >>> sd 0:0:0:0: [sda] Write Protect is off
> >>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >>>  sda:<4>Clocksource tsc unstable (delta = -334501841 ns)
> >>> Time: pit clocksource has been installed.
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> ata1.00: status: { DRDY }
> >>> ata1: soft resetting link
> >>> ata1.00: configured for MWDMA2
> >>> ata1: EH complete
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> ata1.00: status: { DRDY }
> >>> ata1: soft resetting link
> >>> ata1.00: configured for MWDMA2
> >>> ata1: EH complete
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> ata1.00: status: { DRDY }
> >>> ata1: soft resetting link
> >>> ata1.00: configured for MWDMA2
> >>> ata1: EH complete
> >>> ata1.00: limiting speed to MWDMA1:PIO4
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> ata1.00: status: { DRDY }
> >>> ata1: soft resetting link
> >>> ata1.00: configured for MWDMA1
> >>> ata1: EH complete
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> ata1.00: status: { DRDY }
> >>> ata1: soft resetting link
> >>> ata1.00: configured for MWDMA1
> >>> ata1: EH complete
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> ata1.00: status: { DRDY }
> >>> ata1: soft resetting link
> >>> ata1.00: configured for MWDMA1
> >>> sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
> >>> sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
> >>> Descriptor sense data with sense descriptors (in hex):
> >>> 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> >>> 00 00 00 00
> >>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> >>> end_request: I/O error, dev sda, sector 0
> >>> Buffer I/O error on device sda, logical block 0
> >>> ata1: EH complete
> >>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >>> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> >>>  res 40

Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Denys Fedoryshchenko

0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
> > sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
> > Descriptor sense data with sense descriptors (in hex):
> > 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > 00 00 00 00
> > sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> > end_request: I/O error, dev sda, sector 0
> > Buffer I/O error on device sda, logical block 0
> > ata1: EH complete
> >  unable to read partition table
> > sd 0:0:0:0: [sda] Attached SCSI removable disk
> > scx200_wdt: timer margin 60 seconds
> > cpuidle: using governor ladder
> > enabling scx200 high-res timer (1 MHz +0 ppm)
> > TCP cubic registered
> > Time: scx200_hrt clocksource has been installed.
> > NET: Registered protocol family 1
> > NET: Registered protocol family 17
> > Using IPI Shortcut mode
> > VFS: Cannot open root device "" or unknown-block(8,18)
> > Please append a correct "root=" boot option; here are the available partitions:
> > 0800 1000944 sda driver: sd
> > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,18)
> >
> 
> (+linux-ide)
> 
> So this has never worked on any known kernel?

It works now with the old IDE code (not libata).

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

