Re: 4.19.4 nf_conntrack_count kernel panic

2018-11-28 Thread Sami Farin
On Mon, Nov 26, 2018 at 22:14:48 +0200, Denys Fedoryshchenko wrote:
> On 2018-11-26 21:46, Sami Farin wrote:
> > 4.18.20 works OK, but unfortunately 4.18 series is EOL.
...
> Check these patches:
> https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972&state=*
> 
> Relevant discussion:
> https://marc.info/?l=linux-netdev&m=154211826106430&w=2

I applied those three patches, and 4.19.5 no longer crashes the same
way.  BUT I got this when I ran iptables-restore (I could not reproduce it):

RIP: 0010:rb_erase+0xce/0x380
Code: 01 48 83 e0 fc 0f 84 cf 01 00 00 48 3b 78 10 0f 84 1d 02 00 00 48 89 48 
08 c3 48 8b 17 48 89 d0 48 83 e0 fc 0f 84 d1 01 00 00 <48> 3b 78 10 0f 84 aa 01 
00 00 4c 89 40 08 4d 85 c0 0f 85 c6 01 00
RSP: 0018:bfc10c643cf8 EFLAGS: 00010282
RAX: b488144c79e96970 RBX: a1bf63320e10 RCX: 
RDX: b488144c79e96970 RSI: a1c224983010 RDI: a1bf63320e10
RBP: a1ba766fa210 R08:  R09: a1bea0368bd0
R10:  R11: a1c140038000 R12: a1c224983010
R13: a1c224983808 R14: a1c224983000 R15: a1bf63320e60
FS:  7ff302a1b900() GS:a1c23e88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 562e3b407000 CR3: 00027666e000 CR4: 003406e0
Call Trace:
nf_conncount_destroy+0x58/0xc0 [nf_conncount]
cleanup_match+0x45/0x70 [ip_tables]
cleanup_entry+0x3e/0xc0 [ip_tables]
do_ipt_set_ctl+0x450/0x4ed [ip_tables]
nf_setsockopt+0x44/0x70
__sys_setsockopt+0x82/0xe0
__x64_sys_setsockopt+0x20/0x30
do_syscall_64+0x6f/0x353
? trace_hardirqs_off_thunk+0x1a/0x1c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff301a1740a
Code: ff ff ff c3 48 8b 15 95 da 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 
b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 8b 0d 66 da 2b 00 f7 d8 64 89 01 48
RSP: 002b:7ffd76e9e9a8 EFLAGS: 0206 ORIG_RAX: 0036
RAX: ffda RBX: 562e3b3bece8 RCX: 7ff301a1740a
RDX: 0040 RSI:  RDI: 0004
RBP: 562e3b3f4850 R08: 00013830 R09: 
R10: 562e3b3f4850 R11: 0206 R12: 562e3b3f48b0
R13: 562e3b3bece8 R14: 000137d0 R15: 562e3b3bece0
Modules linked in: act_skbedit cls_u32 algif_hash sch_cake arptable_filter 
arp_tables cls_fw sch_fq nfnetlink_acct ip6table_mangle nf_log_ipv6 xt_hl 
ip6t_REJECT nf_reject_ipv6 ip6t_rt ip6table_filter ip6_tables ipt_MASQUERADE 
iptable_nat nf_nat_ipv4 nf_nat iptable_raw xt_mark xt_connmark iptable_mangle 
nf_log_ipv4 nf_log_common xt_LOG xt_length xt_limit ipt_REJECT nf_reject_ipv4 
xt_connlimit nf_conncount xt_multiport xt_hashlimit xt_owner xt_set 
xt_conntrack iptable_filter nf_conntrack_netlink ip_set_bitmap_port 
ip_set_hash_mac ip_set_hash_net ip_set nfnetlink bnep hwmon_vid iwlmvm mac80211 
snd_usb_audio kvm_amd snd_usbmidi_lib snd_hwdep snd_rawmidi kvm btusb btrtl 
iwlwifi btbcm btintel bluetooth irqbypass ecdh_generic cfg80211 wmi_bmof 
sp5100_tco k10temp i2c_piix4 snd_hda_codec_realtek
snd_hda_codec_generic rfkill snd_hda_codec_hdmi snd_hda_intel snd_hda_codec 
snd_hda_core rtc_cmos acpi_cpufreq binfmt_misc snd_pcm_oss snd_mixer_oss 
snd_seq snd_seq_device snd_pcm tcp_cubic tcp_westwood br_netfilter bridge stp 
llc ip_tables scsi_transport_iscsi algif_skcipher af_alg uas usb_storage usbhid 
mxm_wmi ccp igb xhci_pci xhci_hcd usbcore usb_common wmi button 8021q mrp 
sunrpc snd_timer snd soundcore fuse tun xt_tcpudp x_tables tcp_bbr nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 sch_fq_codel sch_htb sch_pie analog gameport 
joydev i2c_dev ecryptfs autofs4 amdkfd amd_iommu_v2 [last unloaded: pcspkr]
---[ end trace 166aeb108d0289e0 ]---
RIP: 0010:rb_erase+0xce/0x380
Code: 01 48 83 e0 fc 0f 84 cf 01 00 00 48 3b 78 10 0f 84 1d 02 00 00 48 89 48 
08 c3 48 8b 17 48 89 d0 48 83 e0 fc 0f 84 d1 01 00 00 <48> 3b 78 10 0f 84 aa 01 
00 00 4c 89 40 08 4d 85 c0 0f 85 c6 01 00
RSP: 0018:bfc10c643cf8 EFLAGS: 00010282
RAX: b488144c79e96970 RBX: a1bf63320e10 RCX: 
RDX: b488144c79e96970 RSI: a1c224983010 RDI: a1bf63320e10
RBP: a1ba766fa210 R08:  R09: a1bea0368bd0
R10:  R11: a1c140038000 R12: a1c224983010
R13: a1c224983808 R14: a1c224983000 R15: a1bf63320e60
FS:  7ff302a1b900() GS:a1c23e88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 562e3b407000 CR3: 00027666e000 CR4: 003406e0

-- 
Do what you love because life is too short for anything else.
https://samifar.in/



4.19.4 nf_conntrack_count kernel panic

2018-11-26 Thread Sami Farin
4.18.20 works OK, but unfortunately 4.18 series is EOL.
I have a Ryzen 1600X, 32 GB RAM, Fedora 28, gcc-8.2.1-5, nosmt=force,
the igb module for Intel I211, and I am using XFS filesystems only.

To reproduce, I only do this: connect to VPN using a tunnel (e.g. tun0),
start downloading a file with qbittorrent (allow port for incoming
TCP connections in qbittorrent and iptables) and wait a couple of minutes.
I am also using ipset and connlimit modules.
I reproduced this bug three times.
With 4.18 I use fq+htb and with 4.19 I use CAKE for traffic control.

Only this message in kernel log:
[  363.935074] TCP: request_sock_TCP: Possible SYN flooding on port 19044. 
Dropping request.  Check SNMP counters.
I get this message with both 4.18.20 and 4.19.4.

RIP: 0010:rb_insert_color+0x64
Call Trace:
  nf_conntrack_count [nf_conncount]
  ip_set_test [ip_set]
  connlimit_mt [xt_connlimit]
  set_match_v4 [xt_set]
  ipt_do_table [ip_tables]
  ip_route_input_noref
  nf_hook_slow 
  ip_local_deliver
  inet_add_protocol
  ip_rcv
  ip_rcv_finish_core
  __netif_receive_skb_one_core
  netif_receive_skb_internal
  tun_rx_batched
  tun_get_user
  __local_bh_enable_ip
  tun_get_user
  tun_chr_write_iter
  __vfs_write
  vfs_write
  ksys_write
  do_syscall_64
  trace_hardirqs_off_thunk
  entry_SYSCALL_64_after_hwframe

...

Kernel panic - not syncing: Fatal exception in interrupt

-- 
Do what you love because life is too short for anything else.
https://samifar.in/



4.18.6 dl_seq_start [xt_hashlimit] unable to handle kernel NULL pointer dereference at 0000000000000050

2018-09-05 Thread Sami Farin
4.17 worked OK.  This is on a 32 GB Ryzen system.

BUG: unable to handle kernel NULL pointer dereference at 0050
PGD 0 P4D 0 
Oops:  [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 6303 Comm: grep Tainted: GT 4.18.6+ #16
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS 
P4.60 03/02/2018
RIP: 0010:dl_seq_start+0x11/0x60 [xt_hashlimit]
Code: ff 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 
00 0f 1f 44 00 00 55 48 89 f5 53 48 8b 87 d8 00 00 00 <48> 8b 78 50 e8 36 3b 6f 
de 48 89 c3 48 8d 78 48 e8 ca e6 0a df 8b 
RSP: 0018:a79e88befde0 EFLAGS: 00010246
RAX:  RBX:  RCX: 
RDX:  RSI: a79e88befe18 RDI: 9a64733417a0
RBP: a79e88befe18 R08:  R09: 657547bf
R10: 9f2bf98a R11: 9a6470f5a6c0 R12: a79e88befeb0
R13: 9a6471879200 R14: 9a6471879200 R15: 9a64733417a0
FS:  7f6798784740() GS:9a649ea0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0050 CR3: 0007f335c000 CR4: 003406f0
Call Trace:
 seq_read+0xc0/0x470
 proc_reg_read+0x49/0x70
 vfs_read+0x8a/0x140
 ksys_read+0x52/0xc0
 do_syscall_64+0x6f/0x353
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f67980eee21
Code: fe ff ff 50 48 8d 3d 46 b6 09 00 e8 f9 04 02 00 66 0f 1f 84 00 00 00 00 
00 48 8d 05 c1 3b 2d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 
c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48 
RSP: 002b:7ffc314f7a68 EFLAGS: 0246 ORIG_RAX: 
RAX: ffda RBX: 8000 RCX: 7f67980eee21
RDX: 8000 RSI: 55f0d317f000 RDI: 0003
RBP: 8000 R08:  R09: 9008
R10:  R11: 0246 R12: 55f0d317f000
R13: 0003 R14: 55f0d317e830 R15: 0003
Modules linked in: arptable_filter arp_tables nfnetlink_acct ip6table_mangle 
nf_log_ipv6 xt_hl ip6t_REJECT nf_reject_ipv6 ip6t_rt ip6table_filter ip6_tables 
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_raw xt_mark xt_connmark 
iptable_mangle nf_log_ipv4 nf_log_common xt_LOG xt_length xt_limit ipt_REJECT 
nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_connlimit nf_conncount 
xt_multiport xt_hashlimit xt_owner xt_set xt_conntrack iptable_filter 
ip_set_bitmap_port ip_set_hash_mac ip_set_hash_net ip_set nf_conntrack_netlink 
nfnetlink bnep hwmon_vid iwlmvm snd_usb_audio snd_usbmidi_lib snd_hwdep 
snd_rawmidi mac80211 iwlwifi btusb btrtl kvm_amd btbcm btintel bluetooth kvm 
cfg80211 ecdh_generic irqbypass sp5100_tco wmi_bmof k10temp i2c_piix4 
snd_hda_codec_realtek rfkill snd_hda_codec_generic
 snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core rtc_cmos 
acpi_cpufreq binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device 
snd_pcm tcp_cubic tcp_westwood br_netfilter bridge stp llc ip_tables 
scsi_transport_iscsi algif_skcipher af_alg uas usb_storage usbhid mxm_wmi ccp 
igb xhci_pci xhci_hcd usbcore usb_common wmi button 8021q mrp sunrpc snd_timer 
snd soundcore fuse tun xt_tcpudp x_tables tcp_bbr nf_conntrack_ipv6 
nf_defrag_ipv6 nf_conntrack sch_fq_codel sch_htb sch_pie analog gameport joydev 
i2c_dev ecryptfs autofs4 amdkfd amd_iommu_v2 [last unloaded: pcspkr]
CR2: 0050
---[ end trace 0e097a943554aa36 ]---


-- 
Do what you love because life is too short for anything else.
https://samifar.in/



Panic since 4.15.15 in tcp_retransmit_timer when doing ss -K

2018-04-13 Thread Sami Farin
I have been getting this since 4.15.15.  It's easy to trigger:
for example, I get a new IP address via DHCP (NetworkManager),
then run ss -K the_old_ip_address .

Happens on Ryzen and SandyBridge systems.
My guess of the cause: commit 960058fe196397aecb16bb14e64980e265d2bc5e
(didn't try reverting)

BUG: unable to handle kernel NULL pointer dereference at 30
IP: tcp_retransmit_skb+0x57/0xc0
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW   4.15.17+ #42
Hardware name: To Be Filled By O.E.M./To Be Filled By O.E.M./X370 Taichi, BIOS 
P4.60 03/02/2018
RIP: 0010:tcp_retransmit_skb+0x57/0xc0
RSP: 0018:95b1dea03e00 EFLAGS: 00010206
RAX: fff5 RBX: 95b15876d000 RCX: 5
RDX: 5 RSI: 0 RDI: 95b115876d000

... 
Call Trace:

tcp_retransmit_timer
tcp_write_timer_handler
tcp_write_timer
? tcp_write_timer_handler
expire_timers
run_timer_softirq
sched_clock
sched_clock
sched_clock_cpu
irqtime_account_irq
__do_softirq
sched_clock
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt


-- 
Do what you love because life is too short for anything else.



Re: [NETFILTER] xt_hashlimit : speedups hash_dst()

2007-12-17 Thread Sami Farin
On Sat, Dec 15, 2007 at 21:42:19 -0800, David Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Sat, 15 Dec 2007 12:04:47 +0100
> 
> > I prefer to let admins choose their size, since it makes attacker life more 
> > difficult :)
> > 
> > For example, I can tell you I have a server where size is between 2.000.000 
> > and 3.500.000; I don't want to be forced to use 2097152
> > 
> > A multiply is cheap, at least on current hardware.
> 
> I agree, and I see nothing wrong with Eric's patch and it
> should be merged ASAP.

You could do the same optimization for 
net/netfilter/nf_conntrack_core.c:__hash_conntrack(), too.
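
For readers who have not seen the patch: a minimal sketch, assuming the
speedup under discussion replaces the modulo in hash_dst() with a 32x32->64
multiply followed by a shift.  This is user-space C, not the kernel code,
and reduce_hash() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit hash onto [0, size) without a division.
 * (hash * size) / 2^32 is always smaller than size, and size does
 * not have to be a power of two.
 * reduce_hash() is a hypothetical name used only for this sketch.
 */
static uint32_t reduce_hash(uint32_t hash, uint32_t size)
{
	assert(size > 0);
	return (uint32_t)(((uint64_t)hash * size) >> 32);
}
```

With a table size of 3000000, a hash of 0 lands in bucket 0 and a hash of
0xffffffff lands in bucket 2999999, so the whole table is used even though
the size is not a power of two.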

-- 
Do what you love because life is too short for anything else.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: updated tcatm patches for kernel/iproute 2.6.22

2007-07-19 Thread Sami Farin
On Thu, Jul 19, 2007 at 00:58:44 +0200, Patrick McHardy wrote:
> Sami Farin wrote:
> > I had used tcatm patch with 2.6.16 kernel and I was
> > happy with it.
> > Now I patched Linux kernel 2.6.22 and iproute-2.6.22
> > for tcatm.  Seems to work (TM).  Only HTB tested.
> 
> 
> Let me repeat again before we get in an endless discussion
> again - I will NACK these patches until we have a solution
> that works for all qdiscs.

I googled for
"Accurate packet scheduling for ATM" mchardy
and found some threads from 2006.

http://www.mail-archive.com/netdev@vger.kernel.org/msg15706.html
http://www.mail-archive.com/netdev@vger.kernel.org/msg21395.html

So, Patrick:
1) what is this "solution"?  do you have code?
   or a brief explanation...
2) are you still working on this?
   is somebody else?
3) do you have patches for iproute/kernel 2.6.22?
4) if you have some patches, but they are not yet
   complete, what else needs to be done?

-- 
Do what you love because life is too short for anything else.



Re: Linux 2.6.22: Leak r=1 1

2007-07-18 Thread Sami Farin
On Wed, Jul 18, 2007 at 12:16:56 +0300, Ilpo Järvinen wrote:
> On Wed, 11 Jul 2007, Sami Farin wrote:
> 
> > That's right, so descriptive is the new Linux kernel 2.6.22.
> > 
> > Linux safari.finland.fbi 2.6.22-cfs-v19 #3 SMP Tue Jul 10 00:22:25 EEST 
> > 2007 i686 i686 i386 GNU/Linux
> > 
> > [EMAIL PROTECTED] /proc/sys/net/ipv4]# grep . *
> 
> ...snip...
> 
> > tcp_frto:1
> 
> ...This is fully unrelated to the issue but I'm a bit curious who enabled 
> frto on your machine (since it's disabled by default), did you do it by 
> yourself or the distribution perhaps?

I enabled it myself...

If you'd like to get more widespread testing,
try suggesting that the Fedora project add the tuning
to /etc/sysctl.conf (or something like that).
Maybe on fedora-devel-list:
http://www.redhat.com/mailman/listinfo/fedora-devel-list
Note that they have antispam on that list which requires
that the email address in the From: header field be
subscribed to the list; otherwise your email is
devnulled.

-- 
Do what you love because life is too short for anything else.



updated tcatm patches for kernel/iproute 2.6.22

2007-07-18 Thread Sami Farin
I got tired of getting 15% packet loss [1] when doing
lots of DNS lookups on my ADSL link...
And that was even when limiting outgoing DNS traffic
to 200 Kbit/s (ADSL modem upstream speed is 512 Kbit/s).

I had used tcatm patch with 2.6.16 kernel and I was
happy with it.
Now I patched Linux kernel 2.6.22 and iproute-2.6.22
for tcatm.  Seems to work (TM).  Only HTB tested.

Now I get 0% packet loss when doing lots of DNS queries
(270pps) and DNS traffic is limited with
HTB/ESFQ to 504Kbit/s.  I used "tc class add ... atm overhead 20"
but I am not sure whether it is really 20 (Sonera in Finland).
Without tcatm I had to set it to 420Kbit and it still sucked.
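
Why the overhead value matters can be sketched with plain ATM arithmetic:
every 53-byte cell carries 48 bytes of payload, so a packet plus its
encapsulation overhead is rounded up to whole cells.  This is only an
illustration, not code from the tcatm patch; atm_wire_bytes() is a
hypothetical helper and the overhead of 20 is the unverified guess from
above.

```c
#include <assert.h>

#define ATM_CELL_SIZE    53	/* bytes on the wire per cell */
#define ATM_CELL_PAYLOAD 48	/* payload bytes carried by one cell */

/*
 * Bytes actually sent on an ATM link for one packet: the packet plus
 * per-packet encapsulation overhead, rounded up to whole cells.
 * atm_wire_bytes() is a hypothetical helper for this sketch.
 */
static unsigned int atm_wire_bytes(unsigned int pktlen, unsigned int overhead)
{
	unsigned int cells =
		(pktlen + overhead + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

	return cells * ATM_CELL_SIZE;
}
```

With overhead 20, a 28-byte packet fits in one cell (53 bytes on the wire)
while a 29-byte packet needs two (106 bytes).  A shaper that only counts
IP bytes underestimates small packets such as DNS queries the most.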

I keep patches at
http://safari.iki.fi/tcatm/

I read one thread from this year where there were objections
to some parts of tcatm, and then the discussion petered out...
I, and probably Russell Stuart, would like to get
these patches fixed so that everyone is pleased
and they can be incorporated into the kernel some year,
because I believe ADSL is very popular nowadays ( =) )
and people would probably like it if traffic control was
actually usable for them ( =) )...

[1] ping -A 80.223.96.1

-- 
Do what you love because life is too short for anything else.

#
# include/linux/pkt_sched.h |5 +++--
# include/net/sch_generic.h |   15 +++
# net/sched/act_police.c|4 ++--
# net/sched/sch_cbq.c   |2 +-
# net/sched/sch_htb.c   |9 -
# net/sched/sch_tbf.c   |4 ++--
# 6 files changed, 27 insertions(+), 12 deletions(-)
#
--- linux-2.6.22/include/linux/pkt_sched.h.bak	2007-07-09 21:58:23.559346000 +0300
+++ linux-2.6.22/include/linux/pkt_sched.h	2007-07-18 21:46:00.084770053 +0300
@@ -77,8 +77,9 @@ struct tc_ratespec
 {
 	unsigned char	cell_log;
 	unsigned char	__reserved;
-	unsigned short	feature;
-	short		addend;
+	unsigned short	feature;	/* Always 0 in pre-atm patch kernels */
+	char		cell_align;	/* Always 0 in pre-atm patch kernels */
+	unsigned char	__reserved2;
 	unsigned short	mpu;
 	__u32		rate;
 };
--- linux-2.6.22/include/net/sch_generic.h.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/include/net/sch_generic.h	2007-07-18 21:44:40.024580754 +0300
@@ -302,4 +302,19 @@ drop:
 	return NET_XMIT_DROP;
 }
 
+/* Lookup a qdisc_rate_table to determine how long it will take to send a
+ * packet given its size.
+ */
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
+{
+	int slot = pktlen + rtab->rate.cell_align;
+
+	if (slot < 0)
+		slot = 0;
+	slot >>= rtab->rate.cell_log;
+	if (slot > 255)
+		return rtab->data[255] + 1;
+	return rtab->data[slot];
+}
+
 #endif
--- linux-2.6.22/net/sched/act_police.c.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/net/sched/act_police.c	2007-07-18 21:42:49.275936447 +0300
@@ -32,8 +32,8 @@
 #include 
 #include 
 
-#define L2T(p,L)   ((p)->tcfp_R_tab->data[(L)>>(p)->tcfp_R_tab->rate.cell_log])
-#define L2T_P(p,L) ((p)->tcfp_P_tab->data[(L)>>(p)->tcfp_P_tab->rate.cell_log])
+#define L2T(p,L)   qdisc_l2t((p)->tcfp_R_tab,L)
+#define L2T_P(p,L) qdisc_l2t((p)->tcfp_P_tab,L)
 
 #define POL_TAB_MASK 15
 static struct tcf_common *tcf_police_ht[POL_TAB_MASK + 1];
--- linux-2.6.22/net/sched/sch_cbq.c.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/net/sched/sch_cbq.c	2007-07-18 21:51:12.794420373 +0300
@@ -192,7 +192,7 @@ struct cbq_sched_data
 };
 
 
-#define L2T(cl,len)	((cl)->R_tab->data[(len)>>(cl)->R_tab->rate.cell_log])
+#define L2T(cl,len)	qdisc_l2t((cl)->R_tab,len)
 
 
 static __inline__ unsigned cbq_hash(u32 h)
--- linux-2.6.22/net/sched/sch_htb.c.bak	2007-07-09 21:17:53.417438000 +0300
+++ linux-2.6.22/net/sched/sch_htb.c	2007-07-18 21:50:08.602465126 +0300
@@ -157,12 +157,11 @@ struct htb_class {
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
 			   int size)
 {
-	int slot = size >> rate->rate.cell_log;
-	if (slot > 255) {
+	long result = qdisc_l2t(rate, size);
+
+	if (result > rate->data[255])
 		cl->xstats.giants++;
-		slot = 255;
-	}
-	return rate->data[slot];
+	return result;
 }
 
 struct htb_sched {
--- linux-2.6.22/net/sched/sch_tbf.c.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/net/sched/sch_tbf.c	2007-07-18 21:52:10.665281840 +0300
@@ -132,8 +132,8 @@ struct tbf_sched_data
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
 };
 
-#define L2T(q,L)   ((q)->R_tab->data[(L)>>(q)->R_tab->rate.cell_log])
-#define L2T_P(q,L) ((q)->P_tab->data[(L)>>(q)->P_tab->rate.cell_log])
+#define L2T(q,L)   qdisc_l2t((q)->R_tab,L)
+#define L2T_P(q,L) qdisc_l2t((q)->P_tab,L)
 
 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 {
#
# include/linux/pkt_sched.h |5 +-
#
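
The effect of cell_align in qdisc_l2t() can be tried in user space.  The
following is a sketch only: struct rate_table is a simplified stand-in for
the kernel's struct qdisc_rate_table, and l2t() and fill_identity() are
hypothetical names.  The lookup logic itself is copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct qdisc_rate_table. */
struct rate_table {
	unsigned char cell_log;		/* log2 of bytes covered by one slot */
	signed char   cell_align;	/* offset added to the length before lookup */
	uint32_t      data[256];	/* transmit time per slot */
};

/* Same logic as qdisc_l2t() in the patch, on the stand-in type. */
static uint32_t l2t(const struct rate_table *rtab, int pktlen)
{
	int slot = pktlen + rtab->cell_align;

	if (slot < 0)
		slot = 0;
	slot >>= rtab->cell_log;
	if (slot > 255)
		return rtab->data[255] + 1;	/* oversized packet ("giant") */
	return rtab->data[slot];
}

/* Fill the table so that data[i] == i, which makes the chosen slot visible. */
static void fill_identity(struct rate_table *rtab)
{
	int i;

	for (i = 0; i < 256; i++)
		rtab->data[i] = (uint32_t)i;
}
```

With cell_log = 3 and cell_align = -1, a length of 8 still maps to slot 0
and a length of 9 is the first to map to slot 1.  Without the alignment,
a length of 8 would already land in slot 1.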

Re: Linux 2.6.22: Leak r=1 1

2007-07-12 Thread Sami Farin
On Thu, Jul 12, 2007 at 10:53:57 +0300, Ilpo Järvinen wrote:
> On Wed, 11 Jul 2007, Sami Farin wrote:
> 
> > That's right, so descriptive is the new Linux kernel 2.6.22.
> > Took a while to grep what is "leaking".
> > 
> > Linux safari.finland.fbi 2.6.22-cfs-v19 #3 SMP Tue Jul 10 00:22:25 EEST 
> > 2007 i686 i686 i386 GNU/Linux
> > 
> > Just normal Internet usage, azureus for example =)
> > I think this is easy to trigger.
> 
> I guess those packet loss periods help you to reproduce it so easily.
...
> I'd be interested to study some tcpdumps that relate to Leak cases you're 
> seeing. Could you record some Sami? I'm not sure though how one can figure 

I now have 300 MB capture and several new&retarded music videos...
And 10 WARNINGs and 0 Leak printk's.

2007-07-12 12:03:18.910712500 <4>[ 1318.606826] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:21:55.575049500 <4>[ 2434.941077] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:25:56.626918500 <4>[ 2675.917531] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:26:01.421714500 <4>[ 2680.710860] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:27:55.996561500 <4>[ 2795.252008] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:33:03.405492500 <4>[ 3102.570088] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:33:59.837033500 <4>[ 3158.985152] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:44:59.580682500 <4>[ 3818.697530] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:45:06.146194500 <4>[ 3825.261028] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:45:07.637015500 <4>[ 3826.751240] WARNING: at 
net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()

This is MAYBE the guilty connection if timestamps are to be believed:

2007-07-12 12:02:35.311410 IP (tos 0x0, ttl  61, id 17078, offset 0, flags 
[none], proto: TCP (6), length: 60) 80.223.106.128.43771 > 
62.203.174.236.24442: SWE, cksum 0x26f7 (correct), 1227344370:1227344370(0) win 
5720 
2007-07-12 12:02:38.281251 IP (tos 0x0, ttl  61, id 17079, offset 0, flags 
[none], proto: TCP (6), length: 60) 80.223.106.128.43771 > 
62.203.174.236.24442: SWE, cksum 0x1b3f (correct), 1227344370:1227344370(0) win 
5720 
2007-07-12 12:02:38.792865 IP (tos 0x0, ttl 113, id 46391, offset 0, flags 
[DF], proto: TCP (6), length: 52) 62.203.174.236.24442 > 80.223.106.128.43771: 
., cksum 0xc936 (correct), ack 1227344371 win 17640 
2007-07-12 12:02:38.854298 IP (tos 0x0, ttl 113, id 46396, offset 0, flags 
[DF], proto: TCP (6), length: 64) 62.203.174.236.24442 > 80.223.106.128.43771: 
S, cksum 0x319e (correct), 602133927:602133927(0) ack 1227344371 win 17640 
2007-07-12 12:02:38.854335 IP (tos 0x0, ttl  61, id 17080, offset 0, flags 
[none], proto: TCP (6), length: 52) 80.223.106.128.43771 > 
62.203.174.236.24442: ., cksum 0x6251 (correct), ack 602133928 win 715 

2007-07-12 12:02:38.858231 IP (tos 0x0, ttl  61, id 17081, offset 0, flags 
[none], proto: TCP (6), length: 372) 80.223.106.128.43771 > 
62.203.174.236.24442: P, cksum 0xaa7d (incorrect (-> 0x006d), 
1227344371:1227344691(320) ack 602133928 win 715 
2007-07-12 12:02:39.305447 IP (tos 0x0, ttl 113, id 46441, offset 0, flags 
[DF], proto: TCP (6), length: 159) 62.203.174.236.24442 > 80.223.106.128.43771: 
P, cksum 0x18b6 (correct), 602133928:602134035(107) ack 1227344691 win 17320 

2007-07-12 12:02:39.305482 IP (tos 0x0, ttl  61, id 17082, offset 0, flags 
[none], proto: TCP (6), length: 52) 80.223.106.128.43771 > 
62.203.174.236.24442: ., cksum 0xf9de (correct), ack 602134035 win 715 

2007-07-12 12:02:39.309403 IP (tos 0x0, ttl  61, id 17083, offset 0, flags 
[none], proto: TCP (6), length: 263) 80.223.106.128.43771 > 
62.203.174.236.24442: P, cksum 0xaa10 (incorrect (-> 0xf1b3), 
1227344691:1227344902(211) ack 602134035 win 715 
2007-07-12 12:02:40.649923 IP (tos 0x0, ttl  61, id 17084, offset 0, flags 
[none], proto: TCP (6), length: 263) 80.223.106.128.43771 > 
62.203.174.236.24442: P, cksum 0xaa10 (incorrect (-> 0xec76), 
1227344691:1227344902(211) ack 602134035 win 715 
2007-07-12 12:02:41.148856 IP (tos 0x0, ttl 113, id 46591, offset 0, flags 
[DF], proto: TCP (6), length: 52) 62.203.174.236.24442 > 80.223.106.128.43771: 
., cksum 0xb73b (correct), ack 1227344902 win 17109 
2007-07-12 12:02:42.679961 IP (tos 0x0, ttl 113, id 46707, offset 0, flags 
[DF], proto: TCP (6), length: 484) 62.203.174.236.24442 > 80.223.106.128.43771: 
P, cksum 0x3390 (correct), 602134035:602134467(432) ack 1227344902 win 17109 

2007-07-12 12:02:42.703122 IP (tos 0x0, ttl  61, id 17085, offset 0, flags 
[none], proto: TCP (6), length: 120) 80.22

Linux 2.6.22: Leak r=1 1

2007-07-11 Thread Sami Farin
That's right, so descriptive is the new Linux kernel 2.6.22.
Took a while to grep what is "leaking".

Linux safari.finland.fbi 2.6.22-cfs-v19 #3 SMP Tue Jul 10 00:22:25 EEST 2007 
i686 i686 i386 GNU/Linux

Just normal Internet usage, azureus for example =)
I think this is easy to trigger.
But that printk is not very useful, or is it?
I am also using HTB+ESFQ to limit outgoing bandwidth...

# ss -n|wc -l
870

# ping -A 80.223.96.1 
PING 80.223.96.1 (80.223.96.1) 56(84) bytes of data.
64 bytes from 80.223.96.1: icmp_seq=1 ttl=255 time=431 ms
...
--- 80.223.96.1 ping statistics ---
40 packets transmitted, 25 received, 37% packet loss, time 17954ms
rtt min/avg/max/mdev = 406.000/467.758/530.983/29.384 ms, pipe 2, ipg/ewma 
460.361/456.381 ms

But ploss is only temporary (when I am downloading with azureus =) ,
when only uploading (95% of bandwidth used) rtt avg = 32ms).

# dmesg|grep Leak
[114992.191011] Leak r=1 4
[124231.713348] Leak r=1 4
[142807.938284] Leak r=1 4
[142999.674521] Leak r=1 1
[143177.462073] Leak r=1 4
[143230.001570] Leak r=1 4
[143232.982560] Leak r=1 4
[143234.537096] Leak r=1 4
[143297.927760] Leak r=1 4
[143300.633603] Leak r=1 4
[143302.172917] Leak r=1 4
[143357.083193] Leak r=1 1
[143361.780879] Leak r=1 4
[143413.706490] Leak r=1 4
[143552.996598] Leak r=1 1

[EMAIL PROTECTED] /proc/sys/net/ipv4]# grep . *
icmp_echo_ignore_all:0
icmp_echo_ignore_broadcasts:1
icmp_errors_use_inbound_ifaddr:0
icmp_ignore_bogus_error_responses:1
icmp_ratelimit:1000
icmp_ratemask:6168
igmp_max_memberships:20
igmp_max_msf:10
inet_peer_gc_maxtime:120
inet_peer_gc_mintime:10
inet_peer_maxttl:600
inet_peer_minttl:120
inet_peer_threshold:65664
ip_default_ttl:61
ip_dynaddr:0
ip_forward:0
ip_local_port_range:4   65535
ip_no_pmtu_disc:1
ip_nonlocal_bind:0
ipfrag_high_thresh:262144
ipfrag_low_thresh:196608
ipfrag_max_dist:64
ipfrag_secret_interval:600
ipfrag_time:30
tcp_abc:2
tcp_abort_on_overflow:0
tcp_adv_win_scale:2
tcp_allowed_congestion_control:cubic bic reno
tcp_app_win:31
tcp_available_congestion_control:cubic bic reno westwood vegas scalable hybla htcp highspeed
tcp_base_mss:512
tcp_congestion_control:cubic
tcp_dma_copybreak:4096
tcp_dsack:1
tcp_ecn:1
tcp_fack:1
tcp_fin_timeout:30
tcp_frto:1
tcp_frto_response:0
tcp_keepalive_intvl:75
tcp_keepalive_probes:9
tcp_keepalive_time:3300
tcp_low_latency:0
tcp_max_orphans:2048
tcp_max_ssthresh:0
tcp_max_syn_backlog:1024
tcp_max_tw_buckets:18
tcp_mem:95136   126848  190272
tcp_moderate_rcvbuf:1
tcp_mtu_probing:0
tcp_no_metrics_save:0
tcp_orphan_retries:0
tcp_reordering:3
tcp_retrans_collapse:0
tcp_retries1:3
tcp_retries2:15
tcp_rfc1337:0
tcp_rmem:4096   87380   262144
tcp_sack:1
tcp_slow_start_after_idle:0
tcp_stdurg:0
tcp_syn_retries:5
tcp_synack_retries:5
tcp_syncookies:0
tcp_timestamps:1
tcp_tso_win_divisor:3
tcp_tw_recycle:1
tcp_tw_reuse:0
tcp_window_scaling:1
tcp_wmem:4096   16384   262144
tcp_workaround_signed_windows:0

-- 
Do what you love because life is too short for anything else.



Re: [PATCH][IPv6]: Fix incorrect length check in rawv6_sendmsg()

2007-03-29 Thread Sami Farin
On Thu, Mar 29, 2007 at 14:26:44 -0700, David Miller wrote:
> From: Sridhar Samudrala <[EMAIL PROTECTED]>
> Date: Thu, 29 Mar 2007 14:17:28 -0700
> 
> > The check for length in rawv6_sendmsg() is incorrect.
> > As len is an unsigned int, (len < 0) will never be TRUE.
> > I think checking for IPV6_MAXPLEN(65535) is better.
> > 
> > Is it possible to send ipv6 jumbo packets using raw
> > sockets? If so, we can remove this check.
> 
> I don't see why such a limitation against jumbo would exist,
> does anyone else?
> 
> Thanks for catching this Sridhar.  A good compiler should simply
> fail to compile "if (x < 0)" when 'x' is an unsigned type, don't
> you think :-)

gcc warns if you use -W or -Wextra (but not if only -Wall is used):

* An unsigned value is compared against zero with '<' or '>='.
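
A small user-space sketch of the point being made (not the actual
rawv6_sendmsg() code): because len is unsigned, a "negative" length has
already wrapped to a huge value, so only an upper-bound check can reject
it.  len_ok() and MAX_PAYLOAD are hypothetical names; 65535 is the
IPV6_MAXPLEN value quoted above.

```c
#include <assert.h>

#define MAX_PAYLOAD 65535u	/* value of IPV6_MAXPLEN quoted in the mail */

/*
 * For an unsigned len, "if (len < 0)" can never be true: a negative
 * value passed by the caller has already wrapped around to a very
 * large positive one.  Comparing against an upper bound catches it.
 * Returns 1 if len is acceptable, 0 otherwise.
 */
static int len_ok(unsigned int len)
{
	return len <= MAX_PAYLOAD;
}
```

Calling len_ok((unsigned int)-1) returns 0, which is the case the original
"len < 0" test was presumably meant to catch.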

-- 
Do what you love because life is too short for anything else.



Re: asm volatile [Was: [RFC] div64_64 support]

2007-03-08 Thread Sami Farin
On Wed, Mar 07, 2007 at 00:24:35 +0200, Sami Farin wrote:
> On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> ...
> > And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> > when doing 1000 loops test... gcc-4.0.3 works.
> 
> Found it.
> 
> --- cbrt-test.c~  2007-03-07 00:20:54.735248105 +0200
> +++ cbrt-test.c   2007-03-07 00:21:03.964864343 +0200
> @@ -209,7 +209,7 @@
>  
>   __asm__("bsrl %1,%0\n\t"
>   "cmovzl %2,%0"
> - : "=&r" (r) : "rm" (x), "rm" (-1));
> + : "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
>   return r+1;
>  }
>  
> Now Linux 2.6 does not have "memory" in fls, maybe it causes
> some gcc funnies some people are seeing.

It also works without "memory" if I do "__asm__ volatile".

Why do some functions have volatile and some not in include/asm-*/*.h ?
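
For comparison, the same "find last set" contract can be written without
inline asm, so no constraint or clobber list is involved.  This is a
sketch assuming GCC or Clang (__builtin_clz is a compiler builtin) and a
32-bit unsigned int; fls32() is a hypothetical name, not the kernel's
fls().

```c
#include <assert.h>

/*
 * "Find last set": 1-based index of the highest set bit, 0 for x == 0.
 * Same contract as the bsrl/cmovzl sequence in the diff (bsr result,
 * or -1 for zero, plus one).  __builtin_clz() is undefined for 0,
 * hence the explicit test.  Assumes a 32-bit unsigned int.
 */
static int fls32(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}
```

Here fls32(0) is 0, fls32(1) is 1 and fls32(0x80000000) is 32.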

-- 


Re: [RFC] div64_64 support

2007-03-07 Thread Sami Farin
On Wed, Mar 07, 2007 at 11:11:49 -0500, Chuck Ebbert wrote:
> Sami Farin wrote:
> > On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> > ...
> >> And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> >> when doing 1000 loops test... gcc-4.0.3 works.
> > 
> > Found it.
> > 
> > --- cbrt-test.c~	2007-03-07 00:20:54.735248105 +0200
> > +++ cbrt-test.c	2007-03-07 00:21:03.964864343 +0200
> > @@ -209,7 +209,7 @@
> >  
> >  	__asm__("bsrl %1,%0\n\t"
> >  		"cmovzl %2,%0"
> > -		: "=&r" (r) : "rm" (x), "rm" (-1));
> > +		: "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
> >  	return r+1;
> >  }
> >  
> > Now Linux 2.6 does not have "memory" in fls, maybe it causes
> > some gcc funnies some people are seeing.
> 
> Can you post the difference in the generated code with that change?

Fun... it looks like when not using "memory", gcc does not even bother
calling ncubic() 666 times.  So it gets better timings ( 42/666=0 ) =)

--- cbrt-test-no_memory.s	2007-03-07 20:22:27.838466385 +0200
+++ cbrt-test-using_memory.s	2007-03-07 20:22:38.237013197 +0200
...
 main:
 	leal	4(%esp), %ecx
 	andl	$-16, %esp
 	pushl	-4(%ecx)
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
 	pushl	%ebx
 	pushl	%ecx
-	subl	$136, %esp
+	subl	$152, %esp
 	movl	$.LC0, (%esp)
 	call	puts
 	xorl	%edx, %edx
 	movl	$27, %eax
 	call	ncubic
 	cmpl	$3, %eax
-	je	.L83
+	je	.L87
 	movl	$.LC1, (%esp)
 	call	puts
-.L83:
-	xorl	%eax, %eax
-	xorl	%edi, %edi
-	movl	%eax, 88(%esp)
+.L87:
 	xorl	%eax, %eax
-	xorl	%esi, %esi
+	xorl	%ebp, %ebp
 	movl	%eax, 92(%esp)
 	xorl	%eax, %eax
-	xorl	%ebp, %ebp
+	xorl	%edi, %edi
 	movl	%eax, 96(%esp)
 	xorl	%eax, %eax
+	xorl	%esi, %esi
 	movl	%eax, 100(%esp)
 	xorl	%eax, %eax
 	movl	%eax, 104(%esp)
 	xorl	%eax, %eax
 	movl	%eax, 108(%esp)
-	movl	%edi, 112(%esp)
-	movl	%esi, 116(%esp)
-	.p2align 4,,15
-.L84:
+	xorl	%eax, %eax
+	movl	%eax, 112(%esp)
+	movl	%ebp, 116(%esp)
+	movl	%edi, 120(%esp)
+	movl	%esi, 124(%esp)
+.L88:
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 
 #NO_APP
 	movl	%eax, 56(%esp)
 	movl	%edx, 60(%esp)
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 
 #NO_APP
 	movl	%eax, %esi
 	movl	%edx, %edi
 	subl	56(%esp), %esi
 	sbbl	60(%esp), %edi
 	cmpl	$0, %edi
 	ja	.L66
 	cmpl	$999, %esi
-	jbe	.L84
+	jbe	.L88
 .L66:
+	movl	92(%esp), %edx
+	leal	(%edx,%edx,2), %eax
+	movl	cases+4(,%eax,4), %edi
+	movl	cases(,%eax,4), %esi
+	movl	%edi, %edx
+	movl	%esi, %eax
+	call	ncubic
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 
 #NO_APP
-	movl	%eax, %esi
-	movl	%edx, %edi
+	movl	$666, %ebx
+	movl	%eax, 128(%esp)
+	movl	%edx, 132(%esp)
+	.p2align 4,,15
+.L67:
+	movl	%esi, %eax
+	movl	%edi, %edx
+	call	ncubic
+	decl	%ebx
+	movl	%eax, %ebp
+	jne	.L67
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 
 #NO_APP
-	subl	%esi, %eax
+	subl	128(%esp), %eax
 	movl	$666, %ebx
-	sbbl	%edi, %edx
-	xorl	%ecx, %ecx
 	movl	%ebx, 8(%esp)
+	sbbl	132(%esp), %edx
+	xorl	%ecx, %ecx
 	movl	%ecx, 12(%esp)
 	movl	%eax, (%esp)
 	movl	%edx, 4(%esp)
 	call	__udivdi3
-	addl	%eax, 104(%esp)
+	addl	%eax, 112(%esp)
 	movl	%edx, %ecx
 	movl	%eax, %ebx
 	movl	%edx, %esi
-	adcl	%edx, 108(%esp)
+	adcl	%edx, 116(%esp)
 	imull	%eax, %ecx
 	mull	%ebx
 	addl	%ecx, %ecx
 	movl	%eax, 56(%esp)
 	addl	%ecx, %edx
 	movl	56(%esp), %eax
-	addl	%eax, 112(%esp)
+	addl	%eax, 120(%esp)
 	movl	%edx, 60(%esp)
 	movl	60(%esp), %edx
-	adcl	%edx, 116(%esp)
-	cmpl	%esi, 92(%esp)
-	ja	.L67
-	jb	.L68
-	cmpl	%ebx, 88(%esp)
-	jae	.L67
-.L68:
-	movl	%ebx, 88(%esp)
-	movl	%esi, 92(%esp)
-.L67:
-	leal	(%ebp,%ebp,2), %ebx
-	sall	$2, %ebx
-	movl	cases+4(%ebx), %edx
-	movl	cases(%ebx), %eax
-	call	ncubic
-   movl 

Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
...
> And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> when doing 1000 loops test... gcc-4.0.3 works.

Found it.

--- cbrt-test.c~	2007-03-07 00:20:54.735248105 +0200
+++ cbrt-test.c	2007-03-07 00:21:03.964864343 +0200
@@ -209,7 +209,7 @@
 
 	__asm__("bsrl %1,%0\n\t"
 		"cmovzl %2,%0"
-		: "=&r" (r) : "rm" (x), "rm" (-1));
+		: "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
 	return r+1;
 }
 
Now Linux 2.6 does not have "memory" in fls, maybe it causes
some gcc funnies some people are seeing.

-- 


Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
> 
> Slightly gross test program (with original cubic wraparound bug fixed).
...
>   {~0, 2097151},
 ^^^
this should be 2642245.
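
The corrected value can be checked independently of the test program with
a slow but simple integer cube root.  This is a reference sketch using a
binary search, not the Newton-Raphson code under discussion; icbrt64() is
a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Floor of the cube root of a 64-bit value, by binary search.
 * "mid^3 <= x" is tested as "mid <= x / (mid * mid)" so that nothing
 * overflows: mid stays below 2^22, so mid * mid fits easily in 64 bits.
 * mid is never 0 inside the loop, so the division is safe.
 */
static uint32_t icbrt64(uint64_t x)
{
	uint64_t lo = 0;
	uint64_t hi = (uint64_t)1 << 22;	/* (2^22)^3 = 2^66 > any x */

	while (hi - lo > 1) {
		uint64_t mid = lo + (hi - lo) / 2;

		if (mid <= x / (mid * mid))
			lo = mid;	/* mid^3 <= x */
		else
			hi = mid;	/* mid^3 >  x */
	}
	return (uint32_t)lo;
}
```

icbrt64(UINT64_MAX) gives 2642245, while 2097151 is 2^21 - 1, i.e. the
cube root of about 2^63.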

Without a serializing instruction before rdtsc, and with only one loop,
I do not get very accurate results (104 for ncubic, > 1000 for others).

#define rdtscll_serialize(val) \
  __asm__ __volatile__("movl $0, %%eax\n\tcpuid\n\trdtsc\n" : "=A" (val) : : \
                       "ebx", "ecx")

Here are Pentium D timings for 1000 loops.

~0, 2097151

Function clocks mean(us)  max(us)  std(us) total error
ocubic      912    0.306   20.317    0.730      545101
ncubic      777    0.261   14.799    0.486      576263
acbrt      1168    0.392   21.681    0.547      547562
hcbrt       827    0.278   15.244    0.387        2410

~0, 2642245

Function clocks mean(us)  max(us)  std(us) total error
ocubic      908    0.305   20.210    0.656           7
ncubic      775    0.260   14.792    0.550       31169
acbrt      1176    0.395   22.017    0.970        2468
hcbrt       826    0.278   15.326    0.670      547504

And I found a bug in gcc-4.1.2: it gave 0 for the ncubic results
when doing the 1000-loop test... gcc-4.0.3 works.

-- 


Re: [RFC] div64_64 support

2007-02-24 Thread Sami Farin
On Fri, Feb 23, 2007 at 17:05:27 -0800, Stephen Hemminger wrote:
> Since there are already two users of full 64 bit division in the kernel,
> and other places may be hiding out as well, add a full 64/64 bit divide.
> 
> Yes, this is expensive, but there are places where it is necessary.
> It is not clear if doing the scaling buys any advantage on 64 bit platforms,
> so for them a full divide is done.

Still does not work after these fixes... how come?

WARNING: "div64_64" [net/netfilter/xt_connbytes.ko] undefined!
WARNING: "div64_64" [net/ipv4/tcp_cubic.ko] undefined!

--- linux-2.6.19/include/asm-i386/div64.h.bak	2006-11-29 23:57:37.0 +0200
+++ linux-2.6.19/include/asm-i386/div64.h	2007-02-24 16:24:55.822529880 +0200
@@ -45,4 +45,7 @@ div_ll_X_l_rem(long long divs, long div,
 	return dum2;
 
 }
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #endif
--- linux-2.6.19/lib/div64.c.bak	2007-02-24 16:10:03.686084000 +0200
+++ linux-2.6.19/lib/div64.c	2007-02-24 17:01:11.224517353 +0200
@@ -80,4 +80,6 @@ uint64_t div64_64(uint64_t dividend, uin
 	return dividend;
 }
 
+EXPORT_SYMBOL(div64_64);
+
 #endif /* BITS_PER_LONG == 32 */

-- 