Re: 4.19.4 nf_conntrack_count kernel panic
On Mon, Nov 26, 2018 at 22:14:48 +0200, Denys Fedoryshchenko wrote:
> On 2018-11-26 21:46, Sami Farin wrote:
> > 4.18.20 works OK, but unfortunately 4.18 series is EOL.
...
> Check this patches:
> https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972&state=*
>
> Relevant discussion:
> https://marc.info/?l=linux-netdev&m=154211826106430&w=2

I applied those three patches and 4.19.5 does not crash now the same way.

BUT I got this when I ran iptables-restore (could not reproduce):

RIP: 0010:rb_erase+0xce/0x380
Code: 01 48 83 e0 fc 0f 84 cf 01 00 00 48 3b 78 10 0f 84 1d 02 00 00 48 89 48 08 c3 48 8b 17 48 89 d0 48 83 e0 fc 0f 84 d1 01 00 00 <48> 3b 78 10 0f 84 aa 01 00 00 4c 89 40 08 4d 85 c0 0f 85 c6 01 00
RSP: 0018:bfc10c643cf8 EFLAGS: 00010282
RAX: b488144c79e96970 RBX: a1bf63320e10 RCX:
RDX: b488144c79e96970 RSI: a1c224983010 RDI: a1bf63320e10
RBP: a1ba766fa210 R08: R09: a1bea0368bd0
R10: R11: a1c140038000 R12: a1c224983010
R13: a1c224983808 R14: a1c224983000 R15: a1bf63320e60
FS: 7ff302a1b900() GS:a1c23e88() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 562e3b407000 CR3: 00027666e000 CR4: 003406e0
Call Trace:
 nf_conncount_destroy+0x58/0xc0 [nf_conncount]
 cleanup_match+0x45/0x70 [ip_tables]
 cleanup_entry+0x3e/0xc0 [ip_tables]
 do_ipt_set_ctl+0x450/0x4ed [ip_tables]
 nf_setsockopt+0x44/0x70
 __sys_setsockopt+0x82/0xe0
 __x64_sys_setsockopt+0x20/0x30
 do_syscall_64+0x6f/0x353
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff301a1740a
Code: ff ff ff c3 48 8b 15 95 da 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 66 da 2b 00 f7 d8 64 89 01 48
RSP: 002b:7ffd76e9e9a8 EFLAGS: 0206 ORIG_RAX: 0036
RAX: ffda RBX: 562e3b3bece8 RCX: 7ff301a1740a
RDX: 0040 RSI: RDI: 0004
RBP: 562e3b3f4850 R08: 00013830 R09:
R10: 562e3b3f4850 R11: 0206 R12: 562e3b3f48b0
R13: 562e3b3bece8 R14: 000137d0 R15: 562e3b3bece0
Modules linked in: act_skbedit cls_u32 algif_hash sch_cake arptable_filter arp_tables cls_fw sch_fq nfnetlink_acct ip6table_mangle nf_log_ipv6 xt_hl ip6t_REJECT nf_reject_ipv6 ip6t_rt ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_raw xt_mark xt_connmark iptable_mangle nf_log_ipv4 nf_log_common xt_LOG xt_length xt_limit ipt_REJECT nf_reject_ipv4 xt_connlimit nf_conncount xt_multiport xt_hashlimit xt_owner xt_set xt_conntrack iptable_filter nf_conntrack_netlink ip_set_bitmap_port ip_set_hash_mac ip_set_hash_net ip_set nfnetlink bnep hwmon_vid iwlmvm mac80211 snd_usb_audio kvm_amd snd_usbmidi_lib snd_hwdep snd_rawmidi kvm btusb btrtl iwlwifi btbcm btintel bluetooth irqbypass ecdh_generic cfg80211 wmi_bmof sp5100_tco k10temp i2c_piix4 snd_hda_codec_realtek snd_hda_codec_generic rfkill snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core rtc_cmos acpi_cpufreq binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm tcp_cubic tcp_westwood br_netfilter bridge stp llc ip_tables scsi_transport_iscsi algif_skcipher af_alg uas usb_storage usbhid mxm_wmi ccp igb xhci_pci xhci_hcd usbcore usb_common wmi button 8021q mrp sunrpc snd_timer snd soundcore fuse tun xt_tcpudp x_tables tcp_bbr nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_fq_codel sch_htb sch_pie analog gameport joydev i2c_dev ecryptfs autofs4 amdkfd amd_iommu_v2 [last unloaded: pcspkr]
---[ end trace 166aeb108d0289e0 ]---
RIP: 0010:rb_erase+0xce/0x380
Code: 01 48 83 e0 fc 0f 84 cf 01 00 00 48 3b 78 10 0f 84 1d 02 00 00 48 89 48 08 c3 48 8b 17 48 89 d0 48 83 e0 fc 0f 84 d1 01 00 00 <48> 3b 78 10 0f 84 aa 01 00 00 4c 89 40 08 4d 85 c0 0f 85 c6 01 00
RSP: 0018:bfc10c643cf8 EFLAGS: 00010282
RAX: b488144c79e96970 RBX: a1bf63320e10 RCX:
RDX: b488144c79e96970 RSI: a1c224983010 RDI: a1bf63320e10
RBP: a1ba766fa210 R08: R09: a1bea0368bd0
R10: R11: a1c140038000 R12: a1c224983010
R13: a1c224983808 R14: a1c224983000 R15: a1bf63320e60
FS: 7ff302a1b900() GS:a1c23e88() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 562e3b407000 CR3: 00027666e000 CR4: 003406e0

--
Do what you love because life is too short for anything else.
https://samifar.in/
4.19.4 nf_conntrack_count kernel panic
4.18.20 works OK, but unfortunately 4.18 series is EOL.

I have Ryzen 1600X, 32 GB RAM, Fedora 28, gcc-8.2.1-5, nosmt=force,
igb module for Intel I211, using XFS filesystems only.

To reproduce, I only do this: connect to VPN using a tunnel (e.g. tun0),
start downloading a file with qbittorrent (allow port for incoming
TCP connections in qbittorrent and iptables) and wait a couple of minutes.
I am also using ipset and connlimit modules.
I reproduced this bug three times.

With 4.18 I use fq+htb and with 4.19 I use CAKE for traffic control.

Only this message in kernel log:
[ 363.935074] TCP: request_sock_TCP: Possible SYN flooding on port 19044. Dropping request. Check SNMP counters.
I get this message with both 4.18.20 and 4.19.4.

RIP: 0010:rb_insert_color+0x64
Call Trace:
 nf_conntrack_count [nf_conncount]
 ip_set_test [ip_set]
 connlimit_mt [xt_connlimit]
 set_match_v4 [xt_set]
 ipt_do_table [ip_tables]
 ip_route_input_noref
 nf_hook_slow
 ip_local_deliver
 inet_add_protocol
 ip_rcv
 ip_rcv_finish_core
 __netif_receive_skb_one_core
 netif_receive_skb_internal
 tun_rx_batched
 tun_get_user
 __local_bh_enable_ip
 tun_get_user
 tun_chr_write_iter
 __vfs_write
 vfs_write
 ksys_write
 do_syscall_64
 trace_hardirqs_off_thunk
 entry_SYSCALL_64_after_hwframe
...
Kernel panic - not syncing: Fatal exception in interrupt

--
Do what you love because life is too short for anything else.
https://samifar.in/
4.18.6 dl_seq_start [xt_hashlimit] unable to handle kernel NULL pointer dereference at 0000000000000050
4.17 worked OK; this is on a 32 GB Ryzen system.

BUG: unable to handle kernel NULL pointer dereference at 0050
PGD 0 P4D 0
Oops: [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 6303 Comm: grep Tainted: GT 4.18.6+ #16
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P4.60 03/02/2018
RIP: 0010:dl_seq_start+0x11/0x60 [xt_hashlimit]
Code: ff 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f5 53 48 8b 87 d8 00 00 00 <48> 8b 78 50 e8 36 3b 6f de 48 89 c3 48 8d 78 48 e8 ca e6 0a df 8b
RSP: 0018:a79e88befde0 EFLAGS: 00010246
RAX: RBX: RCX:
RDX: RSI: a79e88befe18 RDI: 9a64733417a0
RBP: a79e88befe18 R08: R09: 657547bf
R10: 9f2bf98a R11: 9a6470f5a6c0 R12: a79e88befeb0
R13: 9a6471879200 R14: 9a6471879200 R15: 9a64733417a0
FS: 7f6798784740() GS:9a649ea0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0050 CR3: 0007f335c000 CR4: 003406f0
Call Trace:
 seq_read+0xc0/0x470
 proc_reg_read+0x49/0x70
 vfs_read+0x8a/0x140
 ksys_read+0x52/0xc0
 do_syscall_64+0x6f/0x353
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f67980eee21
Code: fe ff ff 50 48 8d 3d 46 b6 09 00 e8 f9 04 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 c1 3b 2d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
RSP: 002b:7ffc314f7a68 EFLAGS: 0246 ORIG_RAX:
RAX: ffda RBX: 8000 RCX: 7f67980eee21
RDX: 8000 RSI: 55f0d317f000 RDI: 0003
RBP: 8000 R08: R09: 9008
R10: R11: 0246 R12: 55f0d317f000
R13: 0003 R14: 55f0d317e830 R15: 0003
Modules linked in: arptable_filter arp_tables nfnetlink_acct ip6table_mangle nf_log_ipv6 xt_hl ip6t_REJECT nf_reject_ipv6 ip6t_rt ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_raw xt_mark xt_connmark iptable_mangle nf_log_ipv4 nf_log_common xt_LOG xt_length xt_limit ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_connlimit nf_conncount xt_multiport xt_hashlimit xt_owner xt_set xt_conntrack iptable_filter ip_set_bitmap_port ip_set_hash_mac ip_set_hash_net ip_set nf_conntrack_netlink nfnetlink bnep hwmon_vid iwlmvm snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi mac80211 iwlwifi btusb btrtl kvm_amd btbcm btintel bluetooth kvm cfg80211 ecdh_generic irqbypass sp5100_tco wmi_bmof k10temp i2c_piix4 snd_hda_codec_realtek rfkill snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core rtc_cmos acpi_cpufreq binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm tcp_cubic tcp_westwood br_netfilter bridge stp llc ip_tables scsi_transport_iscsi algif_skcipher af_alg uas usb_storage usbhid mxm_wmi ccp igb xhci_pci xhci_hcd usbcore usb_common wmi button 8021q mrp sunrpc snd_timer snd soundcore fuse tun xt_tcpudp x_tables tcp_bbr nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack sch_fq_codel sch_htb sch_pie analog gameport joydev i2c_dev ecryptfs autofs4 amdkfd amd_iommu_v2 [last unloaded: pcspkr]
CR2: 0050
---[ end trace 0e097a943554aa36 ]---

--
Do what you love because life is too short for anything else.
https://samifar.in/
Panic since 4.15.15 in tcp_retransmit_timer when doing ss -K
I have been getting this since 4.15.15.
It's easy to trigger: for example, I get a new IP address via DHCP
(NetworkManager), then run ss -K the_old_ip_address.

Happens on Ryzen and SandyBridge systems.

My guess at the cause: commit 960058fe196397aecb16bb14e64980e265d2bc5e
(didn't try reverting)

BUG: unable to handle kernel NULL pointer dereference at 30
IP: tcp_retransmit_skb+0x57/0xc0
PGD 0 P4D 0
Oops: [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 4.15.17+ #42
Hardware name: To Be Filled By O.E.M./To Be Filled By O.E.M./X370 Taichi, BIOS P4.60 03/02/2018
RIP: 0010:tcp_retransmit_skb+0x57/0xc0
RSP: 0018:95b1dea03e00 EFLAGS: 00010206
RAX: fff5 RBX: 95b15876d000 RCX: 5
RDX: 5 RSI: 0 RDI: 95b115876d000
...
Call Trace:
 tcp_retransmit_timer
 tcp_write_timer_handler
 tcp_write_timer
 ? tcp_write_timer_handler
 expire_timers
 run_timer_softirq
 sched_clock
 sched_clock
 sched_clock_cpu
 irqtime_account_irq
 __do_softirq
 sched_clock
 irq_exit
 smp_apic_timer_interrupt
 apic_timer_interrupt

--
Do what you love because life is too short for anything else.
Re: [NETFILTER] xt_hashlimit : speedups hash_dst()
On Sat, Dec 15, 2007 at 21:42:19 -0800, David Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Sat, 15 Dec 2007 12:04:47 +0100
>
> > I prefer to let admins chose their size, since it makes attacker life more
> > difficult :)
> >
> > For example, I can tell you I have a server, were size is between 2.000.000
> > and 3.500.000, I dont want to be forced to use 2097152
> >
> > A multiply is cheap, at least on current hardware.
>
> I agree, and I see nothing wrong with Eric's patch and it
> should be merged ASAP.

You could do the same optimization for
net/netfilter/nf_conntrack_core.c:__hash_conntrack(), too.

--
Do what you love because life is too short for anything else.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
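For readers following the "a multiply is cheap" remark in the thread above, here is a minimal sketch of how a 32-bit hash can be mapped onto a table whose size is not a power of two, using one multiply and a shift instead of a modulo. It is only an illustration under stated assumptions: the function name hash_to_bucket() is hypothetical, and this is not claimed to be the exact code of Eric's patch or of __hash_conntrack().

```c
#include <stdint.h>

/*
 * Hypothetical helper (name is an assumption, not kernel API):
 * map a 32-bit hash onto [0, size) with one 64-bit multiply and a shift.
 * 'size' may be any non-zero value; it does not need to be a power of two.
 */
static uint32_t hash_to_bucket(uint32_t hash, uint32_t size)
{
	/* hash / 2^32 is a fraction in [0, 1); scaling it by size gives the bucket. */
	return (uint32_t)(((uint64_t)hash * size) >> 32);
}
```

With a table size such as 2000000, every possible hash lands in a valid bucket, so an admin can pick an arbitrary size as described in the quoted mail.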
Re: updated tcatm patches for kernel/iproute 2.6.22
On Thu, Jul 19, 2007 at 00:58:44 +0200, Patrick McHardy wrote:
> Sami Farin wrote:
> > I had used tcatm patch with 2.6.16 kernel and I was
> > happy with it.
> > Now I patched Linux kernel 2.6.22 and iproute-2.6.22
> > for tcatm. Seems to work (TM). Only HTB tested.
>
>
> Let me repeat again before we get in an endless discussion
> again - I will NACK these patches until we have a solution
> that works for all qdiscs.

I googled for "Accurate packet scheduling for ATM" mchardy
and found some threads from 2006.

http://www.mail-archive.com/netdev@vger.kernel.org/msg15706.html
http://www.mail-archive.com/netdev@vger.kernel.org/msg21395.html

So, Patrick:
1) what is this "solution"? do you have code? or a brief explanation...
2) are you still working on this? is somebody else?
3) do you have patches for iproute/kernel 2.6.22?
4) if you have some patches, but they are not yet complete,
   what else needs to be done?

--
Do what you love because life is too short for anything else.
Re: Linux 2.6.22: Leak r=1 1
On Wed, Jul 18, 2007 at 12:16:56 +0300, Ilpo Järvinen wrote:
> On Wed, 11 Jul 2007, Sami Farin wrote:
>
> > That's right, so descriptive is the new Linux kernel 2.6.22.
> >
> > Linux safari.finland.fbi 2.6.22-cfs-v19 #3 SMP Tue Jul 10 00:22:25 EEST
> > 2007 i686 i686 i386 GNU/Linux
> >
> > [EMAIL PROTECTED] /proc/sys/net/ipv4]# grep . *
>
> ...snip...
>
> > tcp_frto:1
>
> ...This is fully unrelated to the issue but I'm a bit curious who enabled
> frto on your machine (since it's disabled by default), did you do it by
> yourself or the distribution perhaps?

I enabled it myself...
If you'd like to get more widespread testing, try suggesting that the
Fedora project add the tuning to /etc/sysctl.conf (or something like that).

Maybe fedora-devel-list:
http://www.redhat.com/mailman/listinfo/fedora-devel-list

Note that they have antispam on that list: the email address in the
From: header field must be subscribed to the list, otherwise your email
is devnulled.

--
Do what you love because life is too short for anything else.
updated tcatm patches for kernel/iproute 2.6.22
I got tired of getting 15% packet loss [1] when doing lots of DNS lookups
on my ADSL link... And that was even when limiting outgoing DNS traffic
to 200 Kbit/s (ADSL modem upstream speed is 512 Kbit/s).

I had used the tcatm patch with the 2.6.16 kernel and I was happy with it.
Now I patched Linux kernel 2.6.22 and iproute-2.6.22 for tcatm.
Seems to work (TM). Only HTB tested.

Now I get 0% packet loss when doing lots of DNS queries (270pps)
and DNS traffic is limited with HTB/ESFQ to 504Kbit/s.
I used "tc class add ... atm overhead 20" but I am not sure whether
it is really 20 (Sonera in Finland).
Without tcatm I had to set the limit to 420Kbit and it still sucked.

I keep patches at http://safari.iki.fi/tcatm/

I read one thread from this year where there were objections to
some parts of tcatm, and then the discussion petered out...
Russell Stuart and I would probably like to get these patches
fixed so that everyone is pleased and they can be incorporated
into the kernel some year, because I believe ADSL is very popular nowadays ( =) )
and people would probably like it if traffic control was actually
usable for them ( =) )...

[1] ping -A 80.223.96.1

--
Do what you love because life is too short for anything else.
#
# include/linux/pkt_sched.h |5 +++--
# include/net/sch_generic.h | 15 +++
# net/sched/act_police.c|4 ++--
# net/sched/sch_cbq.c |2 +-
# net/sched/sch_htb.c |9 -
# net/sched/sch_tbf.c |4 ++--
# 6 files changed, 27 insertions(+), 12 deletions(-)
#
--- linux-2.6.22/include/linux/pkt_sched.h.bak	2007-07-09 21:58:23.559346000 +0300
+++ linux-2.6.22/include/linux/pkt_sched.h	2007-07-18 21:46:00.084770053 +0300
@@ -77,8 +77,9 @@ struct tc_ratespec
 {
 	unsigned char	cell_log;
 	unsigned char	__reserved;
-	unsigned short	feature;
-	short		addend;
+	unsigned short	feature;	/* Always 0 in pre-atm patch kernels */
+	char		cell_align;	/* Always 0 in pre-atm patch kernels */
+	unsigned char	__reserved2;
 	unsigned short	mpu;
 	__u32		rate;
 };
--- linux-2.6.22/include/net/sch_generic.h.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/include/net/sch_generic.h	2007-07-18 21:44:40.024580754 +0300
@@ -302,4 +302,19 @@ drop:
 	return NET_XMIT_DROP;
 }

+/* Lookup a qdisc_rate_table to determine how long it will take to send a
+ * packet given its size.
+ */
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
+{
+	int slot = pktlen + rtab->rate.cell_align;
+
+	if (slot < 0)
+		slot = 0;
+	slot >>= rtab->rate.cell_log;
+	if (slot > 255)
+		return rtab->data[255] + 1;
+	return rtab->data[slot];
+}
+
 #endif
--- linux-2.6.22/net/sched/act_police.c.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/net/sched/act_police.c	2007-07-18 21:42:49.275936447 +0300
@@ -32,8 +32,8 @@
 #include
 #include

-#define L2T(p,L) ((p)->tcfp_R_tab->data[(L)>>(p)->tcfp_R_tab->rate.cell_log])
-#define L2T_P(p,L) ((p)->tcfp_P_tab->data[(L)>>(p)->tcfp_P_tab->rate.cell_log])
+#define L2T(p,L) qdisc_l2t((p)->tcfp_R_tab,L)
+#define L2T_P(p,L) qdisc_l2t((p)->tcfp_P_tab,L)

 #define POL_TAB_MASK 15
 static struct tcf_common *tcf_police_ht[POL_TAB_MASK + 1];
--- linux-2.6.22/net/sched/sch_cbq.c.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/net/sched/sch_cbq.c	2007-07-18 21:51:12.794420373 +0300
@@ -192,7 +192,7 @@ struct cbq_sched_data
 };


-#define L2T(cl,len)	((cl)->R_tab->data[(len)>>(cl)->R_tab->rate.cell_log])
+#define L2T(cl,len)	qdisc_l2t((cl)->R_tab,len)


 static __inline__ unsigned cbq_hash(u32 h)
--- linux-2.6.22/net/sched/sch_htb.c.bak	2007-07-09 21:17:53.417438000 +0300
+++ linux-2.6.22/net/sched/sch_htb.c	2007-07-18 21:50:08.602465126 +0300
@@ -157,12 +157,11 @@ struct htb_class {
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
 			   int size)
 {
-	int slot = size >> rate->rate.cell_log;
-	if (slot > 255) {
+	long result = qdisc_l2t(rate, size);
+
+	if (result > rate->data[255])
 		cl->xstats.giants++;
-		slot = 255;
-	}
-	return rate->data[slot];
+	return result;
 }

 struct htb_sched {
--- linux-2.6.22/net/sched/sch_tbf.c.bak	2007-07-09 02:32:17.0 +0300
+++ linux-2.6.22/net/sched/sch_tbf.c	2007-07-18 21:52:10.665281840 +0300
@@ -132,8 +132,8 @@ struct tbf_sched_data
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
 };

-#define L2T(q,L) ((q)->R_tab->data[(L)>>(q)->R_tab->rate.cell_log])
-#define L2T_P(q,L) ((q)->P_tab->data[(L)>>(q)->P_tab->rate.cell_log])
+#define L2T(q,L) qdisc_l2t((q)->R_tab,L)
+#define L2T_P(q,L) qdisc_l2t((q)->P_tab,L)

 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 {
#
# include/linux/pkt_sched.h |5 +-
#
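The lookup that qdisc_l2t() performs in the patch above can be tried outside the kernel. The following is a userspace sketch, not the kernel code: struct rate_table and l2t() are hypothetical stand-ins that flatten the kernel's qdisc_rate_table and tc_ratespec into one struct, and the table contents are whatever the caller fills in.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's rate table (assumption). */
struct rate_table {
	uint8_t  cell_log;    /* log2 of the number of bytes covered by one slot */
	int8_t   cell_align;  /* signed offset added to the length before shifting */
	uint32_t data[256];   /* transmit time for each slot */
};

/* Same steps as qdisc_l2t() in the patch: align, clamp at 0, shift, look up. */
static uint32_t l2t(const struct rate_table *rtab, int pktlen)
{
	int slot = pktlen + rtab->cell_align;

	if (slot < 0)
		slot = 0;
	slot >>= rtab->cell_log;
	if (slot > 255)
		return rtab->data[255] + 1; /* one more than the last entry marks a giant */
	return rtab->data[slot];
}
```

Returning data[255] + 1 for oversized packets is what lets the HTB hunk detect giants by comparing the result against rate->data[255] instead of inspecting the slot index itself.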
Re: Linux 2.6.22: Leak r=1 1
On Thu, Jul 12, 2007 at 10:53:57 +0300, Ilpo Järvinen wrote:
> On Wed, 11 Jul 2007, Sami Farin wrote:
>
> > That's right, so descriptive is the new Linux kernel 2.6.22.
> > Took a while to grep what is "leaking".
> >
> > Linux safari.finland.fbi 2.6.22-cfs-v19 #3 SMP Tue Jul 10 00:22:25 EEST
> > 2007 i686 i686 i386 GNU/Linux
> >
> > Just normal Internet usage, azureus for example =)
> > I think this is easy to trigger.
>
> I guess those packet loss periods help you to reproduce it so easily.
...
> I'd be interested to study some tcpdumps that relate to Leak cases you're
> seeing. Could you record some Sami? I'm not sure though how one can figure

I now have 300 MB capture and several new&retarded music videos...
And 10 WARNINGs and 0 Leak printk's.

2007-07-12 12:03:18.910712500 <4>[ 1318.606826] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:21:55.575049500 <4>[ 2434.941077] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:25:56.626918500 <4>[ 2675.917531] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:26:01.421714500 <4>[ 2680.710860] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:27:55.996561500 <4>[ 2795.252008] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:33:03.405492500 <4>[ 3102.570088] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:33:59.837033500 <4>[ 3158.985152] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:44:59.580682500 <4>[ 3818.697530] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:45:06.146194500 <4>[ 3825.261028] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()
2007-07-12 12:45:07.637015500 <4>[ 3826.751240] WARNING: at net/ipv4/tcp_input.c:1402 tcp_enter_frto_loss()

This is MAYBE the guilty connection if timestamps are to be believed:

2007-07-12 12:02:35.311410 IP (tos 0x0, ttl 61, id 17078, offset 0, flags [none], proto: TCP (6), length: 60)
    80.223.106.128.43771 > 62.203.174.236.24442: SWE, cksum 0x26f7 (correct), 1227344370:1227344370(0) win 5720
2007-07-12 12:02:38.281251 IP (tos 0x0, ttl 61, id 17079, offset 0, flags [none], proto: TCP (6), length: 60)
    80.223.106.128.43771 > 62.203.174.236.24442: SWE, cksum 0x1b3f (correct), 1227344370:1227344370(0) win 5720
2007-07-12 12:02:38.792865 IP (tos 0x0, ttl 113, id 46391, offset 0, flags [DF], proto: TCP (6), length: 52)
    62.203.174.236.24442 > 80.223.106.128.43771: ., cksum 0xc936 (correct), ack 1227344371 win 17640
2007-07-12 12:02:38.854298 IP (tos 0x0, ttl 113, id 46396, offset 0, flags [DF], proto: TCP (6), length: 64)
    62.203.174.236.24442 > 80.223.106.128.43771: S, cksum 0x319e (correct), 602133927:602133927(0) ack 1227344371 win 17640
2007-07-12 12:02:38.854335 IP (tos 0x0, ttl 61, id 17080, offset 0, flags [none], proto: TCP (6), length: 52)
    80.223.106.128.43771 > 62.203.174.236.24442: ., cksum 0x6251 (correct), ack 602133928 win 715
2007-07-12 12:02:38.858231 IP (tos 0x0, ttl 61, id 17081, offset 0, flags [none], proto: TCP (6), length: 372)
    80.223.106.128.43771 > 62.203.174.236.24442: P, cksum 0xaa7d (incorrect (-> 0x006d), 1227344371:1227344691(320) ack 602133928 win 715
2007-07-12 12:02:39.305447 IP (tos 0x0, ttl 113, id 46441, offset 0, flags [DF], proto: TCP (6), length: 159)
    62.203.174.236.24442 > 80.223.106.128.43771: P, cksum 0x18b6 (correct), 602133928:602134035(107) ack 1227344691 win 17320
2007-07-12 12:02:39.305482 IP (tos 0x0, ttl 61, id 17082, offset 0, flags [none], proto: TCP (6), length: 52)
    80.223.106.128.43771 > 62.203.174.236.24442: ., cksum 0xf9de (correct), ack 602134035 win 715
2007-07-12 12:02:39.309403 IP (tos 0x0, ttl 61, id 17083, offset 0, flags [none], proto: TCP (6), length: 263)
    80.223.106.128.43771 > 62.203.174.236.24442: P, cksum 0xaa10 (incorrect (-> 0xf1b3), 1227344691:1227344902(211) ack 602134035 win 715
2007-07-12 12:02:40.649923 IP (tos 0x0, ttl 61, id 17084, offset 0, flags [none], proto: TCP (6), length: 263)
    80.223.106.128.43771 > 62.203.174.236.24442: P, cksum 0xaa10 (incorrect (-> 0xec76), 1227344691:1227344902(211) ack 602134035 win 715
2007-07-12 12:02:41.148856 IP (tos 0x0, ttl 113, id 46591, offset 0, flags [DF], proto: TCP (6), length: 52)
    62.203.174.236.24442 > 80.223.106.128.43771: ., cksum 0xb73b (correct), ack 1227344902 win 17109
2007-07-12 12:02:42.679961 IP (tos 0x0, ttl 113, id 46707, offset 0, flags [DF], proto: TCP (6), length: 484)
    62.203.174.236.24442 > 80.223.106.128.43771: P, cksum 0x3390 (correct), 602134035:602134467(432) ack 1227344902 win 17109
2007-07-12 12:02:42.703122 IP (tos 0x0, ttl 61, id 17085, offset 0, flags [none], proto: TCP (6), length: 120)
    80.22
Linux 2.6.22: Leak r=1 1
That's right, so descriptive is the new Linux kernel 2.6.22.
Took a while to grep what is "leaking".

Linux safari.finland.fbi 2.6.22-cfs-v19 #3 SMP Tue Jul 10 00:22:25 EEST 2007 i686 i686 i386 GNU/Linux

Just normal Internet usage, azureus for example =)
I think this is easy to trigger.
But that printk is not very useful, or is it?

I am also using HTB+ESFQ to limit outgoing bandwidth...

# ss -n|wc -l
870
# ping -A 80.223.96.1
PING 80.223.96.1 (80.223.96.1) 56(84) bytes of data.
64 bytes from 80.223.96.1: icmp_seq=1 ttl=255 time=431 ms
...
--- 80.223.96.1 ping statistics ---
40 packets transmitted, 25 received, 37% packet loss, time 17954ms
rtt min/avg/max/mdev = 406.000/467.758/530.983/29.384 ms, pipe 2, ipg/ewma 460.361/456.381 ms

But ploss is only temporary (when I am downloading with azureus =) ,
when only uploading (95% of bandwidth used) rtt avg = 32ms).

# dmesg|grep Leak
[114992.191011] Leak r=1 4
[124231.713348] Leak r=1 4
[142807.938284] Leak r=1 4
[142999.674521] Leak r=1 1
[143177.462073] Leak r=1 4
[143230.001570] Leak r=1 4
[143232.982560] Leak r=1 4
[143234.537096] Leak r=1 4
[143297.927760] Leak r=1 4
[143300.633603] Leak r=1 4
[143302.172917] Leak r=1 4
[143357.083193] Leak r=1 1
[143361.780879] Leak r=1 4
[143413.706490] Leak r=1 4
[143552.996598] Leak r=1 1

[EMAIL PROTECTED] /proc/sys/net/ipv4]# grep . *
icmp_echo_ignore_all:0
icmp_echo_ignore_broadcasts:1
icmp_errors_use_inbound_ifaddr:0
icmp_ignore_bogus_error_responses:1
icmp_ratelimit:1000
icmp_ratemask:6168
igmp_max_memberships:20
igmp_max_msf:10
inet_peer_gc_maxtime:120
inet_peer_gc_mintime:10
inet_peer_maxttl:600
inet_peer_minttl:120
inet_peer_threshold:65664
ip_default_ttl:61
ip_dynaddr:0
ip_forward:0
ip_local_port_range:4 65535
ip_no_pmtu_disc:1
ip_nonlocal_bind:0
ipfrag_high_thresh:262144
ipfrag_low_thresh:196608
ipfrag_max_dist:64
ipfrag_secret_interval:600
ipfrag_time:30
tcp_abc:2
tcp_abort_on_overflow:0
tcp_adv_win_scale:2
tcp_allowed_congestion_control:cubic bic reno
tcp_app_win:31
tcp_available_congestion_control:cubic bic reno westwood vegas scalable hybla htcp highspeed
tcp_base_mss:512
tcp_congestion_control:cubic
tcp_dma_copybreak:4096
tcp_dsack:1
tcp_ecn:1
tcp_fack:1
tcp_fin_timeout:30
tcp_frto:1
tcp_frto_response:0
tcp_keepalive_intvl:75
tcp_keepalive_probes:9
tcp_keepalive_time:3300
tcp_low_latency:0
tcp_max_orphans:2048
tcp_max_ssthresh:0
tcp_max_syn_backlog:1024
tcp_max_tw_buckets:18
tcp_mem:95136 126848 190272
tcp_moderate_rcvbuf:1
tcp_mtu_probing:0
tcp_no_metrics_save:0
tcp_orphan_retries:0
tcp_reordering:3
tcp_retrans_collapse:0
tcp_retries1:3
tcp_retries2:15
tcp_rfc1337:0
tcp_rmem:4096 87380 262144
tcp_sack:1
tcp_slow_start_after_idle:0
tcp_stdurg:0
tcp_syn_retries:5
tcp_synack_retries:5
tcp_syncookies:0
tcp_timestamps:1
tcp_tso_win_divisor:3
tcp_tw_recycle:1
tcp_tw_reuse:0
tcp_window_scaling:1
tcp_wmem:4096 16384 262144
tcp_workaround_signed_windows:0

--
Do what you love because life is too short for anything else.
Re: [PATCH][IPv6]: Fix incorrect length check in rawv6_sendmsg()
On Thu, Mar 29, 2007 at 14:26:44 -0700, David Miller wrote:
> From: Sridhar Samudrala <[EMAIL PROTECTED]>
> Date: Thu, 29 Mar 2007 14:17:28 -0700
>
> > The check for length in rawv6_sendmsg() is incorrect.
> > As len is an unsigned int, (len < 0) will never be TRUE.
> > I think checking for IPV6_MAXPLEN(65535) is better.
> >
> > Is it possible to send ipv6 jumbo packets using raw
> > sockets? If so, we can remove this check.
>
> I don't see why such a limitation against jumbo would exist,
> does anyone else?
>
> Thanks for catching this Sridhar. A good compiler should simply
> fail to compile "if (x < 0)" when 'x' is an unsigned type, don't
> you think :-)

gcc warns if you use -W or -Wextra (but not if only -Wall is used):

* An unsigned value is compared against zero with '<' or '>='.

--
Do what you love because life is too short for anything else.
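To make the point from the quoted discussion concrete, here is a small sketch of a length check that actually does something for an unsigned length. It assumes a hypothetical helper check_len() and a stand-in constant MAX_PAYLOAD taking the value 65535 quoted above for IPV6_MAXPLEN; it is not the code from rawv6_sendmsg().

```c
#include <stddef.h>

#define MAX_PAYLOAD 65535u /* stand-in for IPV6_MAXPLEN as quoted above (assumption) */

/*
 * Hypothetical helper: returns 0 if len is acceptable, -1 if it is too long.
 * A test such as "if (len < 0)" would be dead code here, because an
 * unsigned value can never be negative. A "negative" length passed by a
 * caller shows up as a very large value instead, which the upper bound catches.
 */
static int check_len(size_t len)
{
	if (len > MAX_PAYLOAD)
		return -1;
	return 0;
}
```

A caller that computes a length as -1 and converts it to size_t ends up on the rejecting branch, which is the behaviour the original (len < 0) test was presumably meant to provide.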
Re: asm volatile [Was: [RFC] div64_64 support]
On Wed, Mar 07, 2007 at 00:24:35 +0200, Sami Farin wrote:
> On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> ...
> > And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> > when doing 1000 loops test... gcc-4.0.3 works.
>
> Found it.
>
> --- cbrt-test.c~	2007-03-07 00:20:54.735248105 +0200
> +++ cbrt-test.c	2007-03-07 00:21:03.964864343 +0200
> @@ -209,7 +209,7 @@
>
>  	__asm__("bsrl %1,%0\n\t"
>  		"cmovzl %2,%0"
> -		: "=&r" (r) : "rm" (x), "rm" (-1));
> +		: "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
>  	return r+1;
>  }
>
> Now Linux 2.6 does not have "memory" in fls, maybe it causes
> some gcc funnies some people are seeing.

It also works without "memory" if I do "__asm__ volatile".

Why do some functions in include/asm-*/*.h have volatile and some do not?

--
Re: [RFC] div64_64 support
On Wed, Mar 07, 2007 at 11:11:49 -0500, Chuck Ebbert wrote:
> Sami Farin wrote:
> > On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> > ...
> >> And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> >> when doing 1000 loops test... gcc-4.0.3 works.
> >
> > Found it.
> >
> > --- cbrt-test.c~	2007-03-07 00:20:54.735248105 +0200
> > +++ cbrt-test.c	2007-03-07 00:21:03.964864343 +0200
> > @@ -209,7 +209,7 @@
> >
> >  	__asm__("bsrl %1,%0\n\t"
> >  		"cmovzl %2,%0"
> > -		: "=&r" (r) : "rm" (x), "rm" (-1));
> > +		: "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
> >  	return r+1;
> >  }
> >
> > Now Linux 2.6 does not have "memory" in fls, maybe it causes
> > some gcc funnies some people are seeing.
>
> Can you post the difference in the generated code with that change?

Fun.. looks when not using "memory" gcc does not even bother
calling ncubic() 666 times. So it gets better timings ( 42/666=0 ) =)

--- cbrt-test-no_memory.s	2007-03-07 20:22:27.838466385 +0200
+++ cbrt-test-using_memory.s	2007-03-07 20:22:38.237013197 +0200
...
 main:
 	leal	4(%esp), %ecx
 	andl	$-16, %esp
 	pushl	-4(%ecx)
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
 	pushl	%ebx
 	pushl	%ecx
-	subl	$136, %esp
+	subl	$152, %esp
 	movl	$.LC0, (%esp)
 	call	puts
 	xorl	%edx, %edx
 	movl	$27, %eax
 	call	ncubic
 	cmpl	$3, %eax
-	je	.L83
+	je	.L87
 	movl	$.LC1, (%esp)
 	call	puts
-.L83:
-	xorl	%eax, %eax
-	xorl	%edi, %edi
-	movl	%eax, 88(%esp)
+.L87:
 	xorl	%eax, %eax
-	xorl	%esi, %esi
+	xorl	%ebp, %ebp
 	movl	%eax, 92(%esp)
 	xorl	%eax, %eax
-	xorl	%ebp, %ebp
+	xorl	%edi, %edi
 	movl	%eax, 96(%esp)
 	xorl	%eax, %eax
+	xorl	%esi, %esi
 	movl	%eax, 100(%esp)
 	xorl	%eax, %eax
 	movl	%eax, 104(%esp)
 	xorl	%eax, %eax
 	movl	%eax, 108(%esp)
-	movl	%edi, 112(%esp)
-	movl	%esi, 116(%esp)
-	.p2align 4,,15
-.L84:
+	xorl	%eax, %eax
+	movl	%eax, 112(%esp)
+	movl	%ebp, 116(%esp)
+	movl	%edi, 120(%esp)
+	movl	%esi, 124(%esp)
+.L88:
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 #NO_APP
 	movl	%eax, 56(%esp)
 	movl	%edx, 60(%esp)
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 #NO_APP
 	movl	%eax, %esi
 	movl	%edx, %edi
 	subl	56(%esp), %esi
 	sbbl	60(%esp), %edi
 	cmpl	$0, %edi
 	ja	.L66
 	cmpl	$999, %esi
-	jbe	.L84
+	jbe	.L88
 .L66:
+	movl	92(%esp), %edx
+	leal	(%edx,%edx,2), %eax
+	movl	cases+4(,%eax,4), %edi
+	movl	cases(,%eax,4), %esi
+	movl	%edi, %edx
+	movl	%esi, %eax
+	call	ncubic
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 #NO_APP
-	movl	%eax, %esi
-	movl	%edx, %edi
+	movl	$666, %ebx
+	movl	%eax, 128(%esp)
+	movl	%edx, 132(%esp)
+	.p2align 4,,15
+.L67:
+	movl	%esi, %eax
+	movl	%edi, %edx
+	call	ncubic
+	decl	%ebx
+	movl	%eax, %ebp
+	jne	.L67
 #APP
 	movl $0, %eax
 	cpuid
 	rdtsc
 #NO_APP
-	subl	%esi, %eax
+	subl	128(%esp), %eax
 	movl	$666, %ebx
-	sbbl	%edi, %edx
-	xorl	%ecx, %ecx
 	movl	%ebx, 8(%esp)
+	sbbl	132(%esp), %edx
+	xorl	%ecx, %ecx
 	movl	%ecx, 12(%esp)
 	movl	%eax, (%esp)
 	movl	%edx, 4(%esp)
 	call	__udivdi3
-	addl	%eax, 104(%esp)
+	addl	%eax, 112(%esp)
 	movl	%edx, %ecx
 	movl	%eax, %ebx
 	movl	%edx, %esi
-	adcl	%edx, 108(%esp)
+	adcl	%edx, 116(%esp)
 	imull	%eax, %ecx
 	mull	%ebx
 	addl	%ecx, %ecx
 	movl	%eax, 56(%esp)
 	addl	%ecx, %edx
 	movl	56(%esp), %eax
-	addl	%eax, 112(%esp)
+	addl	%eax, 120(%esp)
 	movl	%edx, 60(%esp)
 	movl	60(%esp), %edx
-	adcl	%edx, 116(%esp)
-	cmpl	%esi, 92(%esp)
-	ja	.L67
-	jb	.L68
-	cmpl	%ebx, 88(%esp)
-	jae	.L67
-.L68:
-	movl	%ebx, 88(%esp)
-	movl	%esi, 92(%esp)
-.L67:
-	leal	(%ebp,%ebp,2), %ebx
-	sall	$2, %ebx
-	movl	cases+4(%ebx), %edx
-	movl	cases(%ebx), %eax
-	call	ncubic
-	movl
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
...
> And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> when doing 1000 loops test... gcc-4.0.3 works.

Found it.

--- cbrt-test.c~	2007-03-07 00:20:54.735248105 +0200
+++ cbrt-test.c	2007-03-07 00:21:03.964864343 +0200
@@ -209,7 +209,7 @@

 	__asm__("bsrl %1,%0\n\t"
 		"cmovzl %2,%0"
-		: "=&r" (r) : "rm" (x), "rm" (-1));
+		: "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
 	return r+1;
 }

Now Linux 2.6 does not have "memory" in fls, maybe it causes
some gcc funnies some people are seeing.

--
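For anyone comparing results across compilers, a plain C version of the function that the bsrl/cmovzl sequence in the diff above computes may be useful as a reference. This is a sketch under the assumption that the asm implements the usual "find last set" semantics (1-based index of the highest set bit, 0 for an input of 0); the name fls32() is hypothetical and this is not the kernel's fls().

```c
#include <stdint.h>

/*
 * Hypothetical portable reference for the asm above:
 * bsrl yields the 0-based index of the highest set bit, cmovzl substitutes
 * -1 when the input is zero, and the function returns that value plus one.
 */
static int fls32(uint32_t x)
{
	int r = -1; /* matches the -1 that cmovzl loads for x == 0 */

	while (x) {
		x >>= 1;
		r++;
	}
	return r + 1;
}
```

Because it contains no inline asm, the compiler is free to optimize it however it likes without the constraint questions raised in this thread, so it can serve as a cross-check for the asm variant's return values.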
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
>
> Slightly gross test program (with original cubic wraparound bug fixed).
...
> {~0, 2097151},
       ^^^ this should be 2642245.

Without serializing instruction before rdtsc and with one loop
I do not get very accurate results (104 for ncubic, > 1000 for others).

#define rdtscll_serialize(val) \
  __asm__ __volatile__("movl $0, %%eax\n\tcpuid\n\trdtsc\n" : "=A" (val) : : "ebx", "ecx")

Here Pentium D timings for 1000 loops.

~0, 2097151

Function     clocks  mean(us)   max(us)   std(us)  total error
ocubic          912     0.306    20.317     0.730       545101
ncubic          777     0.261    14.799     0.486       576263
acbrt          1168     0.392    21.681     0.547       547562
hcbrt           827     0.278    15.244     0.387         2410

~0, 2642245

Function     clocks  mean(us)   max(us)   std(us)  total error
ocubic          908     0.305    20.210     0.656            7
ncubic          775     0.260    14.792     0.550        31169
acbrt          1176     0.395    22.017     0.970         2468
hcbrt           826     0.278    15.326     0.670       547504

And I found bug in gcc-4.1.2, it gave 0 for ncubic results
when doing 1000 loops test... gcc-4.0.3 works.

--
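The corrected expected value for the {~0, ...} test case can be double-checked with a slow but exact integer cube root. The sketch below uses a binary search rather than any of the Newton-Raphson or table-based variants benchmarked above; icbrt64() is a hypothetical name and is meant only as a reference for validating test tables, not as a replacement for the kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical reference implementation: floor(cbrt(x)) for a 64-bit x,
 * found by binary search. Slow compared to the benchmarked functions,
 * but exact, which is all a test-table check needs.
 */
static uint32_t icbrt64(uint64_t x)
{
	uint64_t lo = 0;
	uint64_t hi = (uint64_t)1 << 22; /* (2^22)^3 = 2^66 > 2^64, so the root is below 2^22 */

	while (lo < hi) {
		uint64_t mid = lo + (hi - lo + 1) / 2; /* always >= 1 */

		/* mid^3 <= x, written as a division so that nothing overflows */
		if (mid <= x / (mid * mid))
			lo = mid;
		else
			hi = mid - 1;
	}
	return (uint32_t)lo;
}
```

For an input of all ones this returns 2642245, the value given in the correction above, while 2097151 is 2^21 - 1.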
Re: [RFC] div64_64 support
On Fri, Feb 23, 2007 at 17:05:27 -0800, Stephen Hemminger wrote:
> Since there already two users of full 64 bit division in the kernel,
> and other places maybe hiding out as well. Add a full 64/64 bit divide.
>
> Yes this expensive, but there are places where it is necessary.
> It is not clear if doing the scaling buys any advantage on 64 bit platforms,
> so for them a full divide is done.

Still does not work after these fixes... how come?

WARNING: "div64_64" [net/netfilter/xt_connbytes.ko] undefined!
WARNING: "div64_64" [net/ipv4/tcp_cubic.ko] undefined!

--- linux-2.6.19/include/asm-i386/div64.h.bak	2006-11-29 23:57:37.0 +0200
+++ linux-2.6.19/include/asm-i386/div64.h	2007-02-24 16:24:55.822529880 +0200
@@ -45,4 +45,7 @@ div_ll_X_l_rem(long long divs, long div,
 	return dum2;
 }

+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #endif
--- linux-2.6.19/lib/div64.c.bak	2007-02-24 16:10:03.686084000 +0200
+++ linux-2.6.19/lib/div64.c	2007-02-24 17:01:11.224517353 +0200
@@ -80,4 +80,6 @@ uint64_t div64_64(uint64_t dividend, uin
 	return dividend;
 }

+EXPORT_SYMBOL(div64_64);
+
 #endif /* BITS_PER_LONG == 32 */

--