[PATCH net-next] phy: make some bits preserved while setup forced mode
When tested the PHY SGMII Loopback: 1.set the LOOPBACK bit, 2.set the autoneg to AUTONEG_DISABLE, it calls the genphy_setup_forced which will clear the bit. The BMCR_LOOPBACK bit should be preserved. As Florian pointed out that other bits should be preserved too. So I make the BMCR_ISOLATE and BMCR_PDOWN as well. Signed-off-by: Weidong Wang <wangweido...@huawei.com> --- the patch is evoluted from "phy: keep the BCMR_LOOPBACK bit while setup forced mode". --- drivers/net/phy/phy_device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index d551df6..9b24d7e 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -818,8 +818,9 @@ static int genphy_config_advert(struct phy_device *phydev) */ int genphy_setup_forced(struct phy_device *phydev) { - int ctl = 0; + int ctl = phy_read(phydev, MII_BMCR); + ctl &= BMCR_LOOPBACK | BMCR_ISOLATE | BMCR_PDOWN; phydev->pause = 0; phydev->asym_pause = 0; -- 2.7.0
Re: [RESEND PATCH net-next] phy: keep the BCMR_LOOPBACK bit while setup forced mode
On 2016/4/14 2:41, Florian Fainelli wrote: > On 13/04/16 04:59, Weidong Wang wrote: >> When tested the PHY SGMII Loopback,: >> 1.set the LOOPBACK bit, >> 2.set the autoneg to AUTONEG_DISABLE, it calls the >> genphy_setup_forced which will clear the bit. >> >> So just keep the LOOPBACK bit while setup forced mode. > > Humm, it makes sense why we want this one, but maybe we want other bits > to be preserved too, like MII_ISOLATE for instance? > I will add the MII_ISOLATE as well. > Or maybe we should have a separate way to put the PHY into loopback mode > which is deterministic and takes care of forcing the link at the same time? > I test with the loopback mode, and the link status is undeterminable. >> >> Signed-off-by: Weidong Wang <wangweido...@huawei.com> >> --- >> drivers/net/phy/phy_device.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index e551f3a..8da4b80 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -1124,7 +1124,9 @@ static int genphy_config_advert(struct phy_device >> *phydev) >> int genphy_setup_forced(struct phy_device *phydev) >> { >> int ctl = 0; >> +int val = phy_read(phydev, MII_BMCR); >> >> +ctl |= val & BMCR_LOOPBACK; >> phydev->pause = 0; >> phydev->asym_pause = 0; >> >> -- 2.7.0 >> > >
Re: [RESEND PATCH net-next] phy: keep the BCMR_LOOPBACK bit while setup forced mode
On 2016/4/13 22:19, Sergei Shtylyov wrote: > Hello. > > On 4/13/2016 2:59 PM, Weidong Wang wrote: > >> When tested the PHY SGMII Loopback,: >> 1.set the LOOPBACK bit, >> 2.set the autoneg to AUTONEG_DISABLE, it calls the >> genphy_setup_forced which will clear the bit. >> >> So just keep the LOOPBACK bit while setup forced mode. >> >> Signed-off-by: Weidong Wang <wangweido...@huawei.com> >> --- >> drivers/net/phy/phy_device.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index e551f3a..8da4b80 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -1124,7 +1124,9 @@ static int genphy_config_advert(struct phy_device >> *phydev) >> int genphy_setup_forced(struct phy_device *phydev) >> { >> int ctl = 0; >> +int val = phy_read(phydev, MII_BMCR); > >Please place this declaration first, DaveM prefers declarations to be > sorted from longest to shortest. > >> >> +ctl |= val & BMCR_LOOPBACK; > >Just =, removing the 'ctl' initializer, please. > > [...] > > MBR, Sergei > Got it. Regards, Weidong > >
[RESEND PATCH net-next] phy: keep the BCMR_LOOPBACK bit while setup forced mode
When tested the PHY SGMII Loopback,: 1.set the LOOPBACK bit, 2.set the autoneg to AUTONEG_DISABLE, it calls the genphy_setup_forced which will clear the bit. So just keep the LOOPBACK bit while setup forced mode. Signed-off-by: Weidong Wang <wangweido...@huawei.com> --- drivers/net/phy/phy_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index e551f3a..8da4b80 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1124,7 +1124,9 @@ static int genphy_config_advert(struct phy_device *phydev) int genphy_setup_forced(struct phy_device *phydev) { int ctl = 0; + int val = phy_read(phydev, MII_BMCR); + ctl |= val & BMCR_LOOPBACK; phydev->pause = 0; phydev->asym_pause = 0; -- 2.7.0
[PATCH net-next] phy: keep the BCMR_LOOPBACK bit while setup forced mode
When tested the PHY SGMII Loopback,: 1.set the LOOPBACK bit, 2.set the autoneg to AUTONEG_DISABLE, it calls the genphy_setup_forced which will clear the bit. So just keep the LOOPBACK bit while setup forced mode. Signed-off-by: Weidong Wang <wangweido...@huawei.com> --- drivers/net/phy/phy_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index e551f3a..8da4b80 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1124,7 +1124,9 @@ static int genphy_config_advert(struct phy_device *phydev) int genphy_setup_forced(struct phy_device *phydev) { int ctl = 0; + int val = phy_read(phydev, MII_BMCR); + ctl |= val & BMCR_LOOPBACK; phydev->pause = 0; phydev->asym_pause = 0; -- 2.7.0
[PATCH net-next] netfilter: nf_conntrack: remove the unneed check for *bucket
In the 'for(...) {}', the *bucket alwasy < net->ct.htable_size, so remove the check Signed-off-by: Weidong Wang <wangweido...@huawei.com> --- net/netfilter/nf_conntrack_core.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 3cb3cb8..cd7d5c8 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1383,14 +1383,12 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data), lockp = _conntrack_locks[*bucket % CONNTRACK_LOCKS]; local_bh_disable(); spin_lock(lockp); - if (*bucket < net->ct.htable_size) { - hlist_nulls_for_each_entry(h, n, >ct.hash[*bucket], hnnode) { - if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) - continue; - ct = nf_ct_tuplehash_to_ctrack(h); - if (iter(ct, data)) - goto found; - } + hlist_nulls_for_each_entry(h, n, >ct.hash[*bucket], hnnode) { + if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) + continue; + ct = nf_ct_tuplehash_to_ctrack(h); + if (iter(ct, data)) + goto found; } spin_unlock(lockp); local_bh_enable(); -- 2.7.0
Re: [PATCH net-next] netfilter: nf_conntrack: remove the unneed check for *bucket
On 2016/1/31 5:30, Florian Westphal wrote: > Weidong Wang <wangweido...@huawei.com> wrote: >> In the 'for(...) {}', the *bucket alwasy < net->ct.htable_size, >> so remove the check >> @@ -1383,14 +1383,12 @@ get_next_corpse(struct net *net, int (*iter)(struct >> nf_conn *i, void *data), >> lockp = _conntrack_locks[*bucket % CONNTRACK_LOCKS]; >> local_bh_disable(); >> spin_lock(lockp); >> -if (*bucket < net->ct.htable_size) { > > AFAIU net->ct.htable_size can shrink between for-test and aquiring > the bucket lockp, so this additional if-test is needed. > ok, Got it. So ignore this patch. Regards, Weidong > . >
[PATCH net-next] BNX2: free temp_stats_blk on error path
In bnx2_init_board, missing free temp_stats_blk on error path when some operations do failed. Just add the 'kfree' operation. Signed-off-by: Wang Weidong <wangweido...@huawei.com> --- drivers/net/ethernet/broadcom/bnx2.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 6259064..8fc3f3c 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -8476,6 +8476,8 @@ err_out_disable: pci_disable_device(pdev); err_out: + kfree(bp->temp_stats_blk); + return rc; } -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3] BNX2: fix a Null Pointer for stats_blk
we have two processes to do: P1#: ifconfig eth0 down; which will call bnx2_close, then will , and set Null to stats_blk P2#: ifconfig eth0; which will call bnx2_get_stats64, it will use stats_blk. In one case: --P1#-- --P2#-- stats_blk(no null) bnx2_free_mem ->bp->stats_blk = NULL GET_64BIT_NET_STATS then it will cause 'NULL Pointer' Problem. it is as well with 'ethtool -S ethx'. Allocate the statistics block at probe time so that this problem is impossible Signed-off-by: Wang Weidong <wangweido...@huawei.com> --- Change in v2: - Use Allocate the statistics block instead of spinlock, which suggested by David Miller. - Updating commit message according to changes. Change in v3: - bnx2_alloc_stats_blk is just allocating the stats block. - use 'bp->status_blk' to store the status_blk which would used in other funcs, just to key the codes simplified. --- drivers/net/ethernet/broadcom/bnx2.c | 79 drivers/net/ethernet/broadcom/bnx2.h | 1 + 2 files changed, 53 insertions(+), 27 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 2b66ef3..6259064 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -813,6 +813,46 @@ bnx2_alloc_rx_mem(struct bnx2 *bp) } static void +bnx2_free_stats_blk(struct net_device *dev) +{ + struct bnx2 *bp = netdev_priv(dev); + + if (bp->status_blk) { + dma_free_coherent(>pdev->dev, bp->status_stats_size, + bp->status_blk, + bp->status_blk_mapping); + bp->status_blk = NULL; + bp->stats_blk = NULL; + } +} + +static int +bnx2_alloc_stats_blk(struct net_device *dev) +{ + int status_blk_size; + void *status_blk; + struct bnx2 *bp = netdev_priv(dev); + + /* Combine status and statistics blocks into one allocation. */ + status_blk_size = L1_CACHE_ALIGN(sizeof(struct status_block)); + if (bp->flags & BNX2_FLAG_MSIX_CAP) + status_blk_size = L1_CACHE_ALIGN(BNX2_MAX_MSIX_HW_VEC * +BNX2_SBLK_MSIX_ALIGN_SIZE); + bp->status_stats_size = status_blk_size + + sizeof(struct statistics_block); + status_blk = dma_zalloc_coherent(>pdev->dev, bp->status_stats_size, +>status_blk_mapping, GFP_KERNEL); + if (status_blk == NULL) + return -ENOMEM; + + bp->status_blk = status_blk; + bp->stats_blk = status_blk + status_blk_size; + bp->stats_blk_mapping = bp->status_blk_mapping + status_blk_size; + + return 0; +} + +static void bnx2_free_mem(struct bnx2 *bp) { int i; @@ -829,37 +869,19 @@ bnx2_free_mem(struct bnx2 *bp) bp->ctx_blk[i] = NULL; } } - if (bnapi->status_blk.msi) { - dma_free_coherent(>pdev->dev, bp->status_stats_size, - bnapi->status_blk.msi, - bp->status_blk_mapping); + + if (bnapi->status_blk.msi) bnapi->status_blk.msi = NULL; - bp->stats_blk = NULL; - } } static int bnx2_alloc_mem(struct bnx2 *bp) { - int i, status_blk_size, err; + int i, err; struct bnx2_napi *bnapi; - void *status_blk; - - /* Combine status and statistics blocks into one allocation. */ - status_blk_size = L1_CACHE_ALIGN(sizeof(struct status_block)); - if (bp->flags & BNX2_FLAG_MSIX_CAP) - status_blk_size = L1_CACHE_ALIGN(BNX2_MAX_MSIX_HW_VEC * -BNX2_SBLK_MSIX_ALIGN_SIZE); - bp->status_stats_size = status_blk_size + - sizeof(struct statistics_block); - - status_blk = dma_zalloc_coherent(>pdev->dev, bp->status_stats_size, ->status_blk_mapping, GFP_KERNEL); - if (status_blk == NULL) - goto alloc_mem_err; bnapi = >bnx2_napi[0]; - bnapi->status_blk.msi = status_blk; + bnapi->status_blk.msi = bp->status_blk; bnapi->hw_tx_cons_ptr = >status_blk.msi->status_tx_quick_consumer_index0; bnapi->hw_rx_cons_ptr = @@ -870,7 +892,7 @@ bnx2_alloc_mem(struct bnx2 *bp) bnapi = >bnx2_napi[i]; - sblk = (status_blk + BNX2_SBLK_MSIX_ALIGN_SIZE * i); + sblk = (bp->status_blk + BNX2_SBLK_MSIX_ALIGN_SIZE * i); bnapi->status_blk.msix = sblk; bnapi->hw_tx_cons_pt
[PATCH net-next v2] BNX2: fix a Null Pointer for stats_blk
we have two processes to do: P1#: ifconfig eth0 down; which will call bnx2_close, then will , and set Null to stats_blk P2#: ifconfig eth0; which will call bnx2_get_stats64, it will use stats_blk. In one case: --P1#-- --P2#-- stats_blk(no null) bnx2_free_mem ->bp->stats_blk = NULL GET_64BIT_NET_STATS then it will cause 'NULL Pointer' Problem. it is as well with 'ethtool -S ethx'. Allocate the statistics block at probe time so that this problem is impossible Signed-off-by: Tianhong Ding--- Change in v2: - Use Allocate the statistics block instead of spinlock, which suggested by David Miller. - Updating commit message according to changes. --- drivers/net/ethernet/broadcom/bnx2.c | 61 +++- 1 file changed, 40 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 2b66ef3..1f33982 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -813,22 +813,11 @@ bnx2_alloc_rx_mem(struct bnx2 *bp) } static void -bnx2_free_mem(struct bnx2 *bp) +bnx2_free_stats_blk(struct net_device *dev) { - int i; + struct bnx2 *bp = netdev_priv(dev); struct bnx2_napi *bnapi = >bnx2_napi[0]; - bnx2_free_tx_mem(bp); - bnx2_free_rx_mem(bp); - - for (i = 0; i < bp->ctx_pages; i++) { - if (bp->ctx_blk[i]) { - dma_free_coherent(>pdev->dev, BNX2_PAGE_SIZE, - bp->ctx_blk[i], - bp->ctx_blk_mapping[i]); - bp->ctx_blk[i] = NULL; - } - } if (bnapi->status_blk.msi) { dma_free_coherent(>pdev->dev, bp->status_stats_size, bnapi->status_blk.msi, @@ -839,11 +828,12 @@ bnx2_free_mem(struct bnx2 *bp) } static int -bnx2_alloc_mem(struct bnx2 *bp) +bnx2_alloc_stats_blk(struct net_device *dev) { - int i, status_blk_size, err; + int i, status_blk_size; struct bnx2_napi *bnapi; void *status_blk; + struct bnx2 *bp = netdev_priv(dev); /* Combine status and statistics blocks into one allocation. */ status_blk_size = L1_CACHE_ALIGN(sizeof(struct status_block)); @@ -852,11 +842,10 @@ bnx2_alloc_mem(struct bnx2 *bp) BNX2_SBLK_MSIX_ALIGN_SIZE); bp->status_stats_size = status_blk_size + sizeof(struct statistics_block); - status_blk = dma_zalloc_coherent(>pdev->dev, bp->status_stats_size, >status_blk_mapping, GFP_KERNEL); if (status_blk == NULL) - goto alloc_mem_err; + return -ENOMEM; bnapi = >bnx2_napi[0]; bnapi->status_blk.msi = status_blk; @@ -865,11 +854,10 @@ bnx2_alloc_mem(struct bnx2 *bp) bnapi->hw_rx_cons_ptr = >status_blk.msi->status_rx_quick_consumer_index0; if (bp->flags & BNX2_FLAG_MSIX_CAP) { - for (i = 1; i < bp->irq_nvecs; i++) { + for (i = 1; i < BNX2_MAX_MSIX_HW_VEC; i++) { struct status_block_msix *sblk; bnapi = >bnx2_napi[i]; - sblk = (status_blk + BNX2_SBLK_MSIX_ALIGN_SIZE * i); bnapi->status_blk.msix = sblk; bnapi->hw_tx_cons_ptr = @@ -879,11 +867,35 @@ bnx2_alloc_mem(struct bnx2 *bp) bnapi->int_num = i << 24; } } - bp->stats_blk = status_blk + status_blk_size; - bp->stats_blk_mapping = bp->status_blk_mapping + status_blk_size; + return 0; +} + +static void +bnx2_free_mem(struct bnx2 *bp) +{ + int i; + + bnx2_free_tx_mem(bp); + bnx2_free_rx_mem(bp); + + for (i = 0; i < bp->ctx_pages; i++) { + if (bp->ctx_blk[i]) { + dma_free_coherent(>pdev->dev, BNX2_PAGE_SIZE, + bp->ctx_blk[i], + bp->ctx_blk_mapping[i]); + bp->ctx_blk[i] = NULL; + } + } +} + +static int +bnx2_alloc_mem(struct bnx2 *bp) +{ + int i, err; + if (BNX2_CHIP(bp) == BNX2_CHIP_5709) { bp->ctx_pages = 0x2000 / BNX2_PAGE_SIZE; if (bp->ctx_pages == 0) @@ -8330,6 +8342,11 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev) bp->phy_addr = 1; + /* allocate stats_blk */ + rc = bnx2_alloc_stats_blk(dev); + if (rc) + goto err_out_unmap; + /* Disable WOL support if we are running on a SERDES chip. */ if (BNX2_CHIP(bp) == BNX2_CHIP_5709) bnx2_get_5709_media(bp); @@ -8586,6
[PATCH net-next v2 RESEND] BNX2: fix a Null Pointer for stats_blk
we have two processes to do: P1#: ifconfig eth0 down; which will call bnx2_close, then will , and set Null to stats_blk P2#: ifconfig eth0; which will call bnx2_get_stats64, it will use stats_blk. In one case: --P1#-- --P2#-- stats_blk(no null) bnx2_free_mem ->bp->stats_blk = NULL GET_64BIT_NET_STATS then it will cause 'NULL Pointer' Problem. it is as well with 'ethtool -S ethx'. Allocate the statistics block at probe time so that this problem is impossible Signed-off-by: Weidong Wang <wangweido...@huawei.com> --- Change in v2: - Use Allocate the statistics block instead of spinlock, which suggested by David Miller. - Updating commit message according to changes. --- drivers/net/ethernet/broadcom/bnx2.c | 61 +++- 1 file changed, 40 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 2b66ef3..1f33982 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -813,22 +813,11 @@ bnx2_alloc_rx_mem(struct bnx2 *bp) } static void -bnx2_free_mem(struct bnx2 *bp) +bnx2_free_stats_blk(struct net_device *dev) { - int i; + struct bnx2 *bp = netdev_priv(dev); struct bnx2_napi *bnapi = >bnx2_napi[0]; - bnx2_free_tx_mem(bp); - bnx2_free_rx_mem(bp); - - for (i = 0; i < bp->ctx_pages; i++) { - if (bp->ctx_blk[i]) { - dma_free_coherent(>pdev->dev, BNX2_PAGE_SIZE, - bp->ctx_blk[i], - bp->ctx_blk_mapping[i]); - bp->ctx_blk[i] = NULL; - } - } if (bnapi->status_blk.msi) { dma_free_coherent(>pdev->dev, bp->status_stats_size, bnapi->status_blk.msi, @@ -839,11 +828,12 @@ bnx2_free_mem(struct bnx2 *bp) } static int -bnx2_alloc_mem(struct bnx2 *bp) +bnx2_alloc_stats_blk(struct net_device *dev) { - int i, status_blk_size, err; + int i, status_blk_size; struct bnx2_napi *bnapi; void *status_blk; + struct bnx2 *bp = netdev_priv(dev); /* Combine status and statistics blocks into one allocation. */ status_blk_size = L1_CACHE_ALIGN(sizeof(struct status_block)); @@ -852,11 +842,10 @@ bnx2_alloc_mem(struct bnx2 *bp) BNX2_SBLK_MSIX_ALIGN_SIZE); bp->status_stats_size = status_blk_size + sizeof(struct statistics_block); - status_blk = dma_zalloc_coherent(>pdev->dev, bp->status_stats_size, >status_blk_mapping, GFP_KERNEL); if (status_blk == NULL) - goto alloc_mem_err; + return -ENOMEM; bnapi = >bnx2_napi[0]; bnapi->status_blk.msi = status_blk; @@ -865,11 +854,10 @@ bnx2_alloc_mem(struct bnx2 *bp) bnapi->hw_rx_cons_ptr = >status_blk.msi->status_rx_quick_consumer_index0; if (bp->flags & BNX2_FLAG_MSIX_CAP) { - for (i = 1; i < bp->irq_nvecs; i++) { + for (i = 1; i < BNX2_MAX_MSIX_HW_VEC; i++) { struct status_block_msix *sblk; bnapi = >bnx2_napi[i]; - sblk = (status_blk + BNX2_SBLK_MSIX_ALIGN_SIZE * i); bnapi->status_blk.msix = sblk; bnapi->hw_tx_cons_ptr = @@ -879,11 +867,35 @@ bnx2_alloc_mem(struct bnx2 *bp) bnapi->int_num = i << 24; } } - bp->stats_blk = status_blk + status_blk_size; - bp->stats_blk_mapping = bp->status_blk_mapping + status_blk_size; + return 0; +} + +static void +bnx2_free_mem(struct bnx2 *bp) +{ + int i; + + bnx2_free_tx_mem(bp); + bnx2_free_rx_mem(bp); + + for (i = 0; i < bp->ctx_pages; i++) { + if (bp->ctx_blk[i]) { + dma_free_coherent(>pdev->dev, BNX2_PAGE_SIZE, + bp->ctx_blk[i], + bp->ctx_blk_mapping[i]); + bp->ctx_blk[i] = NULL; + } + } +} + +static int +bnx2_alloc_mem(struct bnx2 *bp) +{ + int i, err; + if (BNX2_CHIP(bp) == BNX2_CHIP_5709) { bp->ctx_pages = 0x2000 / BNX2_PAGE_SIZE; if (bp->ctx_pages == 0) @@ -8330,6 +8342,11 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev) bp->phy_addr = 1; + /* allocate stats_blk */ + rc = bnx2_alloc_stats_blk(dev); + if (rc) + goto err_out_unmap; +
Re: [PATCH net-next v2] BNX2: fix a Null Pointer for stats_blk
On 2015/9/28 15:01, Weidong Wang wrote: > we have two processes to do: > P1#: ifconfig eth0 down; which will call bnx2_close, then will > , and set Null to stats_blk > P2#: ifconfig eth0; which will call bnx2_get_stats64, it will > use stats_blk. > In one case: > --P1#-- --P2#-- > stats_blk(no null) > bnx2_free_mem > ->bp->stats_blk = NULL > GET_64BIT_NET_STATS > > then it will cause 'NULL Pointer' Problem. > it is as well with 'ethtool -S ethx'. > > Allocate the statistics block at probe time so that this problem is > impossible > > Signed-off-by: Tianhong Ding <dingtianh...@huawei.com> > --- Sorry for that, The sob is Error. I will fixed it. Just Ignore it. Regards. Weidong -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] BNX2: fix a Null Pointer for stats_blk
On 2015/9/24 13:34, David Miller wrote: > From: Weidong Wang <wangweido...@huawei.com> > Date: Thu, 24 Sep 2015 10:00:45 +0800 > >> It does affect the intention. Although, the problem exists then makes the >> system panic within some case. >> >> Do you have any idea about it? > > Allocate the statistics block at probe time so that this problem is > impossible. > It is a good idea. Yet, what is the intention of the dynamic to alloc/free stats_block? what will be affected by allocating the statistics block. Best Regards, Weidong > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] BNX2: fix a Null Pointer for stats_blk
On 2015/9/24 6:31, David Miller wrote: > From: Weidong Wang <wangweido...@huawei.com> > Date: Tue, 22 Sep 2015 20:42:40 +0800 > >> @@ -880,6 +882,7 @@ bnx2_alloc_mem(struct bnx2 *bp) >> } >> } >> >> +spin_lock(>stats64_lock); >> bp->stats_blk = status_blk + status_blk_size; >> >> bp->stats_blk_mapping = bp->status_blk_mapping + status_blk_size; >> @@ -894,20 +897,23 @@ bnx2_alloc_mem(struct bnx2 *bp) >> >ctx_blk_mapping[i], >> GFP_KERNEL); >> if (bp->ctx_blk[i] == NULL) >> -goto alloc_mem_err; >> +goto free_stats64_lock; >> } >> } >> >> err = bnx2_alloc_rx_mem(bp); >> if (err) >> -goto alloc_mem_err; >> +goto free_stats64_lock; > > You're holding a spinlock while doing GFP_KERNEL allocations. > hm, yep, I should move it after the allocations. Like this: @@ -880,7 +882,9 @@ bnx2_alloc_mem(struct bnx2 *bp) } } + spin_lock(>stats64_lock); bp->stats_blk = status_blk + status_blk_size; + spin_unlock(>stats64_lock); the allocations won't use the stats_blk, so I shouldn't hold the lock while doing allocations. > Second of all, taking a spinlock in get_stats64() defeats the whole > intention of making statistics acquisition as fast and as SMP scalable > as possible. > It does affect the intention. Although, the problem exists then makes the system panic within some case. Do you have any idea about it? Best Regards, Weidong > . > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] BNX2: fix a Null Pointer for stats_blk
we have two processes to do: P1#: ifconfig eth0 down; which will call bnx2_close, then will , and set Null to stats_blk P2#: ifconfig eth0; which will call bnx2_get_stats64, it will use stats_blk. In one case: --P1#-- --P2#-- stats_blk(no null) bnx2_free_mem ->bp->stats_blk = NULL GET_64BIT_NET_STATS then it will cause 'NULL Pointer' Problem. it is as well with 'ethtool -S ethx'. BTW, the other branch has this problem as well. So we add a spin_lock to protect stats_blk. Signed-off-by: Wang Weidong <wangweido...@huawei.com> --- drivers/net/ethernet/broadcom/bnx2.c | 42 drivers/net/ethernet/broadcom/bnx2.h | 1 + 2 files changed, 34 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 2b66ef3..aec4081 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -830,11 +830,13 @@ bnx2_free_mem(struct bnx2 *bp) } } if (bnapi->status_blk.msi) { + spin_lock(>stats64_lock); dma_free_coherent(>pdev->dev, bp->status_stats_size, bnapi->status_blk.msi, bp->status_blk_mapping); bnapi->status_blk.msi = NULL; bp->stats_blk = NULL; + spin_unlock(>stats64_lock); } } @@ -880,6 +882,7 @@ bnx2_alloc_mem(struct bnx2 *bp) } } + spin_lock(>stats64_lock); bp->stats_blk = status_blk + status_blk_size; bp->stats_blk_mapping = bp->status_blk_mapping + status_blk_size; @@ -894,20 +897,23 @@ bnx2_alloc_mem(struct bnx2 *bp) >ctx_blk_mapping[i], GFP_KERNEL); if (bp->ctx_blk[i] == NULL) - goto alloc_mem_err; + goto free_stats64_lock; } } err = bnx2_alloc_rx_mem(bp); if (err) - goto alloc_mem_err; + goto free_stats64_lock; err = bnx2_alloc_tx_mem(bp); if (err) - goto alloc_mem_err; + goto free_stats64_lock; + spin_unlock(>stats64_lock); return 0; +free_stats64_lock: + spin_unlock(>stats64_lock); alloc_mem_err: bnx2_free_mem(bp); return -ENOMEM; @@ -6756,10 +6762,14 @@ bnx2_close(struct net_device *dev) static void bnx2_save_stats(struct bnx2 *bp) { - u32 *hw_stats = (u32 *) bp->stats_blk; - u32 *temp_stats = (u32 *) bp->temp_stats_blk; + u32 *hw_stats; + u32 *temp_stats; int i; + spin_lock(>stats64_lock); + hw_stats = (u32 *) bp->stats_blk; + temp_stats = (u32 *) bp->temp_stats_blk; + /* The 1st 10 counters are 64-bit counters */ for (i = 0; i < 20; i += 2) { u32 hi; @@ -6775,6 +6785,8 @@ bnx2_save_stats(struct bnx2 *bp) for ( ; i < sizeof(struct statistics_block) / 4; i++) temp_stats[i] += hw_stats[i]; + + spin_unlock(>stats64_lock); } #define GET_64BIT_NET_STATS64(ctr) \ @@ -6793,8 +6805,11 @@ bnx2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *net_stats) { struct bnx2 *bp = netdev_priv(dev); - if (bp->stats_blk == NULL) + spin_lock(>stats64_lock); + if (bp->stats_blk == NULL) { + spin_unlock(>stats64_lock); return net_stats; + } net_stats->rx_packets = GET_64BIT_NET_STATS(stat_IfHCInUcastPkts) + @@ -6858,6 +6873,7 @@ bnx2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *net_stats) GET_32BIT_NET_STATS(stat_IfInMBUFDiscards) + GET_32BIT_NET_STATS(stat_FwRxDrop); + spin_unlock(>stats64_lock); return net_stats; } @@ -7634,13 +7650,17 @@ bnx2_get_ethtool_stats(struct net_device *dev, { struct bnx2 *bp = netdev_priv(dev); int i; - u32 *hw_stats = (u32 *) bp->stats_blk; - u32 *temp_stats = (u32 *) bp->temp_stats_blk; + u32 *hw_stats; + u32 *temp_stats; u8 *stats_len_arr = NULL; + spin_lock(>stats64_lock); + hw_stats = (u32 *) bp->stats_blk; + temp_stats = (u32 *) bp->temp_stats_blk; + if (hw_stats == NULL) { memset(buf, 0, sizeof(u64) * BNX2_NUM_STATS); - return; + goto free_stats64_lock; } if ((BNX2_CHIP_ID(bp) == BNX2_CHIP_ID_5706_A0) || @@ -7673,6 +7693,9 @@ bnx2_get_ethtool_stats(struct net_device *dev, (((u64) *(temp_stats + offset)) << 32) +
Re: Fix ipOutNoRoutes counter error for TCP and UDP
Mr David Sorry to trouble many times, I will attention to this next time. I have made this patch again, and I tried, it can be patched to the recently kernel of linux-2.6.21.3. following is the patch, and I also attach this patch to the attachment. Signed-off-by: Wei Dong [EMAIL PROTECTED] diff -Nurp a/net/ipv4/datagram.c b/net/ipv4/datagram.c --- a/net/ipv4/datagram.c 2007-06-01 13:22:39.0 +0800 +++ b/net/ipv4/datagram.c 2007-06-01 13:24:21.0 +0800 @@ -50,8 +50,12 @@ int ip4_datagram_connect(struct sock *sk RT_CONN_FLAGS(sk), oif, sk-sk_protocol, inet-sport, usin-sin_port, sk, 1); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return err; + } + if ((rt-rt_flags RTCF_BROADCAST) !sock_flag(sk, SOCK_BROADCAST)) { ip_rt_put(rt); return -EACCES; diff -Nurp a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c 2007-06-01 13:22:39.0 +0800 +++ b/net/ipv4/tcp_ipv4.c 2007-06-01 13:25:07.0 +0800 @@ -192,8 +192,11 @@ int tcp_v4_connect(struct sock *sk, stru RT_CONN_FLAGS(sk), sk-sk_bound_dev_if, IPPROTO_TCP, inet-sport, usin-sin_port, sk, 1); - if (tmp 0) + if (tmp 0) { + if (tmp == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return tmp; + } if (rt-rt_flags (RTCF_MULTICAST | RTCF_BROADCAST)) { ip_rt_put(rt); diff -Nurp a/net/ipv4/udp.c b/net/ipv4/udp.c --- a/net/ipv4/udp.c2007-06-01 13:22:39.0 +0800 +++ b/net/ipv4/udp.c2007-06-01 13:26:47.0 +0800 @@ -631,8 +631,11 @@ int udp_sendmsg(struct kiocb *iocb, stru .dport = dport } } }; security_sk_classify_flow(sk, fl); err = ip_route_output_flow(rt, fl, sk, 1); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); goto out; + } err = -EACCES; if ((rt-rt_flags RTCF_BROADCAST) David Miller wrote: From: Wei Dong [EMAIL PROTECTED] Date: Thu, 31 May 2007 09:16:50 +0800 Hi Mr. David I have modified my patch according to you advice. I think - EHOSTUNREACH is only for input path. In output path, we can just simply check-ENETUNREACH (^_^), the patch is shown in the end of this mail. I send this patch to you several weeks ago, but you have not replied to me. This patch is not correctly? Your email client is still corrupting the patch, it changes tab characters into spaces. This makes the patch not apply. Please do not send the same unusable patch so many times. Instead, I recommend to email the patch to yourself, and you try to apply it, in the same way someone else receiving your patch posting would. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html diff -Nurp a/net/ipv4/datagram.c b/net/ipv4/datagram.c --- a/net/ipv4/datagram.c 2007-06-01 13:22:39.0 +0800 +++ b/net/ipv4/datagram.c 2007-06-01 13:24:21.0 +0800 @@ -50,8 +50,12 @@ int ip4_datagram_connect(struct sock *sk RT_CONN_FLAGS(sk), oif, sk-sk_protocol, inet-sport, usin-sin_port, sk, 1); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return err; + } + if ((rt-rt_flags RTCF_BROADCAST) !sock_flag(sk, SOCK_BROADCAST)) { ip_rt_put(rt); return -EACCES; diff -Nurp a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c 2007-06-01 13:22:39.0 +0800 +++ b/net/ipv4/tcp_ipv4.c 2007-06-01 13:25:07.0 +0800 @@ -192,8 +192,11 @@ int tcp_v4_connect(struct sock *sk, stru RT_CONN_FLAGS(sk), sk-sk_bound_dev_if, IPPROTO_TCP, inet-sport, usin-sin_port, sk, 1); - if (tmp 0) + if (tmp 0) { + if (tmp == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return tmp; + } if (rt-rt_flags (RTCF_MULTICAST | RTCF_BROADCAST)) { ip_rt_put(rt); diff -Nurp a/net/ipv4/udp.c b/net/ipv4/udp.c --- a/net/ipv4/udp.c 2007-06-01 13:22:39.0 +0800 +++ b/net/ipv4/udp.c 2007-06-01 13:26:47.0 +0800 @@ -631,8 +631,11 @@ int udp_sendmsg(struct kiocb *iocb, stru .dport = dport } } }; security_sk_classify_flow(sk, fl); err = ip_route_output_flow(rt, fl, sk, 1); - if (err) + if (err) { + if (err == -ENETUNREACH) +IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); goto out;
Re: Fix ipOutNoRoutes counter error for TCP and UDP
Hi Mr. David I have modified my patch according to you advice. I think - EHOSTUNREACH is only for input path. In output path, we can just simply check -ENETUNREACH (^_^), the patch is shown in the end of this mail. BTW: my E-mail has been changed to [EMAIL PROTECTED] Function need to fix: tcp_v4_connect(); ip4_datagram_connect(); udp_sendmsg(); I think we need to make these checks more carefully. Route lookup can fail for several reasons other than no route being available. Two examples are: 1) Out of memory error while creating route 2) IPSEC disallows communication to that flow ID As a result, we'll probably best limiting the counter increment when the error is either -EHOSTUNREACH or -ENETUNREACH. signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruNp a/net/ipv4/datagram.c b/net/ipv4/datagram.c --- a/net/ipv4/datagram.c 2007-04-25 15:20:19.0 +0800 +++ b/net/ipv4/datagram.c 2007-04-25 15:21:42.0 +0800 @@ -50,8 +50,12 @@ int ip4_datagram_connect(struct sock *sk RT_CONN_FLAGS(sk), oif, sk-sk_protocol, inet-sport, usin-sin_port, sk); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return err; + } + if ((rt-rt_flags RTCF_BROADCAST) !sock_flag(sk, SOCK_BROADCAST)) { ip_rt_put(rt); return -EACCES; diff -ruNp a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c 2007-04-25 15:20:19.0 +0800 +++ b/net/ipv4/tcp_ipv4.c 2007-04-25 15:21:42.0 +0800 @@ -192,8 +192,11 @@ int tcp_v4_connect(struct sock *sk, stru RT_CONN_FLAGS(sk), sk-sk_bound_dev_if, IPPROTO_TCP, inet-sport, usin-sin_port, sk); - if (tmp 0) + if (tmp 0) { + if (tmp == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return tmp; + } if (rt-rt_flags (RTCF_MULTICAST | RTCF_BROADCAST)) { ip_rt_put(rt); diff -ruNp a/net/ipv4/udp.c b/net/ipv4/udp.c --- a/net/ipv4/udp.c2007-04-25 15:20:19.0 +0800 +++ b/net/ipv4/udp.c2007-04-25 15:21:42.0 +0800 @@ -630,8 +630,11 @@ int udp_sendmsg(struct kiocb *iocb, stru .dport = dport } } }; security_sk_classify_flow(sk, fl); err = ip_route_output_flow(rt, fl, sk, !(msg-msg_flagsMSG_DONTWAIT)); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); goto out; + } err = -EACCES; if ((rt-rt_flags RTCF_BROADCAST) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fix ipOutNoRoutes counter error for TCP and UDP
Hi Mr. David I have modified my patch according to you advice. I think - EHOSTUNREACH is only for input path. In output path, we can just simply check-ENETUNREACH (^_^), the patch is shown in the end of this mail. BTW: my E-mail has been changed to [EMAIL PROTECTED] Function need to fix: tcp_v4_connect(); ip4_datagram_connect(); udp_sendmsg(); I think we need to make these checks more carefully. Route lookup can fail for several reasons other than no route being available. Two examples are: 1) Out of memory error while creating route 2) IPSEC disallows communication to that flow ID As a result, we'll probably best limiting the counter increment when the error is either -EHOSTUNREACH or -ENETUNREACH. signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruNp a/net/ipv4/datagram.c b/net/ipv4/datagram.c --- a/net/ipv4/datagram.c 2007-04-25 15:20:19.0 +0800 +++ b/net/ipv4/datagram.c 2007-04-25 15:21:42.0 +0800 @@ -50,8 +50,12 @@ int ip4_datagram_connect(struct sock *sk RT_CONN_FLAGS(sk), oif, sk-sk_protocol, inet-sport, usin-sin_port, sk); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return err; + } + if ((rt-rt_flags RTCF_BROADCAST) !sock_flag(sk, SOCK_BROADCAST)) { ip_rt_put(rt); return -EACCES; diff -ruNp a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c 2007-04-25 15:20:19.0 +0800 +++ b/net/ipv4/tcp_ipv4.c 2007-04-25 15:21:42.0 +0800 @@ -192,8 +192,11 @@ int tcp_v4_connect(struct sock *sk, stru RT_CONN_FLAGS(sk), sk-sk_bound_dev_if, IPPROTO_TCP, inet-sport, usin-sin_port, sk); - if (tmp 0) + if (tmp 0) { + if (tmp == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return tmp; + } if (rt-rt_flags (RTCF_MULTICAST | RTCF_BROADCAST)) { ip_rt_put(rt); diff -ruNp a/net/ipv4/udp.c b/net/ipv4/udp.c --- a/net/ipv4/udp.c2007-04-25 15:20:19.0 +0800 +++ b/net/ipv4/udp.c2007-04-25 15:21:42.0 +0800 @@ -630,8 +630,11 @@ int udp_sendmsg(struct kiocb *iocb, stru .dport = dport } } }; security_sk_classify_flow(sk, fl); err = ip_route_output_flow(rt, fl, sk, !(msg-msg_flagsMSG_DONTWAIT)); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); goto out; + } err = -EACCES; if ((rt-rt_flags RTCF_BROADCAST) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fix ipOutNoRoutes counter error for TCP and UDP
Hi Mr. David I have modified my patch according to you advice. I think - EHOSTUNREACH is only for input path. In output path, we can just simply check -ENETUNREACH (^_^), the patch is shown in the end of this mail. BTW: my E-mail has been changed to [EMAIL PROTECTED] Function need to fix: tcp_v4_connect(); ip4_datagram_connect(); udp_sendmsg(); I think we need to make these checks more carefully. Route lookup can fail for several reasons other than no route being available. Two examples are: 1) Out of memory error while creating route 2) IPSEC disallows communication to that flow ID As a result, we'll probably best limiting the counter increment when the error is either -EHOSTUNREACH or -ENETUNREACH. signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruNp old/net/ipv4/datagram.c new/net/ipv4/datagram.c --- old/net/ipv4/datagram.c 2007-03-27 18:15:56.0 +0800 +++ new/net/ipv4/datagram.c 2007-03-27 18:23:58.0 +0800 @@ -50,8 +50,12 @@ int ip4_datagram_connect(struct sock *sk RT_CONN_FLAGS(sk), oif, sk-sk_protocol, inet-sport, usin-sin_port, sk); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return err; + } + if ((rt-rt_flags RTCF_BROADCAST) !sock_flag(sk, SOCK_BROADCAST)) { ip_rt_put(rt); return -EACCES; diff -ruNp old/net/ipv4/tcp_ipv4.c new/net/ipv4/tcp_ipv4.c --- old/net/ipv4/tcp_ipv4.c 2007-03-27 18:15:56.0 +0800 +++ new/net/ipv4/tcp_ipv4.c 2007-03-27 18:28:38.0 +0800 @@ -192,8 +192,11 @@ int tcp_v4_connect(struct sock *sk, stru RT_CONN_FLAGS(sk), sk-sk_bound_dev_if, IPPROTO_TCP, inet-sport, usin-sin_port, sk); - if (tmp 0) + if (tmp 0) { + if (tmp == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return tmp; + } if (rt-rt_flags (RTCF_MULTICAST | RTCF_BROADCAST)) { ip_rt_put(rt); diff -ruNp old/net/ipv4/udp.c new/net/ipv4/udp.c --- old/net/ipv4/udp.c 2007-03-27 18:15:56.0 +0800 +++ new/net/ipv4/udp.c 2007-03-27 18:26:47.0 +0800 @@ -630,8 +630,11 @@ int udp_sendmsg(struct kiocb *iocb, stru .dport = dport } } }; security_sk_classify_flow(sk, fl); err = ip_route_output_flow(rt, fl, sk, !(msg-msg_flagsMSG_DONTWAIT)); - if (err) + if (err) { + if (err == -ENETUNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); goto out; + } err = -EACCES; if ((rt-rt_flags RTCF_BROADCAST) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Fix bugs in Whether sock accept queue is full checking
Hi, All when I use linux TCP socket, and find there is a bug in function sk_acceptq_is_full(). When a new SYN comes, TCP module first checks its validation. If valid, send SYN,ACK to the client and add the sock to the syn hash table. Next time if received the valid ACK for SYN,ACK from the client. server will accept this connection and increase the sk-sk_ack_backlog -- which is done in function tcp_check_req().We check wether acceptq is full in function tcp_v4_syn_recv_sock(). Consider an example: After listen(sockfd, 1) system call, sk-sk_max_ack_backlog is set to 1. As we know, sk-sk_ack_backlog is initialized to 0. Assuming accept() system call is not invoked now. 1. 1st connection comes. invoke sk_acceptq_is_full(). sk- sk_ack_backlog=0 sk-sk_max_ack_backlog=1, function return 0 accept this connection. Increase the sk-sk_ack_backlog 2. 2nd connection comes. invoke sk_acceptq_is_full(). sk- sk_ack_backlog=1 sk-sk_max_ack_backlog=1, function return 0 accept this connection. Increase the sk-sk_ack_backlog 3. 3rd connection comes. invoke sk_acceptq_is_full(). sk- sk_ack_backlog=2 sk-sk_max_ack_backlog=1, function return 1. Refuse this connection. I think it has bugs. after listen system call. sk-sk_max_ack_backlog=1 but now it can accept 2 connections. The following patch can fix this problem. signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruN old/include/net/sock.h new/include/net/sock.h --- old/include/net/sock.h 2007-02-03 08:38:21.0 -0500 +++ new/include/net/sock.h 2007-02-03 08:38:30.0 -0500 @@ -426,7 +426,7 @@ static inline int sk_acceptq_is_full(struct sock *sk) { - return sk-sk_ack_backlog sk-sk_max_ack_backlog; + return sk-sk_ack_backlog = sk-sk_max_ack_backlog; } /* - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Fix ipOutNoRoutes counter error for TCP and UDP
Hi, All When I tested Linux-2.6.20 and found that counter ipOutNoRoutes can not increase correctly. The criteria is RFC2011 ipOutNoRoutes OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION The number of IP datagrams discarded because no route could be found to transmit them to their destination. Note that this counter includes any packets counted in ipForwDatagrams which meet this `no-route' criterion. Note that this includes any datagrams which a host cannot route because all of its default routers are down. ::= { ip 12 } In current Linux TCP/IP stack, maybe we should not increase this counter in input path, but only increase it in output path due to the TCP/IP stack performance. Now in output path, when TCP client tries to connect to an unreachable server(net unreachable, so no route can be found), this counter has no increment. When we use UDP sending UDP datagram to an net unreachable address, this counter also has no increment. Function need to fix: tcp_v4_connect(); ip4_datagram_connect(); udp_sendmsg(); The following patch can fix the problems mentioned above BR Wei Dong signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruN old/net/ipv4/datagram.c new/net/ipv4/datagram.c --- old/net/ipv4/datagram.c 2007-02-02 12:28:54.0 -0500 +++ new/net/ipv4/datagram.c 2007-02-02 12:29:01.0 -0500 @@ -50,8 +50,10 @@ RT_CONN_FLAGS(sk), oif, sk-sk_protocol, inet-sport, usin-sin_port, sk); - if (err) + if (err) { + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return err; + } if ((rt-rt_flags RTCF_BROADCAST) !sock_flag(sk, SOCK_BROADCAST)) { ip_rt_put(rt); return -EACCES; diff -ruN old/net/ipv4/tcp_ipv4.c new/net/ipv4/tcp_ipv4.c --- old/net/ipv4/tcp_ipv4.c 2007-02-02 12:28:54.0 -0500 +++ new/net/ipv4/tcp_ipv4.c 2007-02-02 12:29:01.0 -0500 @@ -192,8 +192,10 @@ RT_CONN_FLAGS(sk), sk-sk_bound_dev_if, IPPROTO_TCP, inet-sport, usin-sin_port, sk); - if (tmp 0) + if (tmp 0) { + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); return tmp; + } if (rt-rt_flags (RTCF_MULTICAST | RTCF_BROADCAST)) { ip_rt_put(rt); diff -ruN old/net/ipv4/udp.c new/net/ipv4/udp.c --- old/net/ipv4/udp.c 2007-02-02 12:28:54.0 -0500 +++ new/net/ipv4/udp.c 2007-02-02 12:29:01.0 -0500 @@ -630,8 +630,10 @@ .dport = dport } } }; security_sk_classify_flow(sk, fl); err = ip_route_output_flow(rt, fl, sk, !(msg-msg_flagsMSG_DONTWAIT)); - if (err) + if (err) { + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); goto out; + } err = -EACCES; if ((rt-rt_flags RTCF_BROADCAST) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch][IPv6] Fix wrong routing mechanism for Link Local IPv6 packets
Hello, Mr yoshfuji: Thanks for your reply. The following is the figure. || || | Router | |||---| | |--| |--| || other | | | eth0 |---| eth1 |--||network| | |--| ||--| ||---| |--|--||-| | v| | fe80::20c:29ff:fe24:fa0a | | | | | | | | | | | | v | fe80::20c:29ff:fe24:fa14 | | |--|--| | |--| | | | eth0 |---|---fe80::200:ff:fe00:100 | |--| | | | |Host | |-| Host eth0: fe80::200:ff:fe00:100 Router eth0: fe80::20c:29ff:fe24:fa0a Router eth1: fe80::20c:29ff:fe24:fa14 We ping6 from host's eth0 to Router's eth1. Echo Request's src addr = fe80::200:ff:fe00:100, dst addr = fe80::20c:29ff:fe24:fa14. And Kernel just send ICMPv6 redirect packet and then forward the Echo Request to router's eth0. If we run tcpdump on Host eth0, we can receive the ICMPv6 Redirect packet. And if we send NA which advertises fe80::20c:29ff:fe24:fa14 MAC address(this is very easy for v6eval tool), we also can receive the forwarded Echo Request(src:fe80::200:ff:fe00:100 dst is fe80::20c:29ff:fe24:fa14). I dived into the kernel, and found that maybe function rt6_score_route() has problems. In rt6_score_route(), if rt6_check_dev() return 0, and the dst ipv6 addr is link local addr, rt6_socre_route() return -1 directly. I think this is not correct, we should return -1 only if the entry is in the route cache, and the dst addr is link local addr. Only entries in cache may select wrong IPv6 Link Local NIC for a link local dst addr. because they are copied from static IPv6 fib table entries. Hello, Mr yoshfuji Take ping6 for example. Asumming there is a router which has 2 NICs. eth0 on router has ipv6 addr fe80::20c:29ff:fe24:fa0a, eth1 on router has ipv6 addr fe80::20c:29ff:fe24:fa14. Also there is a host connected to router's eth0, and the host's ipv6 addr is fe80::200:ff:fe00:100. We ping6 : I still need more precise figure. Please draw complete box for the 2-3 boxes (pinger, router (and the destination)), link(s) and interfaces. +-+ |Router | +---+-+---+ eth0| |eth1 | | eth0| +---+-+ |Host1| +-+ Host1 eth0: fe80: Router eth0: fe80: Router eth1: fe80:... Or, something like that I think you may use other tool such as tgif etc. --yoshfuji - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch][IPv6] Fix wrong routing mechanism for Link Local IPv6 packets
Hello, Mr yoshfuji Thanks for your patch. I think maybe we checking oif first is better, and WARN_ON in function rt6_score_route(). The following is my patch Signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruN old/net/ipv6/route.c new/net/ipv6/route.c --- old/net/ipv6/route.c2007-02-16 13:46:33.0 -0500 +++ new/net/ipv6/route.c2007-02-16 13:44:27.0 -0500 @@ -309,12 +309,21 @@ static int inline rt6_check_dev(struct rt6_info *rt, int oif) { struct net_device *dev = rt-rt6i_dev; - if (!oif || dev-ifindex == oif) + int ret = 0; + + if (!oif) return 2; + if ((dev-flags IFF_LOOPBACK) rt-rt6i_idev rt-rt6i_idev-dev-ifindex == oif) - return 1; - return 0; + ret = 1; + else + return 0; + + if (dev-ifindex == oif) + return 2; + + return ret; } static int inline rt6_check_neigh(struct rt6_info *rt) @@ -339,8 +348,11 @@ int m, n; m = rt6_check_dev(rt, oif); - if (!m (strict RT6_LOOKUP_F_IFACE)) + if (!m (strict RT6_LOOKUP_F_IFACE)) { + WARN_ON(rt-rt6i_dev-flags IFF_LOOPBACK); return -1; + } + #ifdef CONFIG_IPV6_ROUTER_PREF m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt-rt6i_flags)) 2; #endif On Wed, 2007-01-31 at 13:00 +0900, Wei Dong wrote: In article [EMAIL PROTECTED] (at Wed, 21 Feb 2007 09:57:12 -0500), weidong [EMAIL PROTECTED] says: The following is the figure. : Host eth0: fe80::200:ff:fe00:100 Router eth0: fe80::20c:29ff:fe24:fa0a Router eth1: fe80::20c:29ff:fe24:fa14 Other network | | eth1 +++ | Router | +++ | eth0 | | eth0 +++ | Host | +-+ We ping6 from host's eth0 to Router's eth1. Echo Request's src addr = fe80::200:ff:fe00:100, dst addr = fe80::20c:29ff:fe24:fa14. And Kernel just send ICMPv6 redirect packet and then forward the Echo Request to router's eth0. If we run tcpdump on Host eth0, we can receive the ICMPv6 Redirect packet. And if we send NA which advertises This is correct, and intended behavior. fe80::20c:29ff:fe24:fa14 MAC address(this is very easy for v6eval tool), we also can receive the forwarded Echo Request(src:fe80::200:ff:fe00:100 dst is fe80::20c:29ff:fe24:fa14). Well, this is known issue, actually. While this cannot happen in normal operation, we should NOT accept such traffic. :-) Here is the (untested) fix. - [IPV6] ROUTE: Do not accept traffic for link-local address on different interface. Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] --- diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 5f0043c..a7468e0 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -311,12 +311,19 @@ static inline void rt6_probe(struct rt6_info *rt) static int inline rt6_check_dev(struct rt6_info *rt, int oif) { struct net_device *dev = rt-rt6i_dev; + int ret = 0; + + if (dev-flags IFF_LOOPBACK) { + if (!WARN_ON(rt-rt6i_idev == NULL) + rt-rt6i_idev-dev-ifindex == oif) + ret = 1; + else + return 0; + } if (!oif || dev-ifindex == oif) return 2; - if ((dev-flags IFF_LOOPBACK) - rt-rt6i_idev rt-rt6i_idev-dev-ifindex == oif) - return 1; - return 0; + + return ret; } static int inline rt6_check_neigh(struct rt6_info *rt) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[Patch][IPv6] Fix wrong routing mechanism for Link Local IPv6 packets
Hi, all When I tested linux-2.6.19.2, and found maybe there're some packet routing bugs in linux kernel. My test topology is shown as the following: eth0: fe80::20c:29ff:fe24:fa0a | eth1: fe80::20c:29ff:fe24:fa14 | | | | | | | LAN1 |LAN2 | | - ^ | Send Echo Request(src addr = fe80::200:ff:fe00:100) Now we send Echo Request to eth1(ipv6 src addr is fe80::200:ff:fe00:100, ipv6 dst addr is fe80::20c:29ff:fe24:fa14) on LAN1. Linux will send ICMPv6 Redirect Packet to us. And the TargetAddress = fe80::20c:29ff:fe24:fa14, DestinationAddress = fe80::20c:29ff:fe24:fa14. Obviously, Linux considers that fe80::200:ff:fe00:100 and fe80::20c:29ff:fe24:fa14 is on the SAME link(LAN0). In fact, kernel invoke function ip6_foward(). When Linux decides whether or not making use of a rt6_info entry, it will match the rt-rt6i_idev and rt-rt6_dev. This is done in function rt6_check_dev(). If nothing matched, rt6_check_dev() return 0. Then function rt6_score_route() will check whether the matched ipv6 addr (fe80::20c:29ff:fe24:fa14 in our example) is a link local ipv6 address. If it is a link local address, and rt-rt6i_idev rt-rt6_dev match failed -- rt6_check_dev() return 0. Function rt6_score_route() return -1 directly. I think here is a problem. When kernel match eth1 addr with rt6_info entries, it will lookup in local_table first. In rt6_check_dev() matching rt-rt6i_idev rt- rt6_dev will fail. The reason is oif = 2 , rt-rt6i_idev-dev-ifindex is 3 and rt-rt6i_dev-ifindex is 1. I think even this match failed, rt6_score_route() should not return -1, but return 0. And I think check for RT6_LOOKUP_F_IFACE flag isn't needed here. Checking for this flag is only needed in route cache when matching dst addr. Due to the reason mentioned above, all entries in local table matching dst addr fe80::20c:29ff:fe24:fa14 are failed. And then kernel matches main table. fe80::/64 entry in main table will match successfully. Later ip6_rt_copy() will copy the function pointer rt-u.dst.input. Obviously rt-u.dst.input in main table is ip6_forward(). The following is my patch. signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruN old/net/ipv6/route.c new/net/ipv6/route.c --- old/net/ipv6/route.c2007-01-10 14:10:37.0 -0500 +++ new/net/ipv6/route.c2007-01-17 18:24:51.336774016 -0500 @@ -343,7 +343,7 @@ int m, n; m = rt6_check_dev(rt, oif); - if (!m (strict RT6_LOOKUP_F_IFACE)) + if (!m (rt-rt6i_flags RTF_CACHE) (strict RT6_LOOKUP_F_IFACE)) return -1; #ifdef CONFIG_IPV6_ROUTER_PREF m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt-rt6i_flags)) 2; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[Patch]Fix wrong routing mechanism for Link Local IPv6 packets
Hi, all When I tested linux-2.6.19.2, and found maybe there're a packet routing bug in linux kernel. My test topology is shown as the following: eth0: fe80::20c:29ff:fe24:fa0a | eth1: fe80::20c:29ff:fe24:fa14 | | | | | | | LAN1 |LAN2 | | - ^ | Send Echo Request(src addr = fe80::200:ff:fe00:100) Now we send Echo Request to eth1(ipv6 src addr is fe80::200:ff:fe00:100, ipv6 dst addr is fe80::20c:29ff:fe24:fa14) on LAN1. Linux will send ICMPv6 Redirect Packet to us. And the TargetAddress = fe80::20c:29ff:fe24:fa14, DestinationAddress = fe80::20c:29ff:fe24:fa14. Obviously, Linux considers that fe80::200:ff:fe00:100 and fe80::20c:29ff:fe24:fa14 is on the SAME link(LAN1). In fact, kernel invoke function ip6_foward(). When Linux decides whether or not making use of a rt6_info entry, it will match the rt-rt6i_idev and rt-rt6_dev. This is done in function rt6_check_dev(). If nothing matched, rt6_check_dev() return 0. Then function rt6_score_route() will check whether the matched ipv6 addr (fe80::20c:29ff:fe24:fa14 in our example) is a link local ipv6 address. If it is a link local address, and rt-rt6i_idev rt-rt6_dev match failed -- rt6_check_dev() return 0. Function rt6_score_route() return -1 directly. I think here is a problem. When kernel match eth1 addr with rt6_info entries, it will lookup in local_table first. In rt6_check_dev() matching rt-rt6i_idev rt- rt6_dev will fail. The reason is oif = 2 , rt-rt6i_idev-dev-ifindex is 3 and rt-rt6i_dev-ifindex is 1. I think even this match failed, rt6_score_route() should not return -1, but return 0. And I think check for RT6_LOOKUP_F_IFACE flag isn't needed here. Checking for this flag is only needed in route cache when matching dst addr. Due to the reason mentioned above, all entries in local table matching dst addr fe80::20c:29ff:fe24:fa14 are failed. And then kernel matches main table. fe80::/64 entry in main table will match successfully. Later ip6_rt_copy() will copy the function pointer rt-u.dst.input. Obviously rt-u.dst.input in main table is ip6_forward(). The following is my patch. signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruN old/net/ipv6/route.c new/net/ipv6/route.c --- old/net/ipv6/route.c2007-01-10 14:10:37.0 -0500 +++ new/net/ipv6/route.c2007-01-17 18:24:51.336774016 -0500 @@ -343,7 +343,7 @@ int m, n; m = rt6_check_dev(rt, oif); - if (!m (strict RT6_LOOKUP_F_IFACE)) + if (!m (rt-rt6i_flags RTF_CACHE) (strict RT6_LOOKUP_F_IFACE)) return -1; #ifdef CONFIG_IPV6_ROUTER_PREF m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt-rt6i_flags)) 2; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] Fix SNMPv2 ipOutNoRoutes counter error
Hi All: When I tested linux kernel 2.6.18.3, and find that kernel statistics about IPSTATS_MIB_OUTNOROUTES which exsits in file /proc/net/snmp doesn't increase correctly. The criteria conform to RFC2011: ipOutNoRoutes OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION The number of IP datagrams discarded because no route could be found to transmit them to their destination. Note that this counter includes any packets counted in ipForwDatagrams which meet this `no-route' criterion. Note that this includes any datagrams which a host cannot route because all of its default routers are down. ::= { ip 12 } When a host received an IP packet, but the destination address is not this host. The kernel just discards the IP packet but with no increment for this counter. When a router received an IP packet that this router can't forward due to no route found. Kernel just simply invoke ip_error(), and send ICMP packet. Also do nothing for this counter. Signed-off-by: Wei Dong [EMAIL PROTECTED] diff -ruN old/net/ipv4/icmp.c new/net/ipv4/icmp.c --- old/net/ipv4/icmp.c 2006-09-20 00:42:06.0 -0300 +++ new/net/ipv4/icmp.c 2006-12-04 11:46:39.0 -0300 @@ -580,6 +580,9 @@ icmp_param.data_len = room; icmp_param.head_len = sizeof(struct icmphdr); + if (code == ICMP_NET_UNREACH) + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); + icmp_push_reply(icmp_param, ipc, rt); ende: ip_rt_put(rt); diff -ruN old/net/ipv4/ip_input.c new/net/ipv4/ip_input.c --- old/net/ipv4/ip_input.c 2006-09-20 00:42:06.0 -0300 +++ new/net/ipv4/ip_input.c 2006-11-29 11:15:52.0 -0300 @@ -340,8 +340,10 @@ int err = ip_route_input(skb, iph-daddr, iph-saddr, iph-tos, skb-dev); if (unlikely(err)) { - if (err == -EHOSTUNREACH) + if (err == -EHOSTUNREACH) { IP_INC_STATS_BH(IPSTATS_MIB_INADDRERRORS); + IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES); + } goto drop; } } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
SNMPv2 ipFragFails counter error
Hi, All When I tested Linux kernel 2.6.17.7 about statistics ipFragFails,found that this counter couldn't increase correctly. The criteria is RFC2011: RFC2011 ipFragFails OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION The number of IP datagrams that have been discarded because they needed to be fragmented at this entity but could not be, e.g., because their Don't Fragment flag was set. ::= { ip 18 } When I send big IP packet to a router with DF bit set to 1 which need to be fragmented, and router just sends an ICMP error message ICMP_FRAG_NEEDED but no increments for this counter(in the function ip_fragment). signed-off-by:Wei Dong [EMAIL PROTECTED] diff -ruN old/net/ipv4/ip_output.c new/net/ipv4/ip_output.c --- old/net/ipv4/ip_output.c2006-07-25 11:36:01.0 +0800 +++ new/net/ipv4/ip_output.c2006-08-30 14:30:49.0 +0800 @@ -441,6 +441,7 @@ iph = skb-nh.iph; if (unlikely((iph-frag_off htons(IP_DF)) !skb-local_df)) { + IP_INC_STATS(IPSTATS_MIB_FRAGFAILS); icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(dst_mtu(rt-u.dst))); kfree_skb(skb); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/1]SNMPv2 ipv6IfStatsInHdrErrors counter error
Hi, All When I tested Linux kernel 2.6.17.7 about statistics ipv6IfStatsInHdrErrors, found that this counter couldn't increase correctly. The criteria is RFC2465: ipv6IfStatsInHdrErrors OBJECT-TYPE SYNTAX Counter3 MAX-ACCESS read-only STATUS current DESCRIPTION The number of input datagrams discarded due to errors in their IPv6 headers, including version number mismatch, other format errors, hop count exceeded, errors discovered in processing their IPv6 options, etc. ::= { ipv6IfStatsEntry 2 } When I send TTL=0 and TTL=1 a packet to a router which need to be forwarded, router just sends an ICMPv6 message to tell the sender that TIME_EXCEED and HOPLIMITS, but no increments for this counter(in the function ip6_forward). The following is the patch for this issue. diff -ruN old/net/ipv6/ip6_output.c new/net/ipv6/ip6_output.c --- old/net/ipv6/ip6_output.c 2006-07-25 11:36:01.0 +0800 +++ new/net/ipv6/ip6_output.c 2006-07-31 16:16:13.0 +0800 @@ -356,6 +356,7 @@ skb-dev = dst-dev; icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0, skb-dev); + IP6_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS); kfree_skb(skb); return -ETIMEDOUT; signed-off-by:Wei Dong [EMAIL PROTECTED] Regards Wei Dong - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 1/1]SNMPv2 ipv6IfStatsOutFragCreates counter error
Hi, All When I tested linux kernel 2.6.71.7 about statistics ipv6IfStatsOutFragCreates, and found that it couldn't increase correctly. The criteria is RFC 2465: ipv6IfStatsOutFragCreates OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION The number of output datagram fragments that have been generated as a result of fragmentation at this output interface. ::= { ipv6IfStatsEntry 15 } I think there are two issues in Linux kernel. 1st: RFC2465 specifies the counter is The number of output datagram fragments I think increasing this counter after output a fragment successfully is better. And it should not be increased even though a fragment is created but failed to output. 2nd: If we send a big ICMP/ICMPv6 echo request to a host, and receive ICMP/ICMPv6 echo reply consisted of some fragments. As we know that in Linux kernel first fragmentation occurs in ICMP layer(maybe saying transport layer is better), but this is not the real fragmentation,just do some pre-fragment -- allocate space for date, and form a frag_list, etc. The real fragmentation happens in IP layer -- set offset and MF flag and so on. So I think in fast path for ip_fragment/ip6_fragment, if we send a fragment which pre-fragment by upper layer we should also increase ipv6IfStatsOutFragCreates. The following is the patch for the issues mentioned above: diff -ruN old/net/ipv4/ip_output.c new/net/ipv4/ip_output.c --- old/net/ipv4/ip_output.c2006-07-25 11:36:01.0 +0800 +++ new/net/ipv4/ip_output.c2006-07-31 16:24:57.0 +0800 @@ -527,6 +527,8 @@ err = output(skb); + if (!err) + IP_INC_STATS(IPSTATS_MIB_FRAGCREATES); if (err || !frag) break; @@ -650,9 +652,6 @@ /* * Put this fragment into the sending queue. */ - - IP_INC_STATS(IPSTATS_MIB_FRAGCREATES); - iph-tot_len = htons(len + hlen); ip_send_check(iph); @@ -660,6 +659,8 @@ err = output(skb2); if (err) goto fail; + + IP_INC_STATS(IPSTATS_MIB_FRAGCREATES); } kfree_skb(skb); IP_INC_STATS(IPSTATS_MIB_FRAGOKS); diff -ruN old/net/ipv6/ip6_output.c new/net/ipv6/ip6_output.c --- old/net/ipv6/ip6_output.c 2006-07-25 11:36:01.0 +0800 +++ new/net/ipv6/ip6_output.c 2006-07-31 16:24:21.0 +0800 @@ -593,6 +593,9 @@ } err = output(skb); + if(!err) + IP6_INC_STATS(IPSTATS_MIB_FRAGCREATES); + if (err || !frag) break; @@ -704,12 +707,11 @@ /* * Put this fragment into the sending queue. */ - - IP6_INC_STATS(IPSTATS_MIB_FRAGCREATES); - err = output(frag); if (err) goto fail; + + IP6_INC_STATS(IPSTATS_MIB_FRAGCREATES); } kfree_skb(skb); IP6_INC_STATS(IPSTATS_MIB_FRAGOKS); signed-off-by: Wei Dong [EMAIL PROTECTED] Regards Wei Dong - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html