Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"

2017-12-21 Thread Jiri Pirko
Thu, Dec 21, 2017 at 12:34:05AM CET, john.fastab...@gmail.com wrote:
>On 12/20/2017 03:23 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
>>  wrote:
>>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>>>  wrote:
>>>>> RCU grace period is needed for lockless qdiscs added in the commit
>>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>>>
>>>>> It is needed now that qdiscs may be lockless otherwise we risk
>>>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>>>> adding skbs during removal.
>>>>
>>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>>>> It doesn't work with your "lockless" patches?
>>>>
>>>
>>> Well this is only in the 'parent == NULL' case otherwise we call
>>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
>>> sch_tree_lock().
>>>
>>> The only converted qdisc mq and mqprio at this point don't care
>>> though and do their own dev_deactivate/activate. So it's not fixing
>>> anything in the above mentioned commit.
>> 
>> Sure, removing a class does not impact the whole device,
>> but removing the root qdisc does.
>> 
>> After your "lockless", skb_array_consume_bh() is called in
>> pfifo_fast_reset() and ptr_ring_cleanup() is called in
>> pfifo_fast_destroy(), assuming skb_array is not buggy, what race
>> do we have here with datapath?
>> 
>
>None at the moment.
>
>> 
>>>
>>> I still think it will need to be done eventually. If it resolves
>>> the miniq case it seems like a good idea. Although per Jakub's comment
>>> perhaps I pulled too much into the RCU handler.
>> 
>> The case Jakub reported is a RCU callback missing a rcu
>> barrier. I don't understand why you keep believing it is RCU
>> readers on datapath.
>> 
>> Not even to mention ingress is not affected by your "lockless"
>> thing.
>> 
>
>I was thinking about the case where we want a lockless qdisc
>with classes. Doing the qdisc destroy after a grace period would
>solve this. Also we could start to cleanup a lot of the locking
>and extra bits around 'running' qdisc and such by doing a clean
>xchg on the qdisc layer. It seems that a dev_activate/deactivate
>just to install a new qdisc is not needed.
>
>Anyways future work. However if it resolves the miniq issue, as
>Jiri indicated, seems like a clean fix. Although Jakub's issue
>with the patch would need to be addressed. Seems he gets a WARN_ON
>if the offload is not disabled but the device is uninitialized.

Why is just moving qdisc_free to RCU not enough? It would resolve this
issue and also avoid using synchronize_net(). Something like:

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83a3e47..487288e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -100,6 +100,7 @@ struct Qdisc {
refcount_t  refcnt;
 
spinlock_t  busylock ____cacheline_aligned_in_smp;
+   struct rcu_head rcu;
 };
 
 static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cd1b200..9beffd1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -698,8 +698,10 @@ void qdisc_reset(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_reset);
 
-static void qdisc_free(struct Qdisc *qdisc)
+static void qdisc_free_rcu(struct rcu_head *rcu)
 {
+   struct Qdisc *qdisc = container_of(rcu, struct Qdisc, rcu);
+
if (qdisc_is_percpu_stats(qdisc)) {
free_percpu(qdisc->cpu_bstats);
free_percpu(qdisc->cpu_qstats);
@@ -732,7 +734,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
kfree_skb_list(qdisc->gso_skb);
kfree_skb(qdisc->skb_bad_txq);
-   qdisc_free(qdisc);
+   call_rcu(&qdisc->rcu, qdisc_free_rcu);
 }
 EXPORT_SYMBOL(qdisc_destroy);
 
-- 
2.9.5
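
As an illustration of the pattern in the patch above (not part of the patch itself), here is a minimal userspace sketch of "embed a callback head in the object, defer the free, recover the object with container_of". All toy_* names are hypothetical stand-ins; the real kernel primitives are struct rcu_head, container_of() and call_rcu(), and a real grace period is of course not a plain function call.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins. Every toy_* name is hypothetical; the kernel
 * supplies struct rcu_head, container_of() and call_rcu() itself. */
struct toy_rcu_head {
	void (*func)(struct toy_rcu_head *head);
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_qdisc {
	int handle;
	struct toy_rcu_head rcu;	/* embedded head, like the added 'rcu' member */
};

static int toy_last_freed_handle = -1;	/* lets a caller observe the free */
static struct toy_rcu_head *toy_pending;	/* single-slot callback queue */

/* Callback: recover the enclosing object from the embedded head, then free. */
static void toy_qdisc_free_rcu(struct toy_rcu_head *head)
{
	struct toy_qdisc *q = toy_container_of(head, struct toy_qdisc, rcu);

	toy_last_freed_handle = q->handle;
	free(q);
}

/* Stand-in for call_rcu(): only records the callback, frees nothing yet. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	toy_pending = head;
}

/* Stand-in for "a grace period has elapsed": run the recorded callback. */
static void toy_grace_period_elapsed(void)
{
	struct toy_rcu_head *head = toy_pending;

	toy_pending = NULL;
	if (head)
		head->func(head);
}

/* Allocate a toy qdisc; returns NULL on allocation failure. */
static struct toy_qdisc *toy_qdisc_alloc(int handle)
{
	struct toy_qdisc *q = calloc(1, sizeof(*q));

	if (q)
		q->handle = handle;
	return q;
}
```

The point of the sketch is only the ordering: after toy_call_rcu() the object is still valid, and it goes away once toy_grace_period_elapsed() runs the callback.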



[PATCH net-next] cxgb4: add new T5 and T6 device id's

2017-12-21 Thread Ganesh Goudar
Add device IDs 0x50ac and 0x6087 for T5 and T6 cards,
respectively.

Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index 60cf9e0..51b1803 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -183,6 +183,7 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
CH_PCI_ID_TABLE_FENTRY(0x50a9), /* Custom T580-KR */
CH_PCI_ID_TABLE_FENTRY(0x50aa), /* Custom T580-CR */
CH_PCI_ID_TABLE_FENTRY(0x50ab), /* Custom T520-CR */
+   CH_PCI_ID_TABLE_FENTRY(0x50ac), /* Custom T540-BT */
 
/* T6 adapters:
 */
@@ -206,6 +207,7 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
CH_PCI_ID_TABLE_FENTRY(0x6084), /* Custom T64100-CR QSFP28 */
CH_PCI_ID_TABLE_FENTRY(0x6085), /* Custom T6240-SO */
CH_PCI_ID_TABLE_FENTRY(0x6086), /* Custom T6225-SO-CR */
+   CH_PCI_ID_TABLE_FENTRY(0x6087), /* Custom T6225-CR */
 CH_PCI_DEVICE_ID_TABLE_DEFINE_END;
 
 #endif /* __T4_PCI_ID_TBL_H__ */
-- 
2.1.0



Re: [Patch net] net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap()

2017-12-21 Thread Jiri Pirko
Thu, Dec 21, 2017 at 08:26:24AM CET, xiyou.wangc...@gmail.com wrote:
>The rcu_barrier_bh() in mini_qdisc_pair_swap() is to wait for an
>in-flight RCU callback installed by a previous mini_qdisc_pair_swap(),
>however we miss it on the tp_head==NULL path, so the RCU callback
>can still use miniq_old->rcu after it is freed together with the
>qdisc in qdisc_graft(). So just add it on that path too.
>
>Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")

This fixes:
752fbcc33405 ("net_sched: no need to free qdisc in RCU callback")

Before that, the issue was not there as the qdisc struct got removed
after a grace period.


>Reported-by: Jakub Kicinski 
>Tested-by: Jakub Kicinski 
>Cc: Jiri Pirko 
>Cc: John Fastabend 
>Signed-off-by: Cong Wang 
>---
> net/sched/sch_generic.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>index cd1b200acae7..661c7144b53a 100644
>--- a/net/sched/sch_generic.c
>+++ b/net/sched/sch_generic.c
>@@ -1040,6 +1040,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> 
>   if (!tp_head) {
>   RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
>+  /* Wait for flying RCU callback before it is freed. */
>+  rcu_barrier_bh();


>   return;
>   }
> 
>@@ -1055,7 +1057,7 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>   rcu_assign_pointer(*miniqp->p_miniq, miniq);
> 
>   if (miniq_old)
>-  /* This is counterpart of the rcu barrier above. We need to
>+  /* This is counterpart of the rcu barriers above. We need to

This is incorrect. Here we block in order to not reuse the same miniq
in this scenario:

miniq1 (X)
miniq2
miniq1 (yet there are readers still using X)

This call_rcu has no relation to the barrier you are adding.


But again, why don't we just free the qdisc in call_rcu and avoid the
barrier?
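
To make the miniq1/miniq2/miniq1 scenario above concrete, here is a small userspace sketch of a two-slot pair where each swap publishes the slot that is not currently visible. It is a toy under stated assumptions: the toy_* names are hypothetical, the "wait" is just a counter standing in for whatever RCU synchronization the kernel uses at that point, and the no-filters path only unpublishes the pointer.

```c
#include <stddef.h>

/* Hypothetical toy types; filter_list == 0 plays the role of tp_head == NULL. */
struct toy_miniq {
	int filter_list;
};

struct toy_miniq_pair {
	struct toy_miniq miniq1;
	struct toy_miniq miniq2;
	struct toy_miniq *active;	/* the slot readers currently see */
	int waits;			/* how often we "waited" before reuse */
};

/* Stand-in for waiting until nobody can still be using the slot we reuse. */
static void toy_wait_before_reuse(struct toy_miniq_pair *pair)
{
	pair->waits++;
}

/* Publish a new filter list; returns the slot that became active (or NULL). */
static struct toy_miniq *toy_pair_swap(struct toy_miniq_pair *pair,
				       int filter_list)
{
	struct toy_miniq *old = pair->active;
	struct toy_miniq *next;

	if (filter_list == 0) {
		pair->active = NULL;	/* nothing to publish */
		return NULL;
	}

	/* Pick the slot that is not published right now. */
	next = (!old || old == &pair->miniq2) ? &pair->miniq1 : &pair->miniq2;

	/* That slot may have been published two swaps ago (the "X" case),
	 * so it must not be rewritten while old readers can still see it. */
	toy_wait_before_reuse(pair);

	next->filter_list = filter_list;
	pair->active = next;
	return next;
}
```

Three consecutive swaps land on miniq1, miniq2 and miniq1 again, which is exactly the point where a stale reader of the first miniq1 would be a problem.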


[PATCH v2 net-next 0/4] sfc: support extra stats on Medford2

2017-12-21 Thread Bert Kenward
X2000-series NICs add port stats for two new features: FEC (Forward Error
 Correction, used on 25G links) and CTPIO (cut-through programmed I/O).
This patch series adds support for reporting both of these sets of stats.

v2: add additional Signed-off-by

Bert Kenward (1):
  sfc: expose CTPIO stats on NICs that support them

Edward Cree (3):
  sfc: update MCDI protocol headers
  sfc: support variable number of MAC stats
  sfc: expose FEC stats on Medford2

 drivers/net/ethernet/sfc/ef10.c   |   97 +-
 drivers/net/ethernet/sfc/efx.c|2 +
 drivers/net/ethernet/sfc/mcdi_pcol.h  | 2453 +++--
 drivers/net/ethernet/sfc/mcdi_port.c  |   10 +-
 drivers/net/ethernet/sfc/net_driver.h |3 +
 drivers/net/ethernet/sfc/nic.h|   24 +
 drivers/net/ethernet/sfc/siena.c  |2 +-
 7 files changed, 2435 insertions(+), 156 deletions(-)

-- 
2.13.6



[PATCH v2 net-next 3/4] sfc: expose FEC stats on Medford2

2017-12-21 Thread Bert Kenward
From: Edward Cree 

There's no explicit capability bit, so we just condition them on having
 efx->num_mac_stats >= MC_CMD_MAC_NSTATS_V2.

Signed-off-by: Edward Cree 
Signed-off-by: Bert Kenward 
---
 drivers/net/ethernet/sfc/ef10.c | 24 +++-
 drivers/net/ethernet/sfc/nic.h  |  7 +++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 27b981e7e786..352ca43a7395 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1643,6 +1643,12 @@ static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
EF10_DMA_STAT(tx_bad, VADAPTER_TX_BAD_PACKETS),
EF10_DMA_STAT(tx_bad_bytes, VADAPTER_TX_BAD_BYTES),
EF10_DMA_STAT(tx_overflow, VADAPTER_TX_OVERFLOW),
+   EF10_DMA_STAT(fec_uncorrected_errors, FEC_UNCORRECTED_ERRORS),
+   EF10_DMA_STAT(fec_corrected_errors, FEC_CORRECTED_ERRORS),
+   EF10_DMA_STAT(fec_corrected_symbols_lane0, FEC_CORRECTED_SYMBOLS_LANE0),
+   EF10_DMA_STAT(fec_corrected_symbols_lane1, FEC_CORRECTED_SYMBOLS_LANE1),
+   EF10_DMA_STAT(fec_corrected_symbols_lane2, FEC_CORRECTED_SYMBOLS_LANE2),
+   EF10_DMA_STAT(fec_corrected_symbols_lane3, FEC_CORRECTED_SYMBOLS_LANE3),
 };
 
 #define HUNT_COMMON_STAT_MASK ((1ULL << EF10_STAT_port_tx_bytes) | \
@@ -1718,6 +1724,19 @@ static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
(1ULL << EF10_STAT_port_rx_dp_hlb_fetch) |  \
(1ULL << EF10_STAT_port_rx_dp_hlb_wait))
 
+/* These statistics are only provided if the NIC supports MC_CMD_MAC_STATS_V2,
+ * indicated by returning a value >= MC_CMD_MAC_NSTATS_V2 in
+ * MC_CMD_GET_CAPABILITIES_V4_OUT_MAC_STATS_NUM_STATS.
+ * These bits are in the second u64 of the raw mask.
+ */
+#define EF10_FEC_STAT_MASK (   \
+   (1ULL << (EF10_STAT_fec_uncorrected_errors - 64)) | \
+   (1ULL << (EF10_STAT_fec_corrected_errors - 64)) |   \
+   (1ULL << (EF10_STAT_fec_corrected_symbols_lane0 - 64)) |\
+   (1ULL << (EF10_STAT_fec_corrected_symbols_lane1 - 64)) |\
+   (1ULL << (EF10_STAT_fec_corrected_symbols_lane2 - 64)) |\
+   (1ULL << (EF10_STAT_fec_corrected_symbols_lane3 - 64)))
+
 static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
 {
u64 raw_mask = HUNT_COMMON_STAT_MASK;
@@ -1756,10 +1775,13 @@ static void efx_ef10_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
if (nic_data->datapath_caps &
(1 << MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN)) {
raw_mask[0] |= ~((1ULL << EF10_STAT_rx_unicast) - 1);
-   raw_mask[1] = (1ULL << (EF10_STAT_COUNT - 63)) - 1;
+   raw_mask[1] = (1ULL << (EF10_STAT_V1_COUNT - 64)) - 1;
} else {
raw_mask[1] = 0;
}
+   /* Only show FEC stats when NIC supports MC_CMD_MAC_STATS_V2 */
+   if (efx->num_mac_stats >= MC_CMD_MAC_NSTATS_V2)
+   raw_mask[1] |= EF10_FEC_STAT_MASK;
 
 #if BITS_PER_LONG == 64
BUILD_BUG_ON(BITS_TO_LONGS(EF10_STAT_COUNT) != 2);
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 7b51b6371724..e39e7b399252 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -325,6 +325,13 @@ enum {
EF10_STAT_tx_bad,
EF10_STAT_tx_bad_bytes,
EF10_STAT_tx_overflow,
+   EF10_STAT_V1_COUNT,
+   EF10_STAT_fec_uncorrected_errors = EF10_STAT_V1_COUNT,
+   EF10_STAT_fec_corrected_errors,
+   EF10_STAT_fec_corrected_symbols_lane0,
+   EF10_STAT_fec_corrected_symbols_lane1,
+   EF10_STAT_fec_corrected_symbols_lane2,
+   EF10_STAT_fec_corrected_symbols_lane3,
EF10_STAT_COUNT
 };
 
-- 
2.13.6
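
For readers following the mask arithmetic in this patch, here is a minimal sketch of how bits for stat indices >= 64 map into the second 64-bit word, and why the old "- 63" expression appears to set one bit too many. The stat count of 70 used in the usage note is purely hypothetical; the real values come from the enum in nic.h.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the n lowest bits set; only valid for n < 64. */
static uint64_t low_bits(unsigned int n)
{
	assert(n < 64);
	return (UINT64_C(1) << n) - 1;
}

/* Second mask word covering stat indices 64 .. count - 1. */
static uint64_t second_word_mask(unsigned int count)
{
	assert(count >= 64 && count < 128);
	return low_bits(count - 64);
}

/* Bit inside the second word for a single stat index >= 64. */
static uint64_t second_word_bit(unsigned int stat_index)
{
	assert(stat_index >= 64 && stat_index < 128);
	return UINT64_C(1) << (stat_index - 64);
}
```

With a hypothetical V1 count of 70, second_word_mask(70) covers indices 64..69 only, while low_bits(70 - 63) would also cover index 70, i.e. the first stat placed after the V1 block.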




[PATCH v2 net-next 2/4] sfc: support variable number of MAC stats

2017-12-21 Thread Bert Kenward
From: Edward Cree 

Medford2 NICs support more than MC_CMD_MAC_NSTATS stats, and report the new
 count in a field of MC_CMD_GET_CAPABILITIES_V4.  This also means that the
 end generation count moves (it is, as before, the last 64 bits of the DMA
 buffer, but that is no longer MC_CMD_MAC_GENERATION_END).
So read num_mac_stats from the GET_CAPABILITIES response, if present;
 otherwise assume MC_CMD_MAC_NSTATS; and always use num_mac_stats - 1 rather
 than MC_CMD_MAC_GENERATION_END.

Signed-off-by: Edward Cree 
Signed-off-by: Bert Kenward 
---
 drivers/net/ethernet/sfc/ef10.c   | 23 ++-
 drivers/net/ethernet/sfc/efx.c|  2 ++
 drivers/net/ethernet/sfc/mcdi_port.c  | 10 +-
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 drivers/net/ethernet/sfc/siena.c  |  2 +-
 5 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 1f64c7f60943..27b981e7e786 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -233,7 +233,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
 
 static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
 {
-   MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V3_OUT_LEN);
+   MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
size_t outlen;
int rc;
@@ -306,6 +306,19 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
  efx->vi_stride);
}
 
+   if (outlen >= MC_CMD_GET_CAPABILITIES_V4_OUT_LEN) {
+   efx->num_mac_stats = MCDI_WORD(outbuf,
+   GET_CAPABILITIES_V4_OUT_MAC_STATS_NUM_STATS);
+   netif_dbg(efx, probe, efx->net_dev,
+ "firmware reports num_mac_stats = %u\n",
+ efx->num_mac_stats);
+   } else {
+   /* leave num_mac_stats as the default value, MC_CMD_MAC_NSTATS */
+   netif_dbg(efx, probe, efx->net_dev,
+ "firmware did not report num_mac_stats, assuming %u\n",
+ efx->num_mac_stats);
+   }
+
return 0;
 }
 
@@ -1850,7 +1863,7 @@ static int efx_ef10_try_update_nic_stats_pf(struct efx_nic *efx)
 
dma_stats = efx->stats_buffer.addr;
 
-   generation_end = dma_stats[MC_CMD_MAC_GENERATION_END];
+   generation_end = dma_stats[efx->num_mac_stats - 1];
if (generation_end == EFX_MC_STATS_GENERATION_INVALID)
return 0;
rmb();
@@ -1898,7 +1911,7 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
DECLARE_BITMAP(mask, EF10_STAT_COUNT);
__le64 generation_start, generation_end;
u64 *stats = nic_data->stats;
-   u32 dma_len = MC_CMD_MAC_NSTATS * sizeof(u64);
+   u32 dma_len = efx->num_mac_stats * sizeof(u64);
struct efx_buffer stats_buf;
__le64 *dma_stats;
int rc;
@@ -1923,7 +1936,7 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
}
 
dma_stats = stats_buf.addr;
-   dma_stats[MC_CMD_MAC_GENERATION_END] = EFX_MC_STATS_GENERATION_INVALID;
+   dma_stats[efx->num_mac_stats - 1] = EFX_MC_STATS_GENERATION_INVALID;
 
MCDI_SET_QWORD(inbuf, MAC_STATS_IN_DMA_ADDR, stats_buf.dma_addr);
MCDI_POPULATE_DWORD_1(inbuf, MAC_STATS_IN_CMD,
@@ -1942,7 +1955,7 @@ static int efx_ef10_try_update_nic_stats_vf(struct efx_nic *efx)
goto out;
}
 
-   generation_end = dma_stats[MC_CMD_MAC_GENERATION_END];
+   generation_end = dma_stats[efx->num_mac_stats - 1];
if (generation_end == EFX_MC_STATS_GENERATION_INVALID) {
WARN_ON_ONCE(1);
goto out;
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 7bcbedce07a5..3780161de5a1 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2983,6 +2983,8 @@ static int efx_init_struct(struct efx_nic *efx,
efx->type->rx_ts_offset - efx->type->rx_prefix_size;
spin_lock_init(&efx->stats_lock);
efx->vi_stride = EFX_DEFAULT_VI_STRIDE;
+   efx->num_mac_stats = MC_CMD_MAC_NSTATS;
+   BUILD_BUG_ON(MC_CMD_MAC_NSTATS - 1 != MC_CMD_MAC_GENERATION_END);
mutex_init(&efx->mac_lock);
efx->phy_op = &efx_dummy_phy_operations;
efx->mdio.dev = net_dev;
diff --git a/drivers/net/ethernet/sfc/mcdi_port.c b/drivers/net/ethernet/sfc/mcdi_port.c
index 6e1f282b2976..65ee1a468170 100644
--- a/drivers/net/ethernet/sfc/mcdi_port.c
+++ b/drivers/net/ethernet/sfc/mcdi_port.c
@@ -1087,7 +1087,7 @@ static int efx_mcdi_mac_stats(struct efx_nic *efx,
int period = action == EFX_STATS_ENABLE ? 1000 : 0;
dma_addr_t dma_addr = efx->stats_buffer.dma_addr;
u32 dma_len = action != EFX_STATS_DISABLE ?
-   MC_CMD_MAC_NSTATS * sizeof(u64) : 0;
+  
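
As a rough illustration of why the generation-end index has to follow num_mac_stats, here is a userspace sketch of the snapshot check. It rests on assumptions: the toy_* names and the INVALID value are hypothetical, the start counter is assumed to sit at index 0, and the comparison of start against end is the usual generation-count idea rather than something shown in the quoted hunks. The driver also issues rmb() between the reads, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EFX_MC_STATS_GENERATION_INVALID. */
#define TOY_GENERATION_INVALID UINT64_MAX
/* Assumption: the start-of-buffer generation counter lives at index 0. */
#define TOY_GENERATION_START 0

enum toy_snapshot {
	TOY_SNAPSHOT_OK,
	TOY_SNAPSHOT_NOT_READY,	/* buffer has not been written yet */
	TOY_SNAPSHOT_TORN,	/* start and end generations disagree */
};

/* Check a stats buffer whose last u64 is the end generation counter. */
static enum toy_snapshot toy_check_snapshot(const uint64_t *dma_stats,
					    size_t num_mac_stats)
{
	uint64_t generation_end, generation_start;

	assert(num_mac_stats >= 2);

	/* The end counter is always the last entry, whatever the length. */
	generation_end = dma_stats[num_mac_stats - 1];
	if (generation_end == TOY_GENERATION_INVALID)
		return TOY_SNAPSHOT_NOT_READY;

	generation_start = dma_stats[TOY_GENERATION_START];
	if (generation_end != generation_start)
		return TOY_SNAPSHOT_TORN;

	return TOY_SNAPSHOT_OK;
}
```

Reading a six-entry buffer as if it had four entries picks up an ordinary counter as the "end generation", which is the kind of mismatch a fixed MC_CMD_MAC_GENERATION_END index would risk on firmware that reports more stats.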

[PATCH v2 net-next 4/4] sfc: expose CTPIO stats on NICs that support them

2017-12-21 Thread Bert Kenward
While the Linux driver doesn't use CTPIO ('cut-through programmed I/O'),
 other drivers on the same port might, so if we're responsible for
 reporting per-port stats we need to include the CTPIO stats.

Signed-off-by: Bert Kenward 
Signed-off-by: Edward Cree 
---
 drivers/net/ethernet/sfc/ef10.c | 50 +
 drivers/net/ethernet/sfc/nic.h  | 17 ++
 2 files changed, 67 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 352ca43a7395..8ae467db9162 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1649,6 +1649,23 @@ static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
EF10_DMA_STAT(fec_corrected_symbols_lane1, FEC_CORRECTED_SYMBOLS_LANE1),
EF10_DMA_STAT(fec_corrected_symbols_lane2, FEC_CORRECTED_SYMBOLS_LANE2),
EF10_DMA_STAT(fec_corrected_symbols_lane3, FEC_CORRECTED_SYMBOLS_LANE3),
+   EF10_DMA_STAT(ctpio_dmabuf_start, CTPIO_DMABUF_START),
+   EF10_DMA_STAT(ctpio_vi_busy_fallback, CTPIO_VI_BUSY_FALLBACK),
+   EF10_DMA_STAT(ctpio_long_write_success, CTPIO_LONG_WRITE_SUCCESS),
+   EF10_DMA_STAT(ctpio_missing_dbell_fail, CTPIO_MISSING_DBELL_FAIL),
+   EF10_DMA_STAT(ctpio_overflow_fail, CTPIO_OVERFLOW_FAIL),
+   EF10_DMA_STAT(ctpio_underflow_fail, CTPIO_UNDERFLOW_FAIL),
+   EF10_DMA_STAT(ctpio_timeout_fail, CTPIO_TIMEOUT_FAIL),
+   EF10_DMA_STAT(ctpio_noncontig_wr_fail, CTPIO_NONCONTIG_WR_FAIL),
+   EF10_DMA_STAT(ctpio_frm_clobber_fail, CTPIO_FRM_CLOBBER_FAIL),
+   EF10_DMA_STAT(ctpio_invalid_wr_fail, CTPIO_INVALID_WR_FAIL),
+   EF10_DMA_STAT(ctpio_vi_clobber_fallback, CTPIO_VI_CLOBBER_FALLBACK),
+   EF10_DMA_STAT(ctpio_unqualified_fallback, CTPIO_UNQUALIFIED_FALLBACK),
+   EF10_DMA_STAT(ctpio_runt_fallback, CTPIO_RUNT_FALLBACK),
+   EF10_DMA_STAT(ctpio_success, CTPIO_SUCCESS),
+   EF10_DMA_STAT(ctpio_fallback, CTPIO_FALLBACK),
+   EF10_DMA_STAT(ctpio_poison, CTPIO_POISON),
+   EF10_DMA_STAT(ctpio_erase, CTPIO_ERASE),
 };
 
 #define HUNT_COMMON_STAT_MASK ((1ULL << EF10_STAT_port_tx_bytes) | \
@@ -1737,6 +1754,30 @@ static const struct efx_hw_stat_desc efx_ef10_stat_desc[EF10_STAT_COUNT] = {
(1ULL << (EF10_STAT_fec_corrected_symbols_lane2 - 64)) |\
(1ULL << (EF10_STAT_fec_corrected_symbols_lane3 - 64)))
 
+/* These statistics are only provided if the NIC supports MC_CMD_MAC_STATS_V3,
+ * indicated by returning a value >= MC_CMD_MAC_NSTATS_V3 in
+ * MC_CMD_GET_CAPABILITIES_V4_OUT_MAC_STATS_NUM_STATS.
+ * These bits are in the second u64 of the raw mask.
+ */
+#define EF10_CTPIO_STAT_MASK ( \
+   (1ULL << (EF10_STAT_ctpio_dmabuf_start - 64)) | \
+   (1ULL << (EF10_STAT_ctpio_vi_busy_fallback - 64)) | \
+   (1ULL << (EF10_STAT_ctpio_long_write_success - 64)) |   \
+   (1ULL << (EF10_STAT_ctpio_missing_dbell_fail - 64)) |   \
+   (1ULL << (EF10_STAT_ctpio_overflow_fail - 64)) |\
+   (1ULL << (EF10_STAT_ctpio_underflow_fail - 64)) |   \
+   (1ULL << (EF10_STAT_ctpio_timeout_fail - 64)) | \
+   (1ULL << (EF10_STAT_ctpio_noncontig_wr_fail - 64)) |\
+   (1ULL << (EF10_STAT_ctpio_frm_clobber_fail - 64)) | \
+   (1ULL << (EF10_STAT_ctpio_invalid_wr_fail - 64)) |  \
+   (1ULL << (EF10_STAT_ctpio_vi_clobber_fallback - 64)) |  \
+   (1ULL << (EF10_STAT_ctpio_unqualified_fallback - 64)) | \
+   (1ULL << (EF10_STAT_ctpio_runt_fallback - 64)) |\
+   (1ULL << (EF10_STAT_ctpio_success - 64)) |  \
+   (1ULL << (EF10_STAT_ctpio_fallback - 64)) | \
+   (1ULL << (EF10_STAT_ctpio_poison - 64)) |   \
+   (1ULL << (EF10_STAT_ctpio_erase - 64)))
+
 static u64 efx_ef10_raw_stat_mask(struct efx_nic *efx)
 {
u64 raw_mask = HUNT_COMMON_STAT_MASK;
@@ -1783,6 +1824,15 @@ static void efx_ef10_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
if (efx->num_mac_stats >= MC_CMD_MAC_NSTATS_V2)
raw_mask[1] |= EF10_FEC_STAT_MASK;
 
+   /* CTPIO stats appear in V3. Only show them on devices that actually
+* support CTPIO. Although this driver doesn't use CTPIO others might,
+* and we may be reporting the stats for the underlying port.
+*/
+   if (efx->num_mac_stats >= MC_CMD_MAC_NSTATS_V3 &&
+   (nic_data->datapath_caps2 &
+(1 << MC_CMD_GET_CAPABILITIES_V4_OUT_CTPIO_LBN)))
+   raw_mask[1] |= EF10_CTPIO_STAT_MASK;
+
 #if BITS_PER_LONG == 64
BUILD_BUG_ON(BITS_TO_LONGS(EF10_STAT_COUNT) != 2);
mask[0] = raw_mask[0];
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index e39e7b399252..763052214525 100644
--- a/drivers/net/e

Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is_atomic in tcpv4 gro process

2017-12-21 Thread Yunsheng Lin
Hi, Alexander

On 2017/12/21 0:24, Alexander Duyck wrote:
> On Wed, Dec 20, 2017 at 1:09 AM, Yunsheng Lin  wrote:
>> Hi, all
>> I have some doubt about NAPI_GRO_CB(skb)->is_atomic when
>> analyzing the tcpv4 gro process:
>>
>> Firstly we set NAPI_GRO_CB(skb)->is_atomic to 1 in dev_gro_receive:
>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/core/dev.c#L4838
>>
>> And then in inet_gro_receive, we check the NAPI_GRO_CB(skb)->is_atomic
>> before setting NAPI_GRO_CB(skb)->is_atomic according to IP_DF bit in the ip 
>> header:
>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/ipv4/af_inet.c#L1319
>>
>> struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>> {
>> .
>> for (p = *head; p; p = p->next) {
>> 
>>
>> /* If the previous IP ID value was based on an atomic
>>  * datagram we can overwrite the value and ignore it.
>>  */
>> if (NAPI_GRO_CB(skb)->is_atomic)  //we check it here
>> NAPI_GRO_CB(p)->flush_id = flush_id;
>> else
>> NAPI_GRO_CB(p)->flush_id |= flush_id;
>> }
>>
>> NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));  //we set it here
>> NAPI_GRO_CB(skb)->flush |= flush;
>> skb_set_network_header(skb, off);
>> 
>> }
>>
>> My question is whether we should check NAPI_GRO_CB(skb)->is_atomic or
>> NAPI_GRO_CB(p)->is_atomic?
>> If we should check NAPI_GRO_CB(skb)->is_atomic, then maybe the check is
>> unnecessary because it is always true.
>> If we should check NAPI_GRO_CB(p)->is_atomic, maybe there is a bug here.
>>
>> So what is the logic here? I have just started analyzing GRO, so maybe I am
>> missing something obvious.
> 
> The logic there is to address the multiple IP header case where there
> are 2 or more IP headers due to things like VXLAN or GRE tunnels. So
> what will happen is that an outer IP header will end up being sent
> with DF not set and will clear the is_atomic value then we want to OR
> in the next header that is applied. It defaults to assignment on
> is_atomic because the first IP header will encounter flush_id with no
> previous configuration occupying it.

I see your point now.

But for the same flow of tunnel packets, must the outer and inner IP headers
both use a fixed ID, or both use an incrementing ID?

For example, suppose we have a flow of tunnel packets with a fixed ID in the outer
header and an incrementing ID in the inner header (the inner header does have the DF flag set):

1. For the first packet, NAPI_GRO_CB(skb)->is_atomic will be set to zero when
inet_gro_receive is processing the inner IP header.

2. For the second packet, when inet_gro_receive is processing the outer IP header,
which has a fixed ID, NAPI_GRO_CB(p)->is_atomic is zero according to [1], so
NAPI_GRO_CB(p)->flush_id will be set to 0x, and the second packet will not
be merged into the first packet in tcp_gro_receive.


I thought the outer IP header could have a fixed ID while the inner IP header could
have an incrementing ID. Am I missing something here?


> 
> The part I am not sure about is if we should be using assignment for
> is_atomic or using an "&=" to clear the bit and leave it cleared.

I am not sure I understood you here. is_atomic is a bit field, why do you
want to use "&="?


Thanks very much for your time replying.
Yunsheng Lin

 I
> don't know if there has been much testing of multiple levels of tunnel
> header.
>> Thanks.
> 
> - Alex
> 
> .
> 
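
To follow the assign-versus-OR behaviour discussed in this thread, here is a small userspace model of just the quoted lines from inet_gro_receive(), run once per IP header (outer first, then inner). It is a sketch under assumptions: the toy_* names are hypothetical, the per-header flush_id values are passed in directly instead of being derived from IP IDs, and is_atomic starts at 1 as described for dev_gro_receive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NAPI_GRO_CB() fields used in the thread. */
struct toy_gro_cb {
	bool is_atomic;
	uint16_t flush_id;
};

/* The quoted logic for one IP header of the incoming skb against a held
 * packet p: overwrite flush_id if the previous header was atomic (or this
 * is the first header), otherwise accumulate it. */
static void toy_ip_header(struct toy_gro_cb *skb, struct toy_gro_cb *p,
			  uint16_t flush_id, bool df)
{
	if (skb->is_atomic)
		p->flush_id = flush_id;
	else
		p->flush_id |= flush_id;

	skb->is_atomic = df;
}

/* Run an outer and then an inner IP header; return p's final flush_id. */
static uint16_t toy_two_headers(uint16_t outer_flush_id, bool outer_df,
				uint16_t inner_flush_id, bool inner_df)
{
	struct toy_gro_cb skb = { .is_atomic = true, .flush_id = 0 };
	struct toy_gro_cb p = { .is_atomic = false, .flush_id = 0 };

	toy_ip_header(&skb, &p, outer_flush_id, outer_df);
	toy_ip_header(&skb, &p, inner_flush_id, inner_df);
	return p.flush_id;
}
```

In this model an outer header without DF leaves its flush_id in place when the inner header is processed, while an outer header with DF has its flush_id overwritten by the inner one, which matches the "we can overwrite the value and ignore it" comment in the quoted code.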



[PATCH v4 2/5] batman-adv: Remove usage of BIT(x) in packet.h

2017-12-21 Thread Sven Eckelmann
The BIT(x) macro is no longer available for uapi headers because it is
defined outside of it (linux/bitops.h). The use of it must therefore be
avoided and replaced by an appropriate other representation.

Signed-off-by: Sven Eckelmann 
---
 net/batman-adv/packet.h | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index 6b6563867455..44f20d03205b 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -20,7 +20,6 @@
 #define _NET_BATMAN_ADV_PACKET_H_
 
 #include 
-#include <linux/bitops.h>
 #include 
 #include 
 
@@ -92,9 +91,9 @@ enum batadv_subtype {
  * one hop neighbor on the interface where it was originally received.
  */
 enum batadv_iv_flags {
-   BATADV_NOT_BEST_NEXT_HOP   = BIT(0),
-   BATADV_PRIMARIES_FIRST_HOP = BIT(1),
-   BATADV_DIRECTLINK  = BIT(2),
+   BATADV_NOT_BEST_NEXT_HOP   = 1UL << 0,
+   BATADV_PRIMARIES_FIRST_HOP = 1UL << 1,
+   BATADV_DIRECTLINK  = 1UL << 2,
 };
 
 /**
@@ -123,9 +122,9 @@ enum batadv_icmp_packettype {
  * @BATADV_MCAST_WANT_ALL_IPV6: we want all IPv6 multicast packets
  */
 enum batadv_mcast_flags {
-   BATADV_MCAST_WANT_ALL_UNSNOOPABLES  = BIT(0),
-   BATADV_MCAST_WANT_ALL_IPV4  = BIT(1),
-   BATADV_MCAST_WANT_ALL_IPV6  = BIT(2),
+   BATADV_MCAST_WANT_ALL_UNSNOOPABLES  = 1UL << 0,
+   BATADV_MCAST_WANT_ALL_IPV4  = 1UL << 1,
+   BATADV_MCAST_WANT_ALL_IPV6  = 1UL << 2,
 };
 
 /* tt data subtypes */
@@ -139,10 +138,10 @@ enum batadv_mcast_flags {
  * @BATADV_TT_FULL_TABLE: contains full table to replace existing table
  */
 enum batadv_tt_data_flags {
-   BATADV_TT_OGM_DIFF   = BIT(0),
-   BATADV_TT_REQUEST= BIT(1),
-   BATADV_TT_RESPONSE   = BIT(2),
-   BATADV_TT_FULL_TABLE = BIT(4),
+   BATADV_TT_OGM_DIFF   = 1UL << 0,
+   BATADV_TT_REQUEST= 1UL << 1,
+   BATADV_TT_RESPONSE   = 1UL << 2,
+   BATADV_TT_FULL_TABLE = 1UL << 4,
 };
 
 /**
@@ -150,7 +149,7 @@ enum batadv_tt_data_flags {
  * @BATADV_VLAN_HAS_TAG: whether the field contains a valid vlan tag or not
  */
 enum batadv_vlan_flags {
-   BATADV_VLAN_HAS_TAG = BIT(15),
+   BATADV_VLAN_HAS_TAG = 1UL << 15,
 };
 
 /**
-- 
2.11.0
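
A short sketch of the equivalence this patch relies on: spelling the flags as plain shifts yields the same values as the BIT() form. TOY_BIT and the TOY_* enum are hypothetical local copies for comparison only; the kernel's BIT() lives in linux/bitops.h as noted in the commit message.

```c
#include <stdbool.h>

/* Local copy for comparison only; not the kernel's definition site. */
#define TOY_BIT(nr) (1UL << (nr))

/* Flags spelled with plain shifts, mirroring the style used in the patch. */
enum toy_tt_data_flags {
	TOY_TT_OGM_DIFF   = 1UL << 0,
	TOY_TT_REQUEST    = 1UL << 1,
	TOY_TT_RESPONSE   = 1UL << 2,
	TOY_TT_FULL_TABLE = 1UL << 4,
};

/* True if every bit of 'flag' is set in 'flags'. */
static bool toy_has_flag(unsigned long flags, unsigned long flag)
{
	return (flags & flag) == flag;
}
```

The shift form needs no extra header, which is the property that matters for a header meant to be usable from userspace.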



[PATCH v4 5/5] flow_dissector: Parse batman-adv unicast headers

2017-12-21 Thread Sven Eckelmann
The batman-adv unicast packets contain a full layer 2 frame in encapsulated
form. The flow dissector must therefore be able to parse the batman-adv
unicast header to reach the layer 2+3 information.

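  +--------------------+
  | ip(v6)hdr          |
  +--------------------+
  | inner ethhdr       |
  +--------------------+
  | batadv unicast hdr |
  +--------------------+
  | outer ethhdr       |
  +--------------------+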

The obtained information from the upper layer can then be used by RPS to
schedule the processing on separate cores. This allows better distribution
of multiple flows from the same neighbor to different cores.

Signed-off-by: Sven Eckelmann 
---
 net/core/flow_dissector.c | 57 +++
 1 file changed, 57 insertions(+)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 15ce30063765..fa0a4879fb9d 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void dissector_set_key(struct flow_dissector *flow_dissector,
  enum flow_dissector_key_id key_id)
@@ -436,6 +437,57 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
return FLOW_DISSECT_RET_PROTO_AGAIN;
 }
 
+/**
+ * __skb_flow_dissect_batadv() - dissect batman-adv header
+ * @skb: sk_buff with the batman-adv header
+ * @key_control: flow dissectors control key
+ * @data: raw buffer pointer to the packet, if NULL use skb->data
+ * @p_proto: pointer used to update the protocol to process next
+ * @p_nhoff: pointer used to update inner network header offset
+ * @hlen: packet header length
+ * @flags: any combination of FLOW_DISSECTOR_F_*
+ *
+ * ETH_P_BATMAN packets are tried to be dissected. Only
+ * &struct batadv_unicast packets are actually processed because they contain an
+ * inner ethernet header and are usually followed by actual network header. This
+ * allows the flow dissector to continue processing the packet.
+ *
+ * Return: FLOW_DISSECT_RET_PROTO_AGAIN when &struct batadv_unicast was found,
+ *  FLOW_DISSECT_RET_OUT_GOOD when dissector should stop after encapsulation,
+ *  otherwise FLOW_DISSECT_RET_OUT_BAD
+ */
+static enum flow_dissect_ret
+__skb_flow_dissect_batadv(const struct sk_buff *skb,
+ struct flow_dissector_key_control *key_control,
+ void *data, __be16 *p_proto, int *p_nhoff, int hlen,
+ unsigned int flags)
+{
+   struct {
+   struct batadv_unicast_packet batadv_unicast;
+   struct ethhdr eth;
+   } *hdr, _hdr;
+
+   hdr = __skb_header_pointer(skb, *p_nhoff, sizeof(_hdr), data, hlen,
+  &_hdr);
+   if (!hdr)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   if (hdr->batadv_unicast.version != BATADV_COMPAT_VERSION)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   if (hdr->batadv_unicast.packet_type != BATADV_UNICAST)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   *p_proto = hdr->eth.h_proto;
+   *p_nhoff += sizeof(*hdr);
+
+   key_control->flags |= FLOW_DIS_ENCAPSULATION;
+   if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
+   return FLOW_DISSECT_RET_OUT_GOOD;
+
+   return FLOW_DISSECT_RET_PROTO_AGAIN;
+}
+
 static void
 __skb_flow_dissect_tcp(const struct sk_buff *skb,
   struct flow_dissector *flow_dissector,
@@ -817,6 +869,11 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
   nhoff, hlen);
break;
 
+   case htons(ETH_P_BATMAN):
+   fdret = __skb_flow_dissect_batadv(skb, key_control, data,
+ &proto, &nhoff, hlen, flags);
+   break;
+
default:
fdret = FLOW_DISSECT_RET_OUT_BAD;
break;
-- 
2.11.0
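
The parsing step in this patch can be sketched in userspace as follows: bounds-check a combined "batadv unicast header + inner ethernet header", validate version and packet type, then hand back the inner protocol and advance the offset. This is an illustration under assumptions, not the kernel code: the toy_* types mirror what the structs are believed to look like, the version and packet-type constants are assumed values, and the FLOW_DISSECTOR_F_STOP_AT_ENCAP handling is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_BATADV_COMPAT_VERSION 15	/* assumed value */
#define TOY_BATADV_UNICAST 0x40		/* assumed value */

/* Hypothetical mirrors of the on-wire headers (byte fields only, so no
 * padding is expected on common ABIs; checked below). */
struct toy_batadv_unicast_packet {
	uint8_t packet_type;
	uint8_t version;
	uint8_t ttl;
	uint8_t ttvn;
	uint8_t dest[TOY_ETH_ALEN];
};

struct toy_ethhdr {
	uint8_t h_dest[TOY_ETH_ALEN];
	uint8_t h_source[TOY_ETH_ALEN];
	uint16_t h_proto;		/* kept in network byte order */
};

struct toy_batadv_hdr {
	struct toy_batadv_unicast_packet batadv_unicast;
	struct toy_ethhdr eth;
};

_Static_assert(sizeof(struct toy_batadv_hdr) == 24, "unexpected padding");

enum toy_ret { TOY_RET_OUT_BAD, TOY_RET_PROTO_AGAIN };

/* Parse at *nhoff; on success store the inner protocol (network byte
 * order) in *proto and move *nhoff past both headers. */
static enum toy_ret toy_dissect_batadv(const uint8_t *data, size_t len,
				       size_t *nhoff, uint16_t *proto)
{
	struct toy_batadv_hdr hdr;

	if (*nhoff > len || len - *nhoff < sizeof(hdr))
		return TOY_RET_OUT_BAD;
	memcpy(&hdr, data + *nhoff, sizeof(hdr));

	if (hdr.batadv_unicast.version != TOY_BATADV_COMPAT_VERSION)
		return TOY_RET_OUT_BAD;
	if (hdr.batadv_unicast.packet_type != TOY_BATADV_UNICAST)
		return TOY_RET_OUT_BAD;

	*proto = hdr.eth.h_proto;
	*nhoff += sizeof(hdr);
	return TOY_RET_PROTO_AGAIN;
}
```

Returning "proto again" with an updated offset is what lets the caller loop and dissect the inner IPv4/IPv6 header as if the encapsulation were not there.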



Re: [PATCH] net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround

2017-12-21 Thread Andrew Lunn
On Wed, Dec 20, 2017 at 06:45:10PM -0600, Grygorii Strashko wrote:
> Under some circumstances driver will perform PHY reset in
> ksz9031_read_status() to fix autoneg failure case (idle error count =
> 0xFF). When this happens ksz9031 will not detect link status change any
> more when connecting to Netgear 1G switch (link can be recovered sometimes by
> restarting netdevice "ifconfig down up"). Reproduced with TI am572x board
> equipped with ksz9031 PHY while connecting to Netgear 1G switch.
> 
> Fix the issue by reconfiguring autonegotiation after PHY reset in
> ksz9031_read_status().

Hi Grygorii

I can understand the fix.

But I'm wondering if there is a better way to do this. Can you call
phy_stop() and phy_start()? You then get the core phy code doing the
same initialisation as what happened the first time. However, I know
this is not easy. _read_status() is being called from the middle of
the state machine, and trying to change the state of the state machine
at this point is problematic.

   Andrew


[PATCH v4 0/5] flow_dissector: Provide basic batman-adv unicast handling

2017-12-21 Thread Sven Eckelmann
Hi,

we are currently starting to use batman-adv as a mesh protocol on multicore
embedded devices. These usually don't have a lot of CPU power per core but
are reasonably fast when using multiple cores.

It was noticed that sending was working very well but receiving was
basically only using one CPU core per neighbor. The reason for that is
the format of the (normal) incoming packet:

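  +--------------------+
  | ip(v6)hdr          |
  +--------------------+
  | inner ethhdr       |
  +--------------------+
  | batadv unicast hdr |
  +--------------------+
  | outer ethhdr       |
  +--------------------+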

The flow dissector will therefore stop after parsing the outer ethernet
header and will not parse the actual ipv(4|6)/... header of the packet. Our
assumption was now that it would help us to add minimal support to the flow
dissector to jump over the batman-adv unicast and inner ethernet header
(like in gre ETH_P_TEB). The patch was implemented in a slightly hacky
way [1] and the results looked quite promising.

I didn't get any feedback on how the files should actually be named. So I am
now just using the names from RFC v3.

The discussion of the RFC v3 can be found in the related patches of
https://patchwork.ozlabs.org/cover/849345/

The discussion of the RFC v2 can be found in the related patches of
https://patchwork.ozlabs.org/cover/844783/


Changes in v4:
==============

* added patch to change the u8/u16 to __u8/__u16 in
  include/uapi/linux/batadv_packet.h
  - requested by Willem de Bruijn 

Changes in v3:
==============

* removed change of uapi/linux/batman_adv.h to uapi/linux/batadv_genl.h
  - requested by Willem de Bruijn 
* removed naming fixes for enums/defines in uapi/linux/batadv_genl.h
  - requested by Willem de Bruijn 
* renamed uapi/linux/batadv.h to uapi/linux/batadv_packet.h
* moved batadv dissector functionality in own function
  - requested by Tom Herbert 
* added support for flags FLOW_DISSECTOR_F_STOP_AT_ENCAP and
  FLOW_DIS_ENCAPSULATION
  - requested by Willem de Bruijn 

Changes in v2:
==============

* removed the batman-adv unicast packet header definition from flow_dissector.c
* moved the batman-adv packet.h/uapi headers around to provide the correct
  definitions to flow_dissector.c

Kind regards,
Sven

Sven Eckelmann (5):
  batman-adv: Let packet.h include its headers directly
  batman-adv: Remove usage of BIT(x) in packet.h
  batman-adv: Remove kernel fixed width types in packet.h
  batman-adv: Convert packet.h to uapi header
  flow_dissector: Parse batman-adv unicast headers

 MAINTAINERS|   1 +
 .../packet.h => include/uapi/linux/batadv_packet.h | 245 +++--
 net/batman-adv/bat_iv_ogm.c|   2 +-
 net/batman-adv/bat_v.c |   2 +-
 net/batman-adv/bat_v_elp.c |   2 +-
 net/batman-adv/bat_v_ogm.c |   2 +-
 net/batman-adv/bridge_loop_avoidance.c |   2 +-
 net/batman-adv/distributed-arp-table.h |   2 +-
 net/batman-adv/fragmentation.c |   2 +-
 net/batman-adv/gateway_client.c|   2 +-
 net/batman-adv/gateway_common.c|   2 +-
 net/batman-adv/hard-interface.c|   2 +-
 net/batman-adv/icmp_socket.c   |   2 +-
 net/batman-adv/main.c  |   2 +-
 net/batman-adv/main.h  |   4 +-
 net/batman-adv/multicast.c |   2 +-
 net/batman-adv/netlink.c   |   2 +-
 net/batman-adv/network-coding.c|   2 +-
 net/batman-adv/routing.c   |   2 +-
 net/batman-adv/send.h  |   3 +-
 net/batman-adv/soft-interface.c|   2 +-
 net/batman-adv/sysfs.c |   2 +-
 net/batman-adv/tp_meter.c  |   2 +-
 net/batman-adv/translation-table.c |   2 +-
 net/batman-adv/tvlv.c  |   2 +-
 net/batman-adv/types.h |   3 +-
 net/core/flow_dissector.c  |  57 +
 27 files changed, 205 insertions(+), 150 deletions(-)
 rename net/batman-adv/packet.h => include/uapi/linux/batadv_packet.h (85%)

-- 
2.11.0



[PATCH v4 3/5] batman-adv: Remove kernel fixed width types in packet.h

2017-12-21 Thread Sven Eckelmann
The uapi headers use the __u8/__u16/... version of the fixed width types
instead of u8/u16/... The use of the latter must be avoided before
packet.h is copied to include/uapi/linux/.

Signed-off-by: Sven Eckelmann 
---
 net/batman-adv/packet.h | 214 
 1 file changed, 107 insertions(+), 107 deletions(-)

diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index 44f20d03205b..3b2d2db993aa 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -29,7 +29,7 @@
  *
  * Return: 0 when not error was detected, != 0 otherwise
  */
-#define batadv_tp_is_error(n) ((u8)(n) > 127 ? 1 : 0)
+#define batadv_tp_is_error(n) ((__u8)(n) > 127 ? 1 : 0)
 
 /**
  * enum batadv_packettype - types for batman-adv encapsulated packets
@@ -191,8 +191,8 @@ enum batadv_tvlv_type {
  * transport the claim type and the group id
  */
 struct batadv_bla_claim_dst {
-   u8 magic[3];/* FF:43:05 */
-   u8 type;/* bla_claimframe */
+   __u8   magic[3];/* FF:43:05 */
+   __u8   type;/* bla_claimframe */
__be16 group;   /* group id */
 };
 
@@ -212,15 +212,15 @@ struct batadv_bla_claim_dst {
  * @tvlv_len: length of tvlv data following the ogm header
  */
 struct batadv_ogm_packet {
-   u8 packet_type;
-   u8 version;
-   u8 ttl;
-   u8 flags;
+   __u8   packet_type;
+   __u8   version;
+   __u8   ttl;
+   __u8   flags;
__be32 seqno;
-   u8 orig[ETH_ALEN];
-   u8 prev_sender[ETH_ALEN];
-   u8 reserved;
-   u8 tq;
+   __u8   orig[ETH_ALEN];
+   __u8   prev_sender[ETH_ALEN];
+   __u8   reserved;
+   __u8   tq;
__be16 tvlv_len;
/* __packed is not needed as the struct size is divisible by 4,
 * and the largest data type in this struct has a size of 4.
@@ -241,12 +241,12 @@ struct batadv_ogm_packet {
  * @throughput: the currently flooded path throughput
  */
 struct batadv_ogm2_packet {
-   u8 packet_type;
-   u8 version;
-   u8 ttl;
-   u8 flags;
+   __u8   packet_type;
+   __u8   version;
+   __u8   ttl;
+   __u8   flags;
__be32 seqno;
-   u8 orig[ETH_ALEN];
+   __u8   orig[ETH_ALEN];
__be16 tvlv_len;
__be32 throughput;
/* __packed is not needed as the struct size is divisible by 4,
@@ -265,9 +265,9 @@ struct batadv_ogm2_packet {
  * @elp_interval: currently used ELP sending interval in ms
  */
 struct batadv_elp_packet {
-   u8 packet_type;
-   u8 version;
-   u8 orig[ETH_ALEN];
+   __u8   packet_type;
+   __u8   version;
+   __u8   orig[ETH_ALEN];
__be32 seqno;
__be32 elp_interval;
 };
@@ -290,14 +290,14 @@ struct batadv_elp_packet {
  * members are padded the same way as they are in real packets.
  */
 struct batadv_icmp_header {
-   u8 packet_type;
-   u8 version;
-   u8 ttl;
-   u8 msg_type; /* see ICMP message types above */
-   u8 dst[ETH_ALEN];
-   u8 orig[ETH_ALEN];
-   u8 uid;
-   u8 align[3];
+   __u8 packet_type;
+   __u8 version;
+   __u8 ttl;
+   __u8 msg_type; /* see ICMP message types above */
+   __u8 dst[ETH_ALEN];
+   __u8 orig[ETH_ALEN];
+   __u8 uid;
+   __u8 align[3];
 };
 
 /**
@@ -313,14 +313,14 @@ struct batadv_icmp_header {
  * @seqno: ICMP sequence number
  */
 struct batadv_icmp_packet {
-   u8 packet_type;
-   u8 version;
-   u8 ttl;
-   u8 msg_type; /* see ICMP message types above */
-   u8 dst[ETH_ALEN];
-   u8 orig[ETH_ALEN];
-   u8 uid;
-   u8 reserved;
+   __u8   packet_type;
+   __u8   version;
+   __u8   ttl;
+   __u8   msg_type; /* see ICMP message types above */
+   __u8   dst[ETH_ALEN];
+   __u8   orig[ETH_ALEN];
+   __u8   uid;
+   __u8   reserved;
__be16 seqno;
 };
 
@@ -342,15 +342,15 @@ struct batadv_icmp_packet {
  *  store it using network order
  */
 struct batadv_icmp_tp_packet {
-   u8  packet_type;
-   u8  version;
-   u8  ttl;
-   u8  msg_type; /* see ICMP message types above */
-   u8  dst[ETH_ALEN];
-   u8  orig[ETH_ALEN];
-   u8  uid;
-   u8  subtype;
-   u8  session[2];
+   __u8   packet_type;
+   __u8   version;
+   __u8   ttl;
+   __u8   msg_type; /* see ICMP message types above */
+   __u8   dst[ETH_ALEN];
+   __u8   orig[ETH_ALEN];
+   __u8   uid;
+   __u8   subtype;
+   __u8   session[2];
__be32 seqno;
__be32 timestamp;
 };
@@ -381,16 +381,16 @@ enum batadv_icmp_tp_subtype {
  * @rr: route record array
  */
 struct batadv_icmp_packet_rr {
-   u8 packet_type;
-   u8 version;
-   u8 ttl;
-   u8 msg_type; /* see ICMP message types above */
-   u8 dst[ETH_ALEN];
-   u8 orig[ETH_

[PATCH v4 1/5] batman-adv: Let packet.h include its headers directly

2017-12-21 Thread Sven Eckelmann
The headers used by packet.h should also be included by it directly. main.h
is currently dealing with it in batman-adv, but this will no longer work
when this header is moved to include/uapi/linux/.

Signed-off-by: Sven Eckelmann 
---
 net/batman-adv/main.h   | 2 --
 net/batman-adv/packet.h | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index 5ac86df48c42..d5484ac381d3 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -217,10 +217,8 @@ enum batadv_uev_type {
 
 /* Kernel headers */
 
-#include  /* for packet.h */
 #include 
 #include 
-#include  /* for packet.h */
 #include 
 #include 
 #include 
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index dccbd4a6f019..6b6563867455 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -20,6 +20,8 @@
 #define _NET_BATMAN_ADV_PACKET_H_
 
 #include 
+#include 
+#include 
 #include 
 
 /**
-- 
2.11.0



[PATCH v4 4/5] batman-adv: Convert packet.h to uapi header

2017-12-21 Thread Sven Eckelmann
The header file is used by different userspace programs to inject packets
or to decode sniffed packets. It should therefore be available to them as
userspace header.

Also other components in the kernel (like the flow dissector) require
access to the packet definitions to be able to decode ETH_P_BATMAN ethernet
packets.

Signed-off-by: Sven Eckelmann 
---
 MAINTAINERS   | 1 +
 net/batman-adv/packet.h => include/uapi/linux/batadv_packet.h | 8 
 net/batman-adv/bat_iv_ogm.c   | 2 +-
 net/batman-adv/bat_v.c| 2 +-
 net/batman-adv/bat_v_elp.c| 2 +-
 net/batman-adv/bat_v_ogm.c| 2 +-
 net/batman-adv/bridge_loop_avoidance.c| 2 +-
 net/batman-adv/distributed-arp-table.h| 2 +-
 net/batman-adv/fragmentation.c| 2 +-
 net/batman-adv/gateway_client.c   | 2 +-
 net/batman-adv/gateway_common.c   | 2 +-
 net/batman-adv/hard-interface.c   | 2 +-
 net/batman-adv/icmp_socket.c  | 2 +-
 net/batman-adv/main.c | 2 +-
 net/batman-adv/main.h | 2 +-
 net/batman-adv/multicast.c| 2 +-
 net/batman-adv/netlink.c  | 2 +-
 net/batman-adv/network-coding.c   | 2 +-
 net/batman-adv/routing.c  | 2 +-
 net/batman-adv/send.h | 3 +--
 net/batman-adv/soft-interface.c   | 2 +-
 net/batman-adv/sysfs.c| 2 +-
 net/batman-adv/tp_meter.c | 2 +-
 net/batman-adv/translation-table.c| 2 +-
 net/batman-adv/tvlv.c | 2 +-
 net/batman-adv/types.h| 3 +--
 26 files changed, 29 insertions(+), 30 deletions(-)
 rename net/batman-adv/packet.h => include/uapi/linux/batadv_packet.h (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index aa71ab52fd76..4d6af00a5f10 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2563,6 +2563,7 @@ S:Maintained
 F: Documentation/ABI/testing/sysfs-class-net-batman-adv
 F: Documentation/ABI/testing/sysfs-class-net-mesh
 F: Documentation/networking/batman-adv.rst
+F: include/uapi/linux/batadv_packet.h
 F: include/uapi/linux/batman_adv.h
 F: net/batman-adv/
 
diff --git a/net/batman-adv/packet.h b/include/uapi/linux/batadv_packet.h
similarity index 99%
rename from net/batman-adv/packet.h
rename to include/uapi/linux/batadv_packet.h
index 3b2d2db993aa..5cb360be2a11 100644
--- a/net/batman-adv/packet.h
+++ b/include/uapi/linux/batadv_packet.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
 /* Copyright (C) 2007-2017  B.A.T.M.A.N. contributors:
  *
  * Marek Lindner, Simon Wunderlich
@@ -16,8 +16,8 @@
  * along with this program; if not, see .
  */
 
-#ifndef _NET_BATMAN_ADV_PACKET_H_
-#define _NET_BATMAN_ADV_PACKET_H_
+#ifndef _UAPI_LINUX_BATADV_PACKET_H_
+#define _UAPI_LINUX_BATADV_PACKET_H_
 
 #include 
 #include 
@@ -641,4 +641,4 @@ struct batadv_tvlv_mcast_data {
__u8 reserved[3];
 };
 
-#endif /* _NET_BATMAN_ADV_PACKET_H_ */
+#endif /* _UAPI_LINUX_BATADV_PACKET_H_ */
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index c9955f29a2bf..a4a331c56a60 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "bat_algo.h"
@@ -63,7 +64,6 @@
 #include "netlink.h"
 #include "network-coding.h"
 #include "originator.h"
-#include "packet.h"
 #include "routing.h"
 #include "send.h"
 #include "translation-table.h"
diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c
index 14ec3677c391..f5abe4a4e247 100644
--- a/net/batman-adv/bat_v.c
+++ b/net/batman-adv/bat_v.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "bat_algo.h"
@@ -49,7 +50,6 @@
 #include "log.h"
 #include "netlink.h"
 #include "originator.h"
-#include "packet.h"
 
 struct sk_buff;
 
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 59ae96cef596..a83478c46597 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -42,13 +42,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bat_algo.h"
 #include "bat_v_ogm.h"
 #include "hard-interface.h"
 #include "log.h"
 #include "originator.h"
-#include "packet.h"
 #include "routing.h"
 #include "send.h"
 
diff --git a/net/batman-adv/bat_v_ogm.c b/

[PATCH net-next] rtnetlink: Replace implementation of ASSERT_RTNL() macro with WARN_ONCE()

2017-12-21 Thread Leon Romanovsky
From: Leon Romanovsky 

The ASSERT_RTNL() macro is actually an open-coded variant of WARN_ONCE()
with two exceptions. First, it prints the stack for multiple hits and not
only once as WARN_ONCE() does. Second, the user can disable the prints of
WARN_ONCE() by setting CONFIG_BUG to N.

The multiple stack dumps are actually not needed, because calls
without the rtnl lock are programming errors and the user can't do anything
about them except complain to the mailing list after the first occurrence
of such a failure.

A user who disabled BUG/WARN prints did it explicitly, because by default
this option is enabled in the upstream kernel and in distributions. It means
that the user doesn't want to see prints about missing locks either.

This patch replaces the open-coded variant with the already existing
macro and changes the error prints to be once only.

Reviewed-by: Mark Bloch 
Signed-off-by: Leon Romanovsky 
---
 include/linux/rtnetlink.h | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2032ce2eb20b..62d508b31f56 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -97,13 +97,9 @@ void rtnetlink_init(void);
 void __rtnl_unlock(void);
 void rtnl_kfree_skbs(struct sk_buff *head, struct sk_buff *tail);

-#define ASSERT_RTNL() do { \
-   if (unlikely(!rtnl_is_locked())) { \
-   printk(KERN_ERR "RTNL: assertion failed at %s (%d)\n", \
-  __FILE__,  __LINE__); \
-   dump_stack(); \
-   } \
-} while(0)
+#define ASSERT_RTNL() \
+   WARN_ONCE(!rtnl_is_locked(), \
+ "RTNL: assertion failed at %s (%d)\n", __FILE__,  __LINE__)

 extern int ndo_dflt_fdb_dump(struct sk_buff *skb,
 struct netlink_callback *cb,
--
2.15.1
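
The behavioural change in the patch above (print on every hit versus print once per call site) can be illustrated in userspace. This is only a sketch: WARN_ONCE_SKETCH(), ASSERT_LOCKED_SKETCH(), fake_rtnl_locked and warn_count are hypothetical names, the flag stands in for rtnl_is_locked(), and the real WARN_ONCE() additionally emits a stack trace.

```c
#include <stdbool.h>
#include <stdio.h>

static int warn_count;        /* number of warnings actually printed */
static bool fake_rtnl_locked; /* stand-in for rtnl_is_locked() */

/*
 * Userspace approximation of a warn-once macro: the condition is checked
 * on every call, but each expansion site prints at most once thanks to
 * its own static flag.
 */
#define WARN_ONCE_SKETCH(cond, ...) do {		\
	static bool warned_;				\
	if ((cond) && !warned_) {			\
		warned_ = true;				\
		warn_count++;				\
		fprintf(stderr, __VA_ARGS__);		\
	}						\
} while (0)

#define ASSERT_LOCKED_SKETCH() \
	WARN_ONCE_SKETCH(!fake_rtnl_locked, \
			 "RTNL: assertion failed at %s (%d)\n", \
			 __FILE__, __LINE__)

/* A function that must only be called with the (fake) lock held. */
static void needs_lock(void)
{
	ASSERT_LOCKED_SKETCH();
}
```

Calling needs_lock() repeatedly without the fake lock prints a single message, which mirrors the "once only" behaviour the commit message argues for.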



null-ptr-deref in tcf_block_put

2017-12-21 Thread Prashant Bhole


Hi,
Recently I tried tools/testing/selftests/net/rtnetlink.sh with KASAN 
enabled and encountered the following BUG.


kernel: ==
kernel: BUG: KASAN: null-ptr-deref in tcf_block_put+0x8c/0xc0
kernel: Read of size 8 at addr 0018 by task tc/2966
kernel:
kernel: CPU: 0 PID: 2966 Comm: tc Not tainted 4.15.0-rc3+ #24
kernel: Hardware name: Hewlett-Packard HP Z440 Workstation/212B, BIOS 
M60 v02.34 05/18/2017

kernel: Call Trace:
kernel:  dump_stack+0xaf/0x127
kernel:  ? _atomic_dec_and_lock+0x159/0x159
kernel:  ? tcf_block_put_ext+0x215/0x270
kernel:  kasan_report+0x15f/0x360
kernel:  ? tcf_block_put+0x8c/0xc0
kernel:  tcf_block_put+0x8c/0xc0
kernel:  ? tcf_block_put_ext+0x270/0x270
kernel:  ? kfree+0x9c/0x1b0
kernel:  htb_destroy_class.isra.17+0x54/0x70 [sch_htb]
kernel:  htb_destroy+0x122/0x200 [sch_htb]
kernel:  qdisc_destroy+0xa4/0x2a0
kernel:  ? rtnetlink_send+0x94/0xa0
kernel:  qdisc_graft+0x530/0x650
kernel:  tc_get_qdisc+0x235/0x370
kernel:  ? tc_ctl_tclass+0x5f0/0x5f0
kernel:  ? security_capable+0x2d/0x70
kernel:  rtnetlink_rcv_msg+0x69c/0x790
kernel:  ? rtnl_calcit.isra.26+0x250/0x250
kernel:  ? depot_save_stack+0x12d/0x470
kernel:  ? save_stack+0x89/0xb0
kernel:  ? kasan_kmalloc+0xa0/0xd0
kernel:  ? __kmalloc_node_track_caller+0x192/0x2d0
kernel:  ? __kmalloc_reserve.isra.39+0x2e/0x80
kernel:  ? __alloc_skb+0xf9/0x3a0
kernel:  ? netlink_sendmsg+0x558/0x680
kernel:  ? sock_sendmsg+0x6b/0x80
kernel:  ? ___sys_sendmsg+0x49a/0x500
kernel:  ? __sys_sendmsg+0xb5/0x150
kernel:  ? entry_SYSCALL_64_fastpath+0x1a/0x7d
kernel:  ? __alloc_skb+0xc9/0x3a0
kernel:  ? netlink_sendmsg+0x558/0x680
kernel:  ? sock_sendmsg+0x6b/0x80
kernel:  ? ___sys_sendmsg+0x49a/0x500
kernel:  ? __sys_sendmsg+0xb5/0x150
kernel:  ? entry_SYSCALL_64_fastpath+0x1a/0x7d
kernel:  ? lru_cache_add+0x145/0x210
kernel:  ? lru_cache_add_file+0x10/0x10
kernel:  ? mem_cgroup_low+0x140/0x140
kernel:  ? netlink_compare+0x53/0x70
kernel:  ? __netlink_lookup+0x2d3/0x3e0
kernel:  ? netlink_broadcast+0x20/0x20
kernel:  ? memcg_kmem_get_cache+0x4e0/0x4e0
kernel:  ? netlink_deliver_tap+0x10b/0x530
kernel:  ? kasan_kmalloc+0xa0/0xd0
kernel:  ? netlink_has_listeners+0x170/0x170
kernel:  ? __kmalloc_node_track_caller+0x231/0x2d0
kernel:  ? iov_iter_advance+0x176/0x7a0
kernel:  netlink_rcv_skb+0x122/0x230
kernel:  ? rtnl_calcit.isra.26+0x250/0x250
kernel:  ? netlink_ack+0x4b0/0x4b0
kernel:  ? netlink_trim+0x123/0x1c0
kernel:  ? alloc_pages_vma+0x93/0x260
kernel:  netlink_unicast+0x2c2/0x360
kernel:  ? netlink_attachskb+0x3f0/0x3f0
kernel:  ? import_iovec+0x128/0x1d0
kernel:  netlink_sendmsg+0x528/0x680
kernel:  ? netlink_unicast+0x360/0x360
kernel:  ? netlink_unicast+0x360/0x360
kernel:  sock_sendmsg+0x6b/0x80
kernel:  ___sys_sendmsg+0x49a/0x500
kernel:  ? copy_msghdr_from_user+0x260/0x260
kernel:  ? netlink_sendmsg+0x2b2/0x680
kernel:  ? netlink_unicast+0x360/0x360
kernel:  ? mem_cgroup_from_task+0x9c/0xe0
kernel:  ? mem_cgroup_reset+0x190/0x190
kernel:  ? __fget_light+0x17e/0x200
kernel:  ? expand_files+0x570/0x570
kernel:  ? handle_mm_fault+0x1ca/0x380
kernel:  ? __handle_mm_fault+0x1f10/0x1f10
kernel:  ? vmacache_find+0xe6/0x110
kernel:  ? __do_page_fault+0x5c5/0x6d0
kernel:  ? __sys_sendmsg+0xb5/0x150
kernel:  __sys_sendmsg+0xb5/0x150
kernel:  ? SyS_shutdown+0x160/0x160
kernel:  ? kmem_cache_free+0x7c/0x1f0
kernel:  ? __do_page_fault+0x6d0/0x6d0
kernel:  ? do_sys_open+0x1f0/0x380
kernel:  entry_SYSCALL_64_fastpath+0x1a/0x7d



After some investigation I found this commit:
[1] https://patchwork.ozlabs.org/patch/833596 which fixed this bug.

But recently accepted commit:
[2] https://patchwork.ozlabs.org/patch/849101/ reverted it.

So I tried the same fix as in [1] on top of the latest net-next. The bug 
did not reproduce.



-Prashant




Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working

2017-12-21 Thread Paul Menzel

Dear Pavel,


On 12/10/17 09:39, Pavel Machek wrote:

In v4.15-rc2+, network manager can not see my ethernet card, and 
manual attempts to ifconfig it up did not really help, either.


Card is:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit
Ethernet Controller

Dmesg says:

   dmesg | grep eth
[0.648931] e1000e :02:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 
00:16:d3:25:19:04
[0.648934] e1000e :02:00.0 eth0: Intel(R) PRO/1000 Network Connection
[0.649012] e1000e :02:00.0 eth0: MAC: 2, PHY: 2, PBA No: 005302-003
[0.706510] usbcore: registered new interface driver cdc_ether
[6.557022] e1000e :02:00.0 eth1: renamed from eth0
[6.577554] systemd-udevd[2363]: renamed network interface eth0 to eth1

Any ideas?


I am sorry for jumping in on this so late.

Since I started using a Lenovo X60t more often about a year ago, I have 
occasionally experienced the issue that the link doesn’t come up [1]. 
I haven’t tried the latest patches yet, but it might be unrelated.


Did you test whether removing and reloading the e1000e module fixes the issue?

Additionally, I was asked privately to test the out of tree driver, but 
haven’t had time yet.



You might give the out-of-tree driver a try, you can download it here
https://downloadcenter.intel.com/download/26549/Intel-Network-Adapter-Driver-for-PCIe-Intel-Gigabit-Ethernet-Network-Connections-Under-Linux-?product=19297
 Its version is 3.3.5, dated 8/2016


Kind regards,

Paul


[1] 
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20170410/008590.html
--- Begin Message ---

Dear Linux folks,


On a Lenovo X60t, in very rare cases, the link does not get ready with 
Linux 4.11-rc6, and also versions before it.


```
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet 
Controller

```

`ip a` shows that the link is down, and unplugging and plugging the 
network cable back in does not help.


```
$ journalctl -k -o short-precise | grep -i e1000e
Apr 11 23:07:48.553259 x60 kernel: e1000e: Intel(R) PRO/1000 Network 
Driver - 3.2.6-k
Apr 11 23:07:48.553319 x60 kernel: e1000e: Copyright(c) 1999 - 2015 
Intel Corporation.
Apr 11 23:07:48.553499 x60 kernel: e1000e :01:00.0: Interrupt 
Throttling Rate (ints/sec) set to dynamic conservative mode
Apr 11 23:07:48.577368 x60 kernel: e1000e :01:00.0 eth0: (PCI 
Express:2.5GT/s:Width x1) 00:16:d3:b8:e3:49
Apr 11 23:07:48.577529 x60 kernel: e1000e :01:00.0 eth0: Intel(R) 
PRO/1000 Network Connection
Apr 11 23:07:48.577688 x60 kernel: e1000e :01:00.0 eth0: MAC: 2, 
PHY: 2, PBA No: 005302-003
Apr 11 23:07:48.923514 x60 kernel: e1000e :01:00.0 eth8: renamed 
from eth0

```

Removing the module *e1000e* and loading it again fixes the issue.

```
Apr 11 23:12:01.526318 x60 kernel: e1000e :01:00.0: Disabling ASPM 
L0s L1

Apr 11 23:12:01.687420 x60 kernel: e1000e: eth8 NIC Link is Down
Apr 11 23:12:08.29 x60 kernel: e1000e: Intel(R) PRO/1000 Network 
Driver - 3.2.6-k
Apr 11 23:12:08.295737 x60 kernel: e1000e: Copyright(c) 1999 - 2015 
Intel Corporation.
Apr 11 23:12:08.296006 x60 kernel: e1000e :01:00.0: Interrupt 
Throttling Rate (ints/sec) set to dynamic conservative mode
Apr 11 23:12:08.421070 x60 kernel: e1000e :01:00.0 eth0: (PCI 
Express:2.5GT/s:Width x1) 00:16:d3:b8:e3:49
Apr 11 23:12:08.421389 x60 kernel: e1000e :01:00.0 eth0: Intel(R) 
PRO/1000 Network Connection
Apr 11 23:12:08.421642 x60 kernel: e1000e :01:00.0 eth0: MAC: 2, 
PHY: 2, PBA No: 005302-003
Apr 11 23:12:08.423534 x60 kernel: e1000e :01:00.0 eth8: renamed 
from eth0
Apr 11 23:12:11.609873 x60 kernel: e1000e: eth8 NIC Link is Up 1000 Mbps 
Full Duplex, Flow Control: Rx/Tx

```

Do you know, if that is a known problem? What other information should 
be added to a bug report?



Kind regards,

Paul
--- End Message ---




[PATCH bpf-next] samples/bpf: adjust rlimit RLIMIT_MEMLOCK for sampleip

2017-12-21 Thread Prashant Bhole
The default memlock rlimit is 64KB, which causes map creation to
fail.

For example:
test@test# ./sampleip
failed to create a map: 1 Operation not permitted
ERROR: loading BPF program (errno 1):
Try: ulimit -l unlimited

Signed-off-by: Prashant Bhole 
---
 samples/bpf/sampleip_user.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 4ed690b907ff..f240a7db7c0a 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "libbpf.h"
 #include "bpf_load.h"
 #include "perf-sys.h"
@@ -132,8 +133,9 @@ static void int_exit(int sig)
 
 int main(int argc, char **argv)
 {
-   char filename[256];
int *pmu_fd, opt, freq = DEFAULT_FREQ, secs = DEFAULT_SECS;
+   struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+   char filename[256];
 
/* process arguments */
while ((opt = getopt(argc, argv, "F:h")) != -1) {
@@ -154,6 +156,11 @@ int main(int argc, char **argv)
return 1;
}
 
+   if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+   perror("Failed to set memlock rlimit");
+   return 1;
+   }
+
/* initialize kernel symbol translation */
if (load_kallsyms()) {
fprintf(stderr, "ERROR: loading /proc/kallsyms\n");
@@ -171,12 +178,8 @@ int main(int argc, char **argv)
/* load BPF program */
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (load_bpf_file(filename)) {
-   fprintf(stderr, "ERROR: loading BPF program (errno %d):\n",
-   errno);
-   if (strcmp(bpf_log_buf, "") == 0)
-   fprintf(stderr, "Try: ulimit -l unlimited\n");
-   else
-   fprintf(stderr, "%s", bpf_log_buf);
+   fprintf(stderr, "ERROR: loading BPF program (errno %d): %s\n",
+   errno, bpf_log_buf);
return 1;
}
signal(SIGINT, int_exit);
-- 
2.13.6
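
The rlimit adjustment in the patch above can be tried in isolation with a small userspace sketch. Note one deliberate difference, which is an assumption of this sketch and not part of the patch: when lifting the limit to RLIM_INFINITY is not permitted, the hypothetical helper bump_memlock_rlimit() falls back to raising the soft limit to the current hard limit instead of failing right away.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: try to remove the RLIMIT_MEMLOCK limit entirely.
 * If the process lacks the privilege for that, raise the soft limit to
 * the existing hard limit instead. Returns 0 on success, -1 on failure.
 */
static int bump_memlock_rlimit(void)
{
	struct rlimit r = { RLIM_INFINITY, RLIM_INFINITY };

	/* same call as in the patch; usually needs CAP_SYS_RESOURCE */
	if (setrlimit(RLIMIT_MEMLOCK, &r) == 0)
		return 0;

	/* fallback (not in the patch): soft limit up to the hard limit */
	if (getrlimit(RLIMIT_MEMLOCK, &r))
		return -1;
	r.rlim_cur = r.rlim_max;
	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
		perror("setrlimit(RLIMIT_MEMLOCK)");
		return -1;
	}
	return 0;
}
```

Whether the fallback is enough to create the maps depends on the hard limit of the environment, so a sample program may still prefer to fail loudly as the patch does.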




Re: [PATCH v3 ipsec-next 0/3] xfrm: offload api fixes

2017-12-21 Thread Steffen Klassert
On Tue, Dec 19, 2017 at 03:35:46PM -0800, Shannon Nelson wrote:
> These are a couple of little fixes to the xfrm_offload API to make
> life just a little easier for the poor driver developer.
> 
> Changes from v2:
>  - fix up another kbuild robot complaint when CONFIG_XFRM_OFFLOAD is off
>  - split out checks into a common function for register and feature check
> 
> Changes from v1:
>  - removed netdev_err() notes  (Steffen)
>  - fixed build when CONFIG_XFRM_OFFLOAD is off (kbuild robot)
>  - split into multiple patches (me)
> 
> 
> Shannon Nelson (3):
>   xfrm: check for xdo_dev_state_free
>   xfrm: check for xdo_dev_ops add and delete
>   xfrm: wrap xfrmdev_ops with offload config

All applied to ipsec-next, thanks Shannon!


[PATCH v4 32/36] xfrm: Replace hrtimer tasklet with softirq hrtimer

2017-12-21 Thread Anna-Maria Gleixner
From: Thomas Gleixner 

Switch the timer to HRTIMER_MODE_SOFT, which executes the timer
callback in softirq context, and remove the hrtimer_tasklet.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Anna-Maria Gleixner 
Cc: Steffen Klassert 
Cc: netdev@vger.kernel.org
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 include/net/xfrm.h|  2 +-
 net/xfrm/xfrm_state.c | 30 ++
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index dc28a98ce97c..e706ec81bd14 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -217,7 +217,7 @@ struct xfrm_state {
struct xfrm_stats   stats;
 
struct xfrm_lifetime_cur curlft;
-   struct tasklet_hrtimer  mtimer;
+   struct hrtimer  mtimer;
 
struct xfrm_state_offload xso;
 
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 065d89606888..4be5fc7038af 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -426,7 +426,7 @@ static void xfrm_put_mode(struct xfrm_mode *mode)
 
 static void xfrm_state_gc_destroy(struct xfrm_state *x)
 {
-   tasklet_hrtimer_cancel(&x->mtimer);
+   hrtimer_cancel(&x->mtimer);
del_timer_sync(&x->rtimer);
kfree(x->aead);
kfree(x->aalg);
@@ -471,8 +471,8 @@ static void xfrm_state_gc_task(struct work_struct *work)
 
 static enum hrtimer_restart xfrm_timer_handler(struct hrtimer *me)
 {
-   struct tasklet_hrtimer *thr = container_of(me, struct tasklet_hrtimer, 
timer);
-   struct xfrm_state *x = container_of(thr, struct xfrm_state, mtimer);
+   struct xfrm_state *x = container_of(me, struct xfrm_state, mtimer);
+   enum hrtimer_restart ret = HRTIMER_NORESTART;
unsigned long now = get_seconds();
long next = LONG_MAX;
int warn = 0;
@@ -536,7 +536,8 @@ static enum hrtimer_restart xfrm_timer_handler(struct 
hrtimer *me)
km_state_expired(x, 0, 0);
 resched:
if (next != LONG_MAX) {
-   tasklet_hrtimer_start(&x->mtimer, ktime_set(next, 0), 
HRTIMER_MODE_REL);
+   hrtimer_forward_now(&x->mtimer, ktime_set(next, 0));
+   ret = HRTIMER_RESTART;
}
 
goto out;
@@ -553,7 +554,7 @@ static enum hrtimer_restart xfrm_timer_handler(struct 
hrtimer *me)
 
 out:
spin_unlock(&x->lock);
-   return HRTIMER_NORESTART;
+   return ret;
 }
 
 static void xfrm_replay_timer_handler(struct timer_list *t);
@@ -572,8 +573,8 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
INIT_HLIST_NODE(&x->bydst);
INIT_HLIST_NODE(&x->bysrc);
INIT_HLIST_NODE(&x->byspi);
-   tasklet_hrtimer_init(&x->mtimer, xfrm_timer_handler,
-   CLOCK_BOOTTIME, HRTIMER_MODE_ABS);
+   hrtimer_init(&x->mtimer, CLOCK_BOOTTIME, HRTIMER_MODE_ABS_SOFT);
+   x->mtimer.function = xfrm_timer_handler;
timer_setup(&x->rtimer, xfrm_replay_timer_handler, 0);
x->curlft.add_time = get_seconds();
x->lft.soft_byte_limit = XFRM_INF;
@@ -1029,7 +1030,9 @@ xfrm_state_find(const xfrm_address_t *daddr, const 
xfrm_address_t *saddr,
hlist_add_head_rcu(&x->byspi, 
net->xfrm.state_byspi + h);
}
x->lft.hard_add_expires_seconds = 
net->xfrm.sysctl_acq_expires;
-   tasklet_hrtimer_start(&x->mtimer, 
ktime_set(net->xfrm.sysctl_acq_expires, 0), HRTIMER_MODE_REL);
+   hrtimer_start(&x->mtimer,
+ ktime_set(net->xfrm.sysctl_acq_expires, 
0),
+ HRTIMER_MODE_REL_SOFT);
net->xfrm.state_num++;
xfrm_hash_grow_check(net, x->bydst.next != NULL);
spin_unlock_bh(&net->xfrm.xfrm_state_lock);
@@ -1140,7 +1143,7 @@ static void __xfrm_state_insert(struct xfrm_state *x)
hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h);
}
 
-   tasklet_hrtimer_start(&x->mtimer, ktime_set(1, 0), HRTIMER_MODE_REL);
+   hrtimer_start(&x->mtimer, ktime_set(1, 0), HRTIMER_MODE_REL_SOFT);
if (x->replay_maxage)
mod_timer(&x->rtimer, jiffies + x->replay_maxage);
 
@@ -1244,7 +1247,9 @@ static struct xfrm_state *__find_acq_core(struct net *net,
x->mark.m = m->m;
x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
xfrm_state_hold(x);
-   tasklet_hrtimer_start(&x->mtimer, 
ktime_set(net->xfrm.sysctl_acq_expires, 0), HRTIMER_MODE_REL);
+   hrtimer_start(&x->mtimer,
+ ktime_set(net->xfrm.sysctl_acq_expires, 0),
+ HRTIMER_MODE_REL_SOFT);
list_add(&x->km.all, &net->xfrm.state_all);
hlist_add_head_rcu(&x->bydst, net->xfrm.stat

[PATCH v4 36/36] net/mvpp2: Replace tasklet with softirq hrtimer

2017-12-21 Thread Anna-Maria Gleixner
From: Thomas Gleixner 

The tx_done_tasklet tasklet is used to invoke the hrtimer
(mvpp2_hr_timer_cb) in softirq context. This can also be achieved without
the tasklet by using HRTIMER_MODE_SOFT as the hrtimer mode.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Anna-Maria Gleixner 
Cc: Thomas Petazzoni 
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" 
---
 drivers/net/ethernet/marvell/mvpp2.c | 62 +++-
 1 file changed, 25 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 634b2f41cc9e..41f12961e4d1 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -901,9 +901,8 @@ struct mvpp2_pcpu_stats {
 /* Per-CPU port control */
 struct mvpp2_port_pcpu {
struct hrtimer tx_done_timer;
+   struct net_device *dev;
bool timer_scheduled;
-   /* Tasklet for egress finalization */
-   struct tasklet_struct tx_done_tasklet;
 };
 
 struct mvpp2_queue_vector {
@@ -6156,46 +6155,34 @@ static void mvpp2_link_event(struct net_device *dev)
}
 }
 
-static void mvpp2_timer_set(struct mvpp2_port_pcpu *port_pcpu)
-{
-   ktime_t interval;
-
-   if (!port_pcpu->timer_scheduled) {
-   port_pcpu->timer_scheduled = true;
-   interval = MVPP2_TXDONE_HRTIMER_PERIOD_NS;
-   hrtimer_start(&port_pcpu->tx_done_timer, interval,
- HRTIMER_MODE_REL_PINNED);
-   }
-}
-
-static void mvpp2_tx_proc_cb(unsigned long data)
+static enum hrtimer_restart mvpp2_hr_timer_cb(struct hrtimer *timer)
 {
-   struct net_device *dev = (struct net_device *)data;
-   struct mvpp2_port *port = netdev_priv(dev);
-   struct mvpp2_port_pcpu *port_pcpu = this_cpu_ptr(port->pcpu);
+   struct net_device *dev;
+   struct mvpp2_port *port;
+   struct mvpp2_port_pcpu *port_pcpu;
unsigned int tx_todo, cause;
 
+   port_pcpu = container_of(timer, struct mvpp2_port_pcpu, tx_done_timer);
+   dev = port_pcpu->dev;
+
if (!netif_running(dev))
-   return;
+   return HRTIMER_NORESTART;
+
port_pcpu->timer_scheduled = false;
+   port = netdev_priv(dev);
 
/* Process all the Tx queues */
cause = (1 << port->ntxqs) - 1;
tx_todo = mvpp2_tx_done(port, cause, smp_processor_id());
 
/* Set the timer in case not all the packets were processed */
-   if (tx_todo)
-   mvpp2_timer_set(port_pcpu);
-}
-
-static enum hrtimer_restart mvpp2_hr_timer_cb(struct hrtimer *timer)
-{
-   struct mvpp2_port_pcpu *port_pcpu = container_of(timer,
-struct mvpp2_port_pcpu,
-tx_done_timer);
-
-   tasklet_schedule(&port_pcpu->tx_done_tasklet);
+   if (tx_todo && !port_pcpu->timer_scheduled) {
+   port_pcpu->timer_scheduled = true;
+   hrtimer_forward_now(&port_pcpu->tx_done_timer,
+   MVPP2_TXDONE_HRTIMER_PERIOD_NS);
 
+   return HRTIMER_RESTART;
+   }
return HRTIMER_NORESTART;
 }
 
@@ -6673,7 +6660,12 @@ static int mvpp2_tx(struct sk_buff *skb, struct 
net_device *dev)
txq_pcpu->count > 0) {
struct mvpp2_port_pcpu *port_pcpu = this_cpu_ptr(port->pcpu);
 
-   mvpp2_timer_set(port_pcpu);
+   if (!port_pcpu->timer_scheduled) {
+   port_pcpu->timer_scheduled = true;
+   hrtimer_start(&port_pcpu->tx_done_timer,
+ MVPP2_TXDONE_HRTIMER_PERIOD_NS,
+ HRTIMER_MODE_REL_PINNED_SOFT);
+   }
}
 
return NETDEV_TX_OK;
@@ -7108,7 +7100,6 @@ static int mvpp2_stop(struct net_device *dev)
 
hrtimer_cancel(&port_pcpu->tx_done_timer);
port_pcpu->timer_scheduled = false;
-   tasklet_kill(&port_pcpu->tx_done_tasklet);
}
}
mvpp2_cleanup_rxqs(port);
@@ -7899,13 +7890,10 @@ static int mvpp2_port_probe(struct platform_device 
*pdev,
port_pcpu = per_cpu_ptr(port->pcpu, cpu);
 
hrtimer_init(&port_pcpu->tx_done_timer, CLOCK_MONOTONIC,
-HRTIMER_MODE_REL_PINNED);
+HRTIMER_MODE_REL_PINNED_SOFT);
port_pcpu->tx_done_timer.function = mvpp2_hr_timer_cb;
port_pcpu->timer_scheduled = false;
-
-   tasklet_init(&port_pcpu->tx_done_tasklet,
-mvpp2_tx_proc_cb,
-(unsigned long)dev);
+   port_pcpu->dev = dev;
}
}
 
-- 
2.11.0
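
The reworked callback above recovers its per-CPU state from the timer pointer alone, via container_of(), instead of receiving the net_device through tasklet data. A minimal userspace sketch of that pattern follows; every sketch_* name is a hypothetical stand-in, not the mvpp2 code, and an int return value replaces HRTIMER_RESTART/HRTIMER_NORESTART.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); illustration only. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_timer {
	int armed;
};

/* Hypothetical per-CPU state with the timer embedded in it. */
struct sketch_port_pcpu {
	int timer_scheduled;
	int dev_id;			/* stands in for the dev back-pointer */
	struct sketch_timer tx_done_timer;
};

/* Returns 1 for "restart the timer" and 0 for "do not restart". */
static int sketch_timer_cb(struct sketch_timer *timer, unsigned int tx_todo)
{
	struct sketch_port_pcpu *pcpu;

	assert(timer != NULL);
	/* Walk back from the embedded member to the enclosing structure. */
	pcpu = sketch_container_of(timer, struct sketch_port_pcpu,
				   tx_done_timer);

	pcpu->timer_scheduled = 0;

	/* Re-arm only if work is left, mirroring the shape of the patch. */
	if (tx_todo && !pcpu->timer_scheduled) {
		pcpu->timer_scheduled = 1;
		return 1;
	}
	return 0;
}
```

The back-pointer stored at probe time (port_pcpu->dev in the patch) is what lets the callback reach the device once the tasklet and its data argument are gone.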



Re: [PATCH net-next] qed*: Utilize FW 8.33.1.0

2017-12-21 Thread kbuild test robot
Hi Tomer,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Tomer-Tayar/qed-Utilize-FW-8-33-1-0/20171221-180506
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:7:0,
from include/linux/kernel.h:14,
from include/asm-generic/bug.h:18,
from ./arch/xtensa/include/generated/asm/bug.h:1,
from include/linux/bug.h:5,
from include/linux/io.h:23,
from drivers/net//ethernet/qlogic/qed/qed_hw.c:34:
   drivers/net//ethernet/qlogic/qed/qed_hw.c: In function 'qed_dmae_sanity':
>> include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of 
>> type 'long long unsigned int', but argument 6 has type 'dma_addr_t {aka 
>> unsigned int}' [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/kern_levels.h:13:21: note: in expansion of macro 'KERN_SOH'
#define KERN_NOTICE KERN_SOH "5" /* normal but significant condition */
^~~~
>> include/linux/printk.h:306:9: note: in expansion of macro 'KERN_NOTICE'
 printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~
>> include/linux/qed/qed_if.h:761:4: note: in expansion of macro 'pr_notice'
   pr_notice("[%s:%d(%s)]" fmt,\
   ^
>> drivers/net//ethernet/qlogic/qed/qed_hw.c:866:4: note: in expansion of macro 
>> 'DP_NOTICE'
   DP_NOTICE(p_hwfn,
   ^
   drivers/net//ethernet/qlogic/qed/qed_hw.c:867:42: note: format string is 
defined here
  "DMAE sanity [%s]: addr={phys 0x%llx, virt %p}, read_val 0x%08x, 
expected_val 0x%08x\n",
  ~~~^
  %x
--
   In file included from include/linux/printk.h:7:0,
from include/linux/kernel.h:14,
from include/asm-generic/bug.h:18,
from ./arch/xtensa/include/generated/asm/bug.h:1,
from include/linux/bug.h:5,
from include/linux/io.h:23,
from drivers/net/ethernet/qlogic/qed/qed_hw.c:34:
   drivers/net/ethernet/qlogic/qed/qed_hw.c: In function 'qed_dmae_sanity':
>> include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of 
>> type 'long long unsigned int', but argument 6 has type 'dma_addr_t {aka 
>> unsigned int}' [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/kern_levels.h:13:21: note: in expansion of macro 'KERN_SOH'
#define KERN_NOTICE KERN_SOH "5" /* normal but significant condition */
^~~~
>> include/linux/printk.h:306:9: note: in expansion of macro 'KERN_NOTICE'
 printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~
>> include/linux/qed/qed_if.h:761:4: note: in expansion of macro 'pr_notice'
   pr_notice("[%s:%d(%s)]" fmt,\
   ^
   drivers/net/ethernet/qlogic/qed/qed_hw.c:866:4: note: in expansion of macro 
'DP_NOTICE'
   DP_NOTICE(p_hwfn,
   ^
   drivers/net/ethernet/qlogic/qed/qed_hw.c:867:42: note: format string is 
defined here
  "DMAE sanity [%s]: addr={phys 0x%llx, virt %p}, read_val 0x%08x, 
expected_val 0x%08x\n",
  ~~~^
  %x

vim +5 include/linux/kern_levels.h

314ba352 Joe Perches 2012-07-30  4  
04d2c8c8 Joe Perches 2012-07-30 @5  #define KERN_SOH "\001"  /* ASCII Start Of Header */
04d2c8c8 Joe Perches 2012-07-30  6  #define KERN_SOH_ASCII  '\001'
04d2c8c8 Joe Perches 2012-07-30  7  

:: The code at line 5 was first introduced by commit
:: 04d2c8c83d0e3ac5f78aeede51babb3236200112 printk: convert the format for 
KERN_ to a 2 byte pattern

:: TO: Joe Perches 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
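
The warning above comes from passing a dma_addr_t, which is 32 bits wide on this xtensa configuration, to a '%llx' conversion. One portable way to silence it is a widening cast to unsigned long long; in-kernel code can alternatively use printk's '%pad' specifier, which takes a pointer to the dma_addr_t. The userspace sketch below shows only the cast variant; sketch_dma_addr_t and sketch_format_phys() are hypothetical names, not part of the qed driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: a 32-bit dma_addr_t, as in the reported config. */
typedef uint32_t sketch_dma_addr_t;

/* Formats addr into buf and returns the number of characters written. */
static int sketch_format_phys(char *buf, size_t len, sketch_dma_addr_t addr)
{
	assert(buf != NULL && len > 0);
	/* The cast makes the argument match %llx on every architecture. */
	return snprintf(buf, len, "phys 0x%llx", (unsigned long long)addr);
}
```

The cast costs nothing on 64-bit builds and keeps a single format string valid regardless of CONFIG-dependent type widths.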




[PATCH net-next v7 2/2] net: ethernet: socionext: add AVE ethernet driver

2017-12-21 Thread Kunihiko Hayashi
The UniPhier platform from Socionext provides the AVE ethernet
controller that includes MAC and MDIO bus supporting RGMII/RMII
modes. The controller is named AVE.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ethernet/Kconfig |1 +
 drivers/net/ethernet/Makefile|1 +
 drivers/net/ethernet/socionext/Kconfig   |   22 +
 drivers/net/ethernet/socionext/Makefile  |5 +
 drivers/net/ethernet/socionext/sni_ave.c | 1736 ++
 5 files changed, 1765 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 39f62733..6cf5ade 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig 
b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 000..3a1829e
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,22 @@
+config NET_VENDOR_SOCIONEXT
+   bool "Socionext ethernet drivers"
+   default y
+   ---help---
+ Option to select ethernet drivers for Socionext platforms.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about Socionext devices. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_AVE
+   tristate "Socionext AVE ethernet support"
+   depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+   select PHYLIB
+   ---help---
+ Driver for gigabit ethernet MACs, called AVE, in the
+ Socionext UniPhier family.
+
+endif #NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile 
b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 000..ab83df6
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for all ethernet ip drivers on Socionext platforms
+#
+obj-$(CONFIG_SNI_AVE) += sni_ave.o
diff --git a/drivers/net/ethernet/socionext/sni_ave.c 
b/drivers/net/ethernet/socionext/sni_ave.c
new file mode 100644
index 000..0925675
--- /dev/null
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -0,0 +1,1736 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * sni_ave.c - Socionext UniPhier AVE ethernet driver
+ * Copyright 2014 Panasonic Corporation
+ * Copyright 2015-2017 Socionext Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* General Register Group */
+#define AVE_IDR0x000   /* ID */
+#define AVE_VR 0x004   /* Version */
+#define AVE_GRR0x008   /* Global Reset */
+#define AVE_CFGR   0x00c   /* Configuration */
+
+/* Interrupt Register Group */
+#define AVE_GIMR   0x100   /* Global Interrupt Mask */
+#define AVE_GISR   0x104   /* Global Interrupt Status */
+
+/* MAC Register Group */
+#define AVE_TXCR   0x200   /* TX Setup */
+#define AVE_RXCR   0x204   /* RX Setup */
+#define AVE_RXMAC1R0x208   /* MAC address (lower) */
+#define AVE_RXMAC2R0x20c   /* MAC address (upper) */
+#define AVE_MDIOCTR0x214   /* MDIO Control */
+#define AVE_MDIOAR 0x218   /* MDIO Address */
+#define AVE_MDIOWDR0x21c   /* MDIO Data */
+#define AVE_MDIOSR 0x220   /* MDIO Status */
+#define AVE_MDIORDR0x224   /* MDIO Rd Data */
+
+/* Descriptor Control Register Group */
+#define AVE_DESCC  0x300   /* Descriptor Control */
+#define AVE_TXDC   0x304   /* TX Descriptor Configuration */
+#define AVE_RXDC0  0x308   /* 

[PATCH net-next v7 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-12-21 Thread Kunihiko Hayashi
DT bindings for the AVE ethernet controller found on Socionext's
UniPhier platforms.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
Acked-by: Rob Herring 
---
 .../bindings/net/socionext,uniphier-ave4.txt   | 45 ++
 1 file changed, 45 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt 
b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
new file mode 100644
index 000..c73a6f2
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -0,0 +1,45 @@
+* Socionext AVE ethernet controller
+
+This describes the devicetree bindings for AVE ethernet controller
+implemented on Socionext UniPhier SoCs.
+
+Required properties:
+ - compatible: Should be
+   - "socionext,uniphier-pro4-ave4" : for Pro4 SoC
+   - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
+   - "socionext,uniphier-ld11-ave4" : for LD11 SoC
+   - "socionext,uniphier-ld20-ave4" : for LD20 SoC
+ - reg: Address where registers are mapped and size of region.
+ - interrupts: Should contain the MAC interrupt.
+ - phy-mode: See ethernet.txt in the same directory. Allow to choose
+   "rgmii", "rmii", or "mii" according to the PHY.
+ - phy-handle: Should point to the external phy device.
+   See ethernet.txt file in the same directory.
+ - clocks: A phandle to the clock for the MAC.
+
+Optional properties:
+ - resets: A phandle to the reset control for the MAC
+ - local-mac-address: See ethernet.txt in the same directory.
+
+Required subnode:
+ - mdio: Device tree subnode with the following required properties:
+
+Example:
+
+   ether: ethernet@6500 {
+   compatible = "socionext,uniphier-ld20-ave4";
+   reg = <0x6500 0x8500>;
+   interrupts = <0 66 4>;
+   phy-mode = "rgmii";
+   phy-handle = <&ethphy>;
+   clocks = <&sys_clk 6>;
+   resets = <&sys_rst 6>;
+   local-mac-address = [00 00 00 00 00 00];
+   mdio {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   ethphy: ethphy@1 {
+   reg = <1>;
+   };
+   };
+   };
-- 
2.7.4



[PATCH net-next v7 0/2] add UniPhier AVE ethernet support

2017-12-21 Thread Kunihiko Hayashi
This series adds support for Socionext AVE ethernet controller implemented
on UniPhier SoCs. This driver supports RGMII/RMII modes.

v6: https://www.spinics.net/lists/netdev/msg472133.html

The PHY patch included in v1 has already been split out into a separate posting:
http://www.spinics.net/lists/netdev/msg454595.html

Changes since v6:
- sort the order of local variables from longest to shortest line
- fix ave_probe() which calls register_netdev() at the end of initialization
- dt-bindings: remove phy node descriptions in mdio node

Changes since v5:
- replace license boilerplate with SPDX Identifier
- remove inline directives and an unused function

Changes since v4:
- fix larger integer warning on AVE_PFMBYTE_MASK0

Changes since v3:
- remove checking dma address and use dma_set_mask() to restrict address
- replace ave_mdio_busywait() with read_poll_timeout()
- replace functions to access to registers with readl/writel() directly
- replace a function to access to macaddr with ave_hw_write_macaddr()
- change return value of ave_dma_map() to error value
- move mdiobus_unregister() from ave_remove() to ave_uninit()
- eliminate else block at the end of ave_dma_map()
- add mask definitions for packet filter
- sort bitmap definitions in descending order
- add error check to some functions
- rename and sort functions to clear sub-categories
- fix error value consistency
- remove unneeded initializers
- change type of constant arrays

Changes since v2:
- replace clk_get() with devm_clk_get()
- replace reset_control_get() with devm_reset_control_get_optional_shared()
- add error return when the error occurs on the above *_get functions
- sort soc data and compatible strings
- remove clearly obvious comments
- modify dt-bindings document consistent with these modifications

Changes since v1:
- add/remove devicetree properties and sub-node
  - remove "internal-phy-interrupt" and "desc-bits" property
  - add SoC data structures based on compatible strings
  - add node operation to apply "mdio" sub-node
- add support for features
  - add support for {get,set}_pauseparam and pause frame operations
  - add support for ndo_get_stats64 instead of ndo_get_stats
- replace with desirable functions
  - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii()
  - replace phy attach message with phy_attached_info()
  - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr()
  - replace nway_reset and get_link with generic functions
- move operations to proper functions
  - move phy_start_aneg() to ndo_open,
and remove unnecessary PHY interrupt operations
See http://www.spinics.net/lists/netdev/msg454590.html
  - move irq initialization and descriptor memory allocation to ndo_open
  - move initialization of reset and clock and mdiobus to ndo_init
- fix skbuffer operations
  - fix skb alignment operations and add Rx buffer adjustment for descriptor
See http://www.spinics.net/lists/netdev/msg456014.html
  - add error returns when dma_map_single() failed 
- clean up code structures
  - clean up wait-loop and wake-queue conditions
  - add ave_wdesc_addr() and offset definitions
  - add ave_macaddr_init() to clean up mac-address operation
  - fix checking whether Tx entry is not enough
  - fix supported features of phydev
  - add necessary free/disable operations
  - add phydev check on ave_{get,set}_wol()
  - remove netif_carrier functions, phydev initializer, and Tx budget check
- replace obsolete code
  - replace ndev->{base_addr,irq} with the members of ave_private
- rename goto labels and mask definitions, and remove unused codes

Kunihiko Hayashi (2):
  dt-bindings: net: add DT bindings for Socionext UniPhier AVE
  net: ethernet: socionext: add AVE ethernet driver

 .../bindings/net/socionext,uniphier-ave4.txt   |   45 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/socionext/Kconfig |   22 +
 drivers/net/ethernet/socionext/Makefile|5 +
 drivers/net/ethernet/socionext/sni_ave.c   | 1736 
 6 files changed, 1810 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

-- 
2.7.4



Re: [PATCH net-next v7 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-12-21 Thread Andrew Lunn
> +Optional properties:
> + - resets: A phandle to the reset control for the MAC
> + - local-mac-address: See ethernet.txt in the same directory.
> +
> +Required subnode:
> + - mdio: Device tree subnode with the following required properties:
> +
> +Example:

It sounds like there should be some properties before the Example.

   Andrew

> +
> + ether: ethernet@6500 {
> + compatible = "socionext,uniphier-ld20-ave4";
> + reg = <0x6500 0x8500>;
> + interrupts = <0 66 4>;
> + phy-mode = "rgmii";
> + phy-handle = <&ethphy>;
> + clocks = <&sys_clk 6>;
> + resets = <&sys_rst 6>;
> + local-mac-address = [00 00 00 00 00 00];

Typically you would put a blank line here, before the mdio node.

> + mdio {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + ethphy: ethphy@1 {
> + reg = <1>;
> + };
> + };
> + };

  Andrew


Re: [PATCH net-next] phylink: avoid attaching more than one PHY

2017-12-21 Thread Andrew Lunn
On Wed, Dec 20, 2017 at 11:23:33PM +, Russell King wrote:
> Attaching more than one PHY to phylink is bad news, as we store a
> pointer to the PHY in a single location. Error out if more than one
> PHY is attempted to be attached.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew
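
The guard described in the quoted commit message can be sketched in userspace as follows. This is a sketch under stated assumptions, not the phylink implementation: sketch_link, sketch_phy and sketch_attach_phy() are hypothetical names, and -EBUSY is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_phy {
	int addr;
};

/* Hypothetical link object with exactly one slot for an attached PHY. */
struct sketch_link {
	struct sketch_phy *phydev;
};

/* Returns 0 on success, -EBUSY if a PHY is already attached. */
static int sketch_attach_phy(struct sketch_link *pl, struct sketch_phy *phy)
{
	assert(pl != NULL && phy != NULL);

	/* A second attach would silently overwrite the only pointer. */
	if (pl->phydev)
		return -EBUSY;

	pl->phydev = phy;
	return 0;
}
```

Rejecting the second attach keeps the stored pointer and the first PHY's state consistent, which is the failure mode the commit message warns about.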


Problems with 82579LM devices in Dell OptiPlex990/06D7TR

2017-12-21 Thread Paul Menzel

Dear Intel folks,


While changing our login setup, we noticed that calling `getspnam()` 
sometimes fails with the RPC time-out `do_ypcall: clnt_call: RPC: Timed out` 
on exactly one type of machine, the Dell OptiPlex 990 with the 82579LM 
ethernet device.


Please find below the DMI info fetched with `dmidecode -s …`; the strings 
were queried in the order given below, and the values are separated by a hash.


bios-vendor bios-version system-manufacturer system-product-name 
system-version baseboard-product-name baseboard-version


Dell Inc.#A11#Dell Inc.#OptiPlex 990#01#06D7TR#A01
Dell Inc.#A11#Dell Inc.#OptiPlex 990#01#06D7TR#A01
Dell Inc.#A10#Dell Inc.#OptiPlex 990#01#06D7TR#A01
Dell Inc.#A07#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A07#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A07#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A07#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A07#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A07#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A05#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A05#Dell Inc.#OptiPlex 990#01#06D7TR#A00
Dell Inc.#A03#Dell Inc.#OptiPlex 990#01#06D7TR#A00

The machines are of course out of warranty, and we guess that the 
problem was there all along (reproducible with Linux 4.4 and 4.9), but 
wasn’t noticed until now. We suspect it’s a hardware problem, as all of 
these devices have the PBA number *E041FF-0FF*. Other systems also use the 
82579LM, for example with PBA number *1011FF-0FF*, but the problem is not 
reproducible there.


Anyway, before digging more into this, could you please check your 
internal error documents for a known problem with that particular 
82579LM assembly?


```
[1.458794] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[1.459053] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[1.459414] e1000e :00:19.0: Interrupt Throttling Rate (ints/sec) 
set to dynamic conservative mode

[1.722400] e1000e :00:19.0 eth0: registered PHC clock
[1.722653] e1000e :00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 
90:b1:1c:79:7b:04
[1.723072] e1000e :00:19.0 eth0: Intel(R) PRO/1000 Network 
Connection
[1.723403] e1000e :00:19.0 eth0: MAC: 10, PHY: 11, PBA No: 
1011FF-0FF

[   13.693188] e1000e :00:19.0 net00: renamed from eth0
[   16.878762] e1000e: net00 NIC Link is Up 1000 Mbps Full Duplex, Flow 
Control: Rx/Tx

```


Kind regards,

Paul





Re: null-ptr-deref in tcf_block_put

2017-12-21 Thread Jiri Pirko
Thu, Dec 21, 2017 at 10:39:56AM CET, bhole_prashant...@lab.ntt.co.jp wrote:
>
>Hi,
>Recently I tried tools/testing/selftests/net/rtnetlink.sh with KASAN enabled
>and encountered following BUG.
>
>kernel: ==
>kernel: BUG: KASAN: null-ptr-deref in tcf_block_put+0x8c/0xc0
>kernel: Read of size 8 at addr 0018 by task tc/2966
>kernel:
>kernel: CPU: 0 PID: 2966 Comm: tc Not tainted 4.15.0-rc3+ #24
>kernel: Hardware name: Hewlett-Packard HP Z440 Workstation/212B, BIOS M60
>v02.34 05/18/2017
>kernel: Call Trace:
>kernel:  dump_stack+0xaf/0x127
>kernel:  ? _atomic_dec_and_lock+0x159/0x159
>kernel:  ? tcf_block_put_ext+0x215/0x270
>kernel:  kasan_report+0x15f/0x360
>kernel:  ? tcf_block_put+0x8c/0xc0
>kernel:  tcf_block_put+0x8c/0xc0
>kernel:  ? tcf_block_put_ext+0x270/0x270
>kernel:  ? kfree+0x9c/0x1b0
>kernel:  htb_destroy_class.isra.17+0x54/0x70 [sch_htb]
>kernel:  htb_destroy+0x122/0x200 [sch_htb]
>kernel:  qdisc_destroy+0xa4/0x2a0
>kernel:  ? rtnetlink_send+0x94/0xa0
>kernel:  qdisc_graft+0x530/0x650
>kernel:  tc_get_qdisc+0x235/0x370
>kernel:  ? tc_ctl_tclass+0x5f0/0x5f0
>kernel:  ? security_capable+0x2d/0x70
>kernel:  rtnetlink_rcv_msg+0x69c/0x790
>kernel:  ? rtnl_calcit.isra.26+0x250/0x250
>kernel:  ? depot_save_stack+0x12d/0x470
>kernel:  ? save_stack+0x89/0xb0
>kernel:  ? kasan_kmalloc+0xa0/0xd0
>kernel:  ? __kmalloc_node_track_caller+0x192/0x2d0
>kernel:  ? __kmalloc_reserve.isra.39+0x2e/0x80
>kernel:  ? __alloc_skb+0xf9/0x3a0
>kernel:  ? netlink_sendmsg+0x558/0x680
>kernel:  ? sock_sendmsg+0x6b/0x80
>kernel:  ? ___sys_sendmsg+0x49a/0x500
>kernel:  ? __sys_sendmsg+0xb5/0x150
>kernel:  ? entry_SYSCALL_64_fastpath+0x1a/0x7d
>kernel:  ? __alloc_skb+0xc9/0x3a0
>kernel:  ? netlink_sendmsg+0x558/0x680
>kernel:  ? sock_sendmsg+0x6b/0x80
>kernel:  ? ___sys_sendmsg+0x49a/0x500
>kernel:  ? __sys_sendmsg+0xb5/0x150
>kernel:  ? entry_SYSCALL_64_fastpath+0x1a/0x7d
>kernel:  ? lru_cache_add+0x145/0x210
>kernel:  ? lru_cache_add_file+0x10/0x10
>kernel:  ? mem_cgroup_low+0x140/0x140
>kernel:  ? netlink_compare+0x53/0x70
>kernel:  ? __netlink_lookup+0x2d3/0x3e0
>kernel:  ? netlink_broadcast+0x20/0x20
>kernel:  ? memcg_kmem_get_cache+0x4e0/0x4e0
>kernel:  ? netlink_deliver_tap+0x10b/0x530
>kernel:  ? kasan_kmalloc+0xa0/0xd0
>kernel:  ? netlink_has_listeners+0x170/0x170
>kernel:  ? __kmalloc_node_track_caller+0x231/0x2d0
>kernel:  ? iov_iter_advance+0x176/0x7a0
>kernel:  netlink_rcv_skb+0x122/0x230
>kernel:  ? rtnl_calcit.isra.26+0x250/0x250
>kernel:  ? netlink_ack+0x4b0/0x4b0
>kernel:  ? netlink_trim+0x123/0x1c0
>kernel:  ? alloc_pages_vma+0x93/0x260
>kernel:  netlink_unicast+0x2c2/0x360
>kernel:  ? netlink_attachskb+0x3f0/0x3f0
>kernel:  ? import_iovec+0x128/0x1d0
>kernel:  netlink_sendmsg+0x528/0x680
>kernel:  ? netlink_unicast+0x360/0x360
>kernel:  ? netlink_unicast+0x360/0x360
>kernel:  sock_sendmsg+0x6b/0x80
>kernel:  ___sys_sendmsg+0x49a/0x500
>kernel:  ? copy_msghdr_from_user+0x260/0x260
>kernel:  ? netlink_sendmsg+0x2b2/0x680
>kernel:  ? netlink_unicast+0x360/0x360
>kernel:  ? mem_cgroup_from_task+0x9c/0xe0
>kernel:  ? mem_cgroup_reset+0x190/0x190
>kernel:  ? __fget_light+0x17e/0x200
>kernel:  ? expand_files+0x570/0x570
>kernel:  ? handle_mm_fault+0x1ca/0x380
>kernel:  ? __handle_mm_fault+0x1f10/0x1f10
>kernel:  ? vmacache_find+0xe6/0x110
>kernel:  ? __do_page_fault+0x5c5/0x6d0
>kernel:  ? __sys_sendmsg+0xb5/0x150
>kernel:  __sys_sendmsg+0xb5/0x150
>kernel:  ? SyS_shutdown+0x160/0x160
>kernel:  ? kmem_cache_free+0x7c/0x1f0
>kernel:  ? __do_page_fault+0x6d0/0x6d0
>kernel:  ? do_sys_open+0x1f0/0x380
>kernel:  entry_SYSCALL_64_fastpath+0x1a/0x7d
>
>
>
>After some investigation I found this commit:
>[1] https://patchwork.ozlabs.org/patch/833596 which fixed this bug.
>
>But recently accepted commit:
>[2] https://patchwork.ozlabs.org/patch/849101/ reverted it.

Oops. Sending the fix. We need to check in both. 
Thanks!


>
>So I tried same fix in [1] on top of latest net-next. The bug did not
>reproduce.
>
>
>-Prashant
>
>


[net 1/1] tipc: base group replicast ack counter on number of actual receivers

2017-12-21 Thread Jon Maloy
In commit 2f487712b893 ("tipc: guarantee that group broadcast doesn't
bypass group unicast") we introduced a mechanism that requires the first
(replicated) broadcast sent after a unicast to be acknowledged by all
receivers before permitting sending of the next (true) broadcast.

The counter for keeping track of the number of acknowledges to expect
is based on the tipc_group::member_cnt variable. But this misses that
some of the known members may not be ready for reception, and will never
acknowledge the message, either because they haven't fully joined the
group or because they are leaving the group. Such members are identified
by not fulfilling the condition tested for in the function
tipc_group_is_enabled().

We now set the counter for the actual number of acks to receive at the
moment the message is sent, by just counting the number of recipients
satisfying the tipc_group_is_enabled() test.

Signed-off-by: Jon Maloy 
---
 net/tipc/group.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/tipc/group.c b/net/tipc/group.c
index 7ebbdeb..e5b03f0 100644
--- a/net/tipc/group.c
+++ b/net/tipc/group.c
@@ -368,18 +368,20 @@ void tipc_group_update_bc_members(struct tipc_group *grp, 
int len, bool ack)
u16 prev = grp->bc_snd_nxt - 1;
struct tipc_member *m;
struct rb_node *n;
+   u16 ackers = 0;
 
for (n = rb_first(&grp->members); n; n = rb_next(n)) {
m = container_of(n, struct tipc_member, tree_node);
if (tipc_group_is_enabled(m)) {
tipc_group_update_member(m, len);
m->bc_acked = prev;
+   ackers++;
}
}
 
/* Mark number of acknowledges to expect, if any */
if (ack)
-   grp->bc_ackers = grp->member_cnt;
+   grp->bc_ackers = ackers;
grp->bc_snd_nxt++;
 }
 
-- 
2.1.4
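
The counting change in this patch can be illustrated outside the kernel with a plain array in place of the rb-tree walk. The sketch below assumes a simplified member structure; sketch_member and sketch_count_ackers() are hypothetical names, and the 'enabled' flag stands in for the tipc_group_is_enabled() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified group member. */
struct sketch_member {
	bool enabled;		/* stands in for tipc_group_is_enabled() */
	uint16_t bc_acked;
};

/*
 * Marks every enabled member as having acked 'prev' and returns how many
 * members were enabled, i.e. how many acks the sender should expect.
 */
static uint16_t sketch_count_ackers(struct sketch_member *m, size_t n,
				    uint16_t prev)
{
	uint16_t ackers = 0;
	size_t i;

	assert(m != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (m[i].enabled) {
			m[i].bc_acked = prev;
			ackers++;
		}
	}
	return ackers;
}
```

With three known members of which one is still joining, the sender expects two acks rather than three, so it no longer waits for a member that will never reply.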



Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree

2017-12-21 Thread Roman Gushchin
On Wed, Dec 20, 2017 at 01:52:18PM -0800, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 20:53:41 +, Roman Gushchin wrote:
> > On Wed, Dec 20, 2017 at 12:29:21PM -0800, Jakub Kicinski wrote:
> > > On Wed, 20 Dec 2017 20:19:43 +, Roman Gushchin wrote:  
> > > > Bpftool determines its own version based on the kernel
> > > > version, which is picked from the linux/version.h header.
> > > > 
> > > > It's strange to use the version of the installed kernel
> > > > headers, and makes much more sense to use the version
> > > > of the actual source tree, where bpftool sources are.
> > > > 
> > > > This patch adds $(srctree)/usr/include to the list
> > > > of include files, which causes bpftool to use the version
> > > > from the source tree.
> > > > 
> > > > Example:
> > > > before:
> > > > 
> > > > $ bpftool version
> > > > bpftool v4.14.6
> > > > 
> > > > after:
> > > > $ bpftool version
> > > > bpftool v4.15.0  
> > > 
> > > Thanks for the patch, this would indeed use some improvement.
> > > 
> > > How about we just run make to get the version like liblockdep does?
> > > 
> > > LIBLOCKDEP_VERSION=$(shell make --no-print-directory -sC ../../.. 
> > > kernelversion)
> > > 
> > > probably s@../../..@$(srctree)@
> > > 
> > > $(srctree)/usr/include is not going to be there for out-of-source builds. 
> > >  
> > 
> > Hm, why is it better? It's not only about the kernel version,
> > IMO it's generally better to use includes from the source tree,
> > rather than system-wide installed kernel headers.
> 
> Right I agree the kernel headers are preferred.  I'm not entirely sure
> why we don't use them, if it was OK to assume usr/ is there we wouldn't
> need the tools/include/uapi/ contraption.  Maybe Arnaldo could explain?
> 
> > I get your point about out-of-source builds, but do we support them in general?
> > How can I build bpftool outside of the kernel tree?
> > I've tried a bit, but failed.
> 
> This is what I do:
> 
> make -C tools/bpf/bpftool/ W=1 O=/tmp/builds/bpftool

This works perfectly with my patch:

$ make -C ~/linux/tools/bpf/ W=1 O=/home/guro/build/ --trace
<...>
echo '  CC   '/home/guro/build/main.o;gcc -O2 -W -Wall -Wextra 
-Wno-unused-parameter -Wshadow -D__EXPORTED_HEADERS__ 
-I/home/guro/linux/tools/include/uapi -I/home/guro/linux/tools/include 
-I/home/guro/linux/tools/lib/bpf -I/home/guro/linux/kernel/bpf/ 
-I/home/guro/linux/usr/include -DNEW_DISSASSEMBLER_SIGNATURE   -c -MMD -o 
/home/guro/build/main.o main.c
<...>
echo '  LINK '/home/guro/build/bpftool;gcc -O2 -W -Wall -Wextra 
-Wno-unused-parameter -Wshadow -D__EXPORTED_HEADERS__ 
-I/home/guro/linux/tools/include/uapi -I/home/guro/linux/tools/include 
-I/home/guro/linux/tools/lib/bpf -I/home/guro/linux/kernel/bpf/ 
-I/home/guro/linux/usr/include -DNEW_DISSASSEMBLER_SIGNATURE -o 
/home/guro/build/bpftool /home/guro/build/common.o /home/guro/build/cgroup.o 
/home/guro/build/main.o /home/guro/build/json_writer.o /home/guro/build/prog.o 
/home/guro/build/map.o /home/guro/build/jit_disasm.o /home/guro/build/disasm.o 
/home/guro/build/libbpf.a -lelf -lbfd -lopcodes /home/guro/build/libbpf.a
  LINK /home/guro/build/bpftool
make[1]: Leaving directory '/home/guro/linux/tools/bpf/bpftool'
make: Leaving directory '/home/guro/linux/tools/bpf'

$ ./build/bpftool version
./build/bpftool v4.15.0

Thanks!
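
For context on why the choice of headers changes the reported version: linux/version.h packs the kernel version into a single integer, and a tool built against it prints whatever that header encodes. The sketch below assumes the usual packing, (a << 16) + (b << 8) + c, and assumes bpftool unpacks it in the conventional way; SKETCH_KERNEL_VERSION and sketch_format_version() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Same packing as the kernel's KERNEL_VERSION(a, b, c) macro. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Writes "v<major>.<minor>.<patch>" into buf; returns characters written. */
static int sketch_format_version(char *buf, size_t len, unsigned int code)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "v%u.%u.%u",
			code >> 16, (code >> 8) & 0xff, code & 0xff);
}
```

Built against installed 4.14.6 headers this yields "v4.14.6"; built against the 4.15.0 source tree it yields "v4.15.0", matching the before/after output quoted in the thread.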


[PATCHv3 0/3] Socionext Synquacer NETSEC driver

2017-12-21 Thread jassisinghbrar
From: Jassi Brar 

Hi,

Changes since v2
# Use 'mdio' subnode in DT bindings.
# Use phy_interface_mode_is_rgmii(), instead of open coding the check.
# Use readl/b with eeprom_base pointer.
# Unregister mdio bus upon failure in probe.

Changes since v1
# Switched from using memremap to ioremap
# Implemented ndo_do_ioctl callback
# Defined optional 'dma-coherent' DT property

Jassi Brar (3):
  dt-bindings: net: Add DT bindings for Socionext Netsec
  net: socionext: Add Synquacer NetSec driver
  MAINTAINERS: Add entry for Socionext ethernet driver

 .../devicetree/bindings/net/socionext-netsec.txt   |   55 +
 MAINTAINERS|7 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/socionext/Kconfig |   29 +
 drivers/net/ethernet/socionext/Makefile|1 +
 drivers/net/ethernet/socionext/netsec.c| 1849 
 7 files changed, 1943 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/netsec.c

-- 
2.7.4



[PATCHv3 1/3] dt-bindings: net: Add DT bindings for Socionext Netsec

2017-12-21 Thread jassisinghbrar
From: Jassi Brar 

This patch adds documentation for Device-Tree bindings for the
Socionext NetSec Controller driver.

Signed-off-by: Jassi Brar 
Signed-off-by: Ard Biesheuvel 
---
 .../devicetree/bindings/net/socionext-netsec.txt   | 55 ++
 1 file changed, 55 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt

diff --git a/Documentation/devicetree/bindings/net/socionext-netsec.txt 
b/Documentation/devicetree/bindings/net/socionext-netsec.txt
new file mode 100644
index 000..350540c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext-netsec.txt
@@ -0,0 +1,57 @@
+* Socionext NetSec Ethernet Controller IP
+
+Required properties:
+- compatible: Should be "socionext,synquacer-netsec"
+- reg: Address and length of the control register area, followed by the
+   address and length of the EEPROM holding the MAC address and
+   microengine firmware
+- interrupts: Should contain ethernet controller interrupt
+- clocks: phandle to the PHY reference clock, and any other clocks to be
+  switched by runtime_pm
+- clock-names: Required only if more than a single clock is listed in 'clocks'.
+   The PHY reference clock must be named 'phy_refclk'
+- phy-mode: See ethernet.txt file in the same directory
+- phy-handle: See ethernet.txt in the same directory.
+
+- mdio device tree subnode: When the Netsec has a phy connected to its local
+   mdio, there must be device tree subnode with the following
+   required properties:
+
+   - compatible: Must be "socionext,snq-mdio".
+   - #address-cells: Must be <1>.
+   - #size-cells: Must be <0>.
+
+   For each phy on the mdio bus, there must be a node with the following
+   fields:
+   - compatible: Refer to phy.txt
+   - reg: phy id used to communicate to phy.
+
+Optional properties: (See ethernet.txt file in the same directory)
+- dma-coherent: Boolean property, must only be present if memory
+   accesses performed by the device are cache coherent.
+- local-mac-address: See ethernet.txt in the same directory.
+- mac-address: See ethernet.txt in the same directory.
+- max-speed: See ethernet.txt in the same directory.
+- max-frame-size: See ethernet.txt in the same directory.
+
+Example:
+   eth0: netsec@522d {
+   compatible = "socionext,synquacer-netsec";
+   reg = <0 0x522d 0x0 0x1>, <0 0x1000 0x0 0x1>;
+   interrupts = ;
+   clocks = <&clk_netsec>;
+   phy-mode = "rgmii";
+   max-speed = <1000>;
+   max-frame-size = <9000>;
+   phy-handle = <&phy1>;
+
+   mdio {
+   compatible = "socionext,snq-mdio";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   phy1: ethernet-phy@1 {
+   compatible = "ethernet-phy-ieee802.3-c22";
+   reg = <1>;
+   };
+   };
+   };
-- 
2.7.4



[PATCHv3 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-21 Thread jassisinghbrar
From: Jassi Brar 

This driver adds support for Socionext "netsec" IP Gigabit
Ethernet + PHY IP used in the Synquacer SC2A11 SoC.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Jassi Brar 
---
 drivers/net/ethernet/Kconfig|1 +
 drivers/net/ethernet/Makefile   |1 +
 drivers/net/ethernet/socionext/Kconfig  |   29 +
 drivers/net/ethernet/socionext/Makefile |1 +
 drivers/net/ethernet/socionext/netsec.c | 1849 +++
 5 files changed, 1881 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/netsec.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 39f62733..6cf5ade 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig 
b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 000..4601c2f
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,29 @@
+#
+# Socionext Network device configuration
+#
+
+config NET_VENDOR_SOCIONEXT
+   bool "Socionext devices"
+   default y
+   ---help---
+ If you have a network (Ethernet) card belonging to this class, say Y.
+
+ Note that the answer to this question doesn't directly affect the
+ questions about Socionext cards. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_NETSEC
+   tristate "NETSEC Driver Support"
+   depends on ARCH_SYNQUACER && OF
+   select PHYLIB
+   select MII
+help
+ Enable to add support for the SocioNext NetSec Gigabit Ethernet
+ controller + PHY, as found on the Synquacer SC2A11 SoC
+
+ To compile this driver as a module, choose M here: the module will be
+ called netsec.  If unsure, say N.
+
+endif # NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile 
b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 000..9505923
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_SNI_NETSEC) += netsec.o
diff --git a/drivers/net/ethernet/socionext/netsec.c 
b/drivers/net/ethernet/socionext/netsec.c
new file mode 100644
index 000..9a9b699
--- /dev/null
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -0,0 +1,1849 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define NETSEC_REG_SOFT_RST0x104
+#define NETSEC_REG_COM_INIT0x120
+
+#define NETSEC_REG_TOP_STATUS  0x200
+#define NETSEC_IRQ_RX  BIT(1)
+#define NETSEC_IRQ_TX  BIT(0)
+
+#define NETSEC_REG_TOP_INTEN   0x204
+#define NETSEC_REG_INTEN_SET   0x234
+#define NETSEC_REG_INTEN_CLR   0x238
+
+#define NETSEC_REG_NRM_TX_STATUS   0x400
+#define NETSEC_REG_NRM_TX_INTEN0x404
+#define NETSEC_REG_NRM_TX_INTEN_SET0x428
+#define NETSEC_REG_NRM_TX_INTEN_CLR0x42c
+#define NRM_TX_ST_NTOWNR   BIT(17)
+#define NRM_TX_ST_TR_ERR   BIT(16)
+#define NRM_TX_ST_TXDONE   BIT(15)
+#define NRM_TX_ST_TMREXP   BIT(14)
+
+#define NETSEC_REG_NRM_RX_STATUS   0x440
+#define NETSEC_REG_NRM_RX_INTEN0x444
+#define NETSEC_REG_NRM_RX_INTEN_SET0x468
+#define NETSEC_REG_NRM_RX_INTEN_CLR0x46c
+#define NRM_RX_ST_RC_ERR   BIT(16)
+#define NRM_RX_ST_PKTCNT   BIT(15)
+#define NRM_RX_ST_TMREXP   BIT(14)
+
+#define NETSEC_REG_PKT_CMD_BUF 0xd0
+
+#define NETSEC_REG_CLK_EN  0x100
+
+#define NETSEC_REG_PKT_CTRL0x140
+
+#define NETSEC_REG_DMA_TMR_CTRL0x20c
+#define NETSEC_REG_F_TAIKI_MC_VER  0x22c
+#define NETSEC_REG_F_TAIKI

[PATCHv3 3/3] MAINTAINERS: Add entry for Socionext ethernet driver

2017-12-21 Thread jassisinghbrar
From: Jassi Brar 

Add entry for the Socionext Netsec controller driver and DT bindings.

Acked-by: Ard Biesheuvel 
Signed-off-by: Jassi Brar 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9e0045e..0e1f0d4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12630,6 +12630,13 @@ F: drivers/md/raid*
 F: include/linux/raid/
 F: include/uapi/linux/raid/
 
+SOCIONEXT (SNI) NETSEC NETWORK DRIVER
+M: Jassi Brar 
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/ethernet/socionext/netsec.c
+F: Documentation/devicetree/bindings/net/socionext-netsec.txt
+
 SONIC NETWORK DRIVER
 M: Thomas Bogendoerfer 
 L: netdev@vger.kernel.org
-- 
2.7.4



[patch net] net: sched: fix possible null pointer deref in tcf_block_put

2017-12-21 Thread Jiri Pirko
From: Jiri Pirko 

We need to check block for being null in both tcf_block_put and
tcf_block_put_ext.

Fixes: 343723dd51ef ("net: sched: fix clsact init error path")
Reported-by: Prashant Bhole 
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b91ea03..b9d63d2 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -379,6 +379,8 @@ void tcf_block_put(struct tcf_block *block)
 {
struct tcf_block_ext_info ei = {0, };
 
+   if (!block)
+   return;
tcf_block_put_ext(block, block->q, &ei);
 }
 
-- 
2.9.5
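
A note on why the guard belongs in tcf_block_put() itself: the call evaluates block->q as an argument, so a NULL block is dereferenced before tcf_block_put_ext() gets a chance to check anything. A minimal userspace sketch of that ordering follows. The types and the names block_put() and block_put_ext() are hypothetical stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the qdisc and the tcf block. */
struct queue { int id; };
struct block {
	struct queue *q;
	int refcnt;
};

/* Callee-side guard: protects only against a NULL passed directly. */
static void block_put_ext(struct block *b, struct queue *q)
{
	if (!b)
		return;
	(void)q;
	assert(b->refcnt > 0);
	b->refcnt--;
}

/*
 * Caller-side guard: needed because b->q is read while the arguments
 * are being evaluated, i.e. before block_put_ext() is entered.
 */
static void block_put(struct block *b)
{
	if (!b)
		return;
	block_put_ext(b, b->q);
}
```

With both guards in place, either entry point can be called from an error path that never allocated the block.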



Re: [PATCH v4 5/5] flow_dissector: Parse batman-adv unicast headers

2017-12-21 Thread Jiri Pirko
Thu, Dec 21, 2017 at 10:17:42AM CET, sven.eckelm...@openmesh.com wrote:
>The batman-adv unicast packets contain a full layer 2 frame in encapsulated
>form. The flow dissector must therefore be able to parse the batman-adv
>unicast header to reach the layer 2+3 information.
>
>  ++
>  | ip(v6)hdr  |
>  ++
>  | inner ethhdr   |
>  ++
>  | batadv unicast hdr |
>  ++
>  | outer ethhdr   |
>  ++
>
>The obtained information from the upper layer can then be used by RPS to
>schedule the processing on separate cores. This allows better distribution
>of multiple flows from the same neighbor to different cores.
>
>Signed-off-by: Sven Eckelmann 

Reviewed-by: Jiri Pirko 


Re: [PATCH net-next] net: dwc-xlgmac: Get rid of custom hex_dump_to_buffer()

2017-12-21 Thread Andy Shevchenko
On Thu, 2017-12-21 at 13:32 +0800, Jie Deng wrote:
> Get rid of custom hex_dump_to_buffer().
> 
> The output is slightly changed, i.e. each byte followed by white
> space.
> 
> Note, we don't use print_hex_dump() here since the original code uses
> netdev_dbg().
> 

Jie, thanks for taking care of the update.

David, please, consider this one to be applied.

> Signed-off-by: Andy Shevchenko 
> Signed-off-by: Jie Deng 
> ---
>  drivers/net/ethernet/synopsys/dwc-xlgmac-common.c | 24 +++---
> -
>  1 file changed, 7 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac-common.c
> b/drivers/net/ethernet/synopsys/dwc-xlgmac-common.c
> index d655a42..eb1c6b0 100644
> --- a/drivers/net/ethernet/synopsys/dwc-xlgmac-common.c
> +++ b/drivers/net/ethernet/synopsys/dwc-xlgmac-common.c
> @@ -333,9 +333,8 @@ void xlgmac_print_pkt(struct net_device *netdev,
> struct sk_buff *skb, bool tx_rx)
>  {
>   struct ethhdr *eth = (struct ethhdr *)skb->data;
> - unsigned char *buf = skb->data;
>   unsigned char buffer[128];
> - unsigned int i, j;
> + unsigned int i;
>  
>   netdev_dbg(netdev, "\n** SKB dump
> \n");
>  
> @@ -346,22 +345,13 @@ void xlgmac_print_pkt(struct net_device *netdev,
>   netdev_dbg(netdev, "Src MAC addr: %pM\n", eth->h_source);
>   netdev_dbg(netdev, "Protocol: %#06hx\n", ntohs(eth-
> >h_proto));
>  
> - for (i = 0, j = 0; i < skb->len;) {
> - j += snprintf(buffer + j, sizeof(buffer) - j,
> "%02hhx",
> -   buf[i++]);
> -
> - if ((i % 32) == 0) {
> - netdev_dbg(netdev, "  %#06x: %s\n", i - 32,
> buffer);
> - j = 0;
> - } else if ((i % 16) == 0) {
> - buffer[j++] = ' ';
> - buffer[j++] = ' ';
> - } else if ((i % 4) == 0) {
> - buffer[j++] = ' ';
> - }
> + for (i = 0; i < skb->len; i += 32) {
> + unsigned int len = min(skb->len - i, 32U);
> +
> + hex_dump_to_buffer(&skb->data[i], len, 32, 1,
> +buffer, sizeof(buffer), false);
> + netdev_dbg(netdev, "  %#06x: %s\n", i, buffer);
>   }
> - if (i % 32)
> - netdev_dbg(netdev, "  %#06x: %s\n", i - (i % 32),
> buffer);
>  
>   netdev_dbg(netdev, "\n** SKB dump
> \n");
>  }

-- 
Andy Shevchenko 
Intel Finland Oy
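
For anyone wanting to see the resulting output shape outside the kernel, here is a rough userspace sketch of the new loop: 32 bytes per row, single-byte groups separated by one space. hex_row() is a hypothetical stand-in for hex_dump_to_buffer() with groupsize 1 and no ASCII column, and dump_packet() stands in for the netdev_dbg() loop; neither is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format len bytes as "xx xx xx" into out and return the number of
 * characters written, excluding the terminating NUL. If the buffer is
 * too small, output stops after the last byte that fits completely.
 */
static size_t hex_row(const unsigned char *data, size_t len,
		      char *out, size_t outsz)
{
	size_t pos = 0;

	assert(out && outsz > 0);
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		int n = snprintf(out + pos, outsz - pos, "%s%02x",
				 i ? " " : "", data[i]);

		if (n < 0 || (size_t)n >= outsz - pos) {
			out[pos] = '\0'; /* drop the partial byte */
			break;
		}
		pos += (size_t)n;
	}
	return pos;
}

/* Print the packet in rows of up to 32 bytes; return the row count. */
static unsigned int dump_packet(const unsigned char *data, unsigned int len,
				FILE *fp)
{
	char row[128];
	unsigned int i, rows = 0;

	for (i = 0; i < len; i += 32) {
		unsigned int chunk = len - i < 32 ? len - i : 32;

		hex_row(&data[i], chunk, row, sizeof(row));
		fprintf(fp, "  %#06x: %s\n", i, row);
		rows++;
	}
	return rows;
}
```

A full row needs 32 * 3 - 1 = 95 characters plus the NUL, so the 128-byte buffer kept by the patch is large enough.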


Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28

2017-12-21 Thread Quentin Monnet
2017-12-20 18:32 UTC+ ~ Roman Gushchin 
> On Tue, Dec 19, 2017 at 04:22:51PM +, Quentin Monnet wrote:
>> 2017-12-19 16:10 UTC+ ~ Roman Gushchin 
>>> On Tue, Dec 19, 2017 at 03:57:02PM +, Quentin Monnet wrote:
>>>> Hi Roman, thanks for working on this!
>>>>
>>>>
>>>> I discussed this issue with Jakub recently, and one suggestion he had
>>>> was to look in tools/build/feature to add a new "feature", by trying to
>>>> compile short programs, for making the distinction between binutils
>>>> versions. It probably requires more work, but could be more robust than
>>>> parsing the version from the command line?
>>>
>>> Hm, might be an option. Parsing readelf output is pretty ugly, here I agree.
>>> In general it feels more like a binutils issue, so we have to workaround it
>>> in either way.
>>>
>>> Is Jakub or someone else working on it?
>>>
>>> Thanks!
>>>
>>
>> Jakub isn't. On our side, I noticed last week that there was this change
>> in binutils, and started to have a look at how these "features" work.
>> But I have nothing that works so far, so feel free to tackle this.
>>
>> Quentin
> 
> Hi Quentin!
> 
> Can you, please, check that the patch below works in your environment.
> 
> Thanks!


Hi Roman,

It failed for me, but I could make it work with just a small fix in
tools/build/feature/Makefile, see below. I also add some generic
comments inline.

> 
> --
> 
> 
> From b08deabf42e4c143b9e0eec8c49714e4d2c928e3 Mon Sep 17 00:00:00 2001
> From: Roman Gushchin 
> Date: Wed, 20 Dec 2017 13:27:32 +
> Subject: [RFC PATCH net-next] tools/bpftool: fix bpftool build with binutils
>  >= 2.28
> 
> Bpftool build is broken with binutils version 2.28 and later.
> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> in the binutils repo, which changed the disassembler() function
> signature.
> 
> Fix this by adding a new "feature" to the tools/build/features
> infrastructure and make it responsible for decision which
> disassembler() function signature to use.
> 
> Signed-off-by: Roman Gushchin 
> Cc: Jakub Kicinski 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> ---
>  tools/bpf/Makefile  | 18 ++
>  tools/bpf/bpf_jit_disasm.c  |  7 +++
>  tools/bpf/bpftool/Makefile  | 13 +
>  tools/bpf/bpftool/jit_disasm.c  |  7 +++
>  tools/build/feature/Makefile|  4 
>  tools/build/feature/test-disassembler.c | 15 +++
>  6 files changed, 64 insertions(+)
>  create mode 100644 tools/build/feature/test-disassembler.c
> 
> diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> index 07a6697466ef..c62b3a311486 100644
> --- a/tools/bpf/Makefile
> +++ b/tools/bpf/Makefile
> @@ -9,6 +9,24 @@ MAKE = make
>  CFLAGS += -Wall -O2
>  CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
>  
> +ifeq ($(srctree),)
> +srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> +srctree := $(patsubst %/,%,$(dir $(srctree)))
> +endif
> +

libbpf has a FEATURE_USER = .libbpf here, that seems to be used as a
suffix for file FEATURE-DUMP. Not sure what it is used for, but could be
a good practice, so we should check this maybe.

> +FEATURE_TESTS = disassembler
> +FEATURE_DISPLAY = disassembler

While at it, could you add the "bfd" feature verification before
"disassembler"? The former already exists in tools/build/feature.
Logically feature "disassembler" would fail to be detected if bfd is
missing, but that would seem cleaner to me.

> +
> +ifeq ($(FEATURES_DUMP),)
> +include $(srctree)/tools/build/Makefile.feature
> +else
> +include $(FEATURES_DUMP)
> +endif
> +
> +ifeq ($(feature-disassembler), 1)
> +CFLAGS += -DNEW_DISSASSEMBLER_SIGNATURE
> +endif

Nice, but this means that we check the feature for all targets. This is
not necessary, in particular in bpftool Makefile below, this means that
we check the feature even on `make doc` for example.

libbpf has something in its Makefile to prevent that, you may want to
have a look at its check_feat variable.

Also checking the feature compiles some files in the tree that are not
removed with existing "clean" target. Again, I use libbpf as a reference
(see the config-clean target, and the removal of FEATURE-DUMP).

> +
>  %.yacc.c: %.y
>   $(YACC) -o $@ -d $<
>  
> diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> index 75bf526a0168..a5f4dbacdb11 100644
> --- a/tools/bpf/bpf_jit_disasm.c
> +++ b/tools/bpf/bpf_jit_disasm.c
> @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int 
> opcodes)
>  
>   disassemble_init_for_target(&info);
>  
> +#ifdef NEW_DISSASSEMBLER_SIGNATURE

Could we find another macro name? It makes perfect sense today, but
maybe not so much in the future, in particular if the prototype were to
change again. Something like BINUTILS_VERSION_GEQ_2_29, or
DISASM_FOUR_ARGS_SIGNATURE?

Also for the name of the feature ("disassembler" in your patch) we might
want to find something more explicit, we're

Re: [PATCHv3 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-21 Thread Ard Biesheuvel
Hi Jassi,

On 21 December 2017 at 12:11,   wrote:
> From: Jassi Brar 
>
> This driver adds support for Socionext "netsec" IP Gigabit
> Ethernet + PHY IP used in the Synquacer SC2A11 SoC.
>
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Jassi Brar 
> ---
>  drivers/net/ethernet/Kconfig|1 +
>  drivers/net/ethernet/Makefile   |1 +
>  drivers/net/ethernet/socionext/Kconfig  |   29 +
>  drivers/net/ethernet/socionext/Makefile |1 +
>  drivers/net/ethernet/socionext/netsec.c | 1849 
> +++
>  5 files changed, 1881 insertions(+)
>  create mode 100644 drivers/net/ethernet/socionext/Kconfig
>  create mode 100644 drivers/net/ethernet/socionext/Makefile
>  create mode 100644 drivers/net/ethernet/socionext/netsec.c
>
> diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
> index c604213..d50519e 100644
> --- a/drivers/net/ethernet/Kconfig
> +++ b/drivers/net/ethernet/Kconfig
> @@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
>  source "drivers/net/ethernet/sfc/Kconfig"
>  source "drivers/net/ethernet/sgi/Kconfig"
>  source "drivers/net/ethernet/smsc/Kconfig"
> +source "drivers/net/ethernet/socionext/Kconfig"
>  source "drivers/net/ethernet/stmicro/Kconfig"
>  source "drivers/net/ethernet/sun/Kconfig"
>  source "drivers/net/ethernet/tehuti/Kconfig"
> diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
> index 39f62733..6cf5ade 100644
> --- a/drivers/net/ethernet/Makefile
> +++ b/drivers/net/ethernet/Makefile
> @@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
>  obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
>  obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
>  obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
> +obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
>  obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
>  obj-$(CONFIG_NET_VENDOR_SUN) += sun/
>  obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
> diff --git a/drivers/net/ethernet/socionext/Kconfig 
> b/drivers/net/ethernet/socionext/Kconfig
> new file mode 100644
> index 000..4601c2f
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/Kconfig
> @@ -0,0 +1,29 @@
> +#
> +# Socionext Network device configuration
> +#
> +
> +config NET_VENDOR_SOCIONEXT
> +   bool "Socionext devices"
> +   default y
> +   ---help---
> + If you have a network (Ethernet) card belonging to this class, say 
> Y.
> +
> + Note that the answer to this question doesn't directly affect the
> + the questions about Socionext cards. If you say Y, you will be asked
> + for your specific card in the following questions.
> +
> +if NET_VENDOR_SOCIONEXT
> +
> +config SNI_NETSEC
> +   tristate "NETSEC Driver Support"
> +   depends on ARCH_SYNQUACER && OF
> +   select PHYLIB
> +   select MII
> +help
> + Enable to add support for the SocioNext NetSec Gigabit Ethernet
> + controller + PHY, as found on the Synquacer SC2A11 SoC
> +
> + To compile this driver as a module, choose M here: the module will 
> be
> + called netsec.  If unsure, say N.
> +
> +endif # NET_VENDOR_SOCIONEXT
> diff --git a/drivers/net/ethernet/socionext/Makefile 
> b/drivers/net/ethernet/socionext/Makefile
> new file mode 100644
> index 000..9505923
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_SNI_NETSEC) += netsec.o
> diff --git a/drivers/net/ethernet/socionext/netsec.c 
> b/drivers/net/ethernet/socionext/netsec.c
> new file mode 100644
> index 000..9a9b699
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -0,0 +1,1849 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#define NETSEC_REG_SOFT_RST0x104
> +#define NETSEC_REG_COM_INIT0x120
> +
> +#define NETSEC_REG_TOP_STATUS  0x200
> +#define NETSEC_IRQ_RX  BIT(1)
> +#define NETSEC_IRQ_TX  BIT(0)
> +
> +#define NETSEC_REG_TOP_INTEN   0x204
> +#define NETSEC_REG_INTEN_SET   0x234
> +#define NETSEC_REG_INTEN_CLR   0x238
> +
> +#define NETSEC_REG_NRM_TX_STATUS   0x400
> +#define NETSEC_REG_NRM_TX_INTEN0x404
> +#define NETSEC_REG_NRM_TX_INTEN_SET0x428
> +#define NETSEC_REG_NRM_TX_INTEN_CLR0x42c
> +#define NRM_TX_ST_NTOWNR   BIT(17)
> +#define NRM_TX_ST_TR_ERR   BIT(16)
> +#define NRM_TX_ST_TXDONE   BIT(15)
> +#define NRM_TX_ST_TMREXP   BIT(14)
> +
> +#define NETSEC_REG_NRM_RX_STATUS   0x440
> +#define NETSEC_REG_NRM_RX_INTEN0x444
> +#define NETSEC_REG_NRM_RX_INTEN_SET0x468
> +#define NETSEC_REG_NRM_RX_INTEN_CLR0x46c
> +#define NRM_RX_ST_RC_ERR   BIT(16)
> +#define NRM_RX_ST_PKTCNT   BIT(15)
> +#define NRM_RX_ST_TMREXP   

Re: [PATCH net-next v4 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-21 Thread Masami Hiramatsu
On Wed, 20 Dec 2017 23:08:38 -0500 (EST)
David Miller  wrote:

> From: Masami Hiramatsu 
> Date: Thu, 21 Dec 2017 11:36:57 +0900
> 
> > Could you share your .config file?
> 
> You never need to ask me this question.
> 
> All of my test builds are with "allmodconfig".

OK, thanks!

-- 
Masami Hiramatsu 


Re: [PATCH net-next v7 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-12-21 Thread Kunihiko Hayashi
Hello Andrew,

On Thu, 21 Dec 2017 12:32:54 +0100 Andrew Lunn  wrote:

> > +Optional properties:
> > + - resets: A phandle to the reset control for the MAC
> > + - local-mac-address: See ethernet.txt in the same directory.
> > +
> > +Required subnode:
> > + - mdio: Device tree subnode with the following required properties:
> > +
> > +Example:
> 
> It sounds like there should be some properties before the Example.

Indeed, this is my carelessness.

> 
>Andrew
> 
> > +
> > +   ether: ethernet@6500 {
> > +   compatible = "socionext,uniphier-ld20-ave4";
> > +   reg = <0x6500 0x8500>;
> > +   interrupts = <0 66 4>;
> > +   phy-mode = "rgmii";
> > +   phy-handle = <&ethphy>;
> > +   clocks = <&sys_clk 6>;
> > +   resets = <&sys_rst 6>;
> > +   local-mac-address = [00 00 00 00 00 00];
> 
> Typically you would put a blank line here, before the mdio node.

Okay, I'll put it.

> > +   mdio {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   ethphy: ethphy@1 {
> > +   reg = <1>;
> > +   };
> > +   };
> > +   };
> 
>   Andrew

Thank you,

---
Best Regards,
Kunihiko Hayashi




[PATCH 3/3] net: Remove spinlock from get_net_ns_by_id()

2017-12-21 Thread Kirill Tkhai
idr_find() is safe under rcu_read_lock() and
maybe_get_net() guarantees that net is alive.

Signed-off-by: Kirill Tkhai 
---
 net/core/net_namespace.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 6a4eab438221..a675f35a18ff 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -279,11 +279,9 @@ struct net *get_net_ns_by_id(struct net *net, int id)
return NULL;
 
rcu_read_lock();
-   spin_lock_bh(&net->nsid_lock);
peer = idr_find(&net->netns_ids, id);
if (peer)
peer = maybe_get_net(peer);
-   spin_unlock_bh(&net->nsid_lock);
rcu_read_unlock();
 
return peer;
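
Some background on why the lookup is safe without the spinlock: maybe_get_net() only hands back the net if its reference count can be raised from a non-zero value, so a peer found via idr_find() under RCU is either pinned or rejected. A minimal userspace sketch of that increment-unless-zero step with C11 atomics follows. struct obj, obj_get_unless_zero() and obj_put() are hypothetical names; the kernel uses its own atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	atomic_int count; /* 0 means the last reference is already gone */
};

/* Take a reference only if at least one reference is still held. */
static struct obj *obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old != 0) {
		/* On failure the CAS reloads old, so zero is re-checked. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return o;
	}
	return NULL;
}

/* Drop a reference; returns true when this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->count, 1);

	assert(old > 0);
	return old == 1;
}
```

Once the count has reached zero it can never be revived through this path, which is the property the lockless lookup relies on.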



[PATCH 2/3] net: Add BUG_ON() to get_net()

2017-12-21 Thread Kirill Tkhai
Since people may mistakenly obtain a net that is being destroyed
from net_namespace_list or from net::netns_ids
without checking its net::count, let's protect
against such situations and insert BUG_ON() so that
execution does not continue past this point.

A panic is better than memory corruption and undefined
behavior.

Signed-off-by: Kirill Tkhai 
---
 include/net/net_namespace.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 10f99dafd5ac..ff0e47471d5b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -195,7 +195,7 @@ void __put_net(struct net *net);
 
 static inline struct net *get_net(struct net *net)
 {
-   atomic_inc(&net->count);
+   BUG_ON(atomic_inc_return(&net->count) <= 1);
return net;
 }
 



[PATCH 1/3] net: Fix possible race in peernet2id_alloc()

2017-12-21 Thread Kirill Tkhai
peernet2id_alloc() is racy without rtnl_lock(), as atomic_read(&peer->count)
under net->nsid_lock does not guarantee that peer is alive:

rcu_read_lock()
peernet2id_alloc()                      ..
  spin_lock_bh(&net->nsid_lock)         ..
  atomic_read(&peer->count) == 1        ..
  ..                                    put_net()
  ..                                      cleanup_net()
  ..                                        for_each_net(tmp)
  ..                                          spin_lock_bh(&tmp->nsid_lock)
  ..                                          __peernet2id(tmp, net) == -1
  ..                                          ..
  ..                                          ..
  __peernet2id_alloc(alloc == true)           ..
  ..                                          ..
rcu_read_unlock()                       ..
..                                      synchronize_rcu()
..                                      kmem_cache_free(net)

After the above sequence, net::netns_ids contains an id pointing to freed memory,
and any later dereference via that id will operate on this freed memory.

Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
is a generic interface, and it is better to fix it before someone actually
starts using it in the wrong context.

Signed-off-by: Kirill Tkhai 
---
 net/core/net_namespace.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 60a71be75aea..6a4eab438221 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, 
int id);
  */
 int peernet2id_alloc(struct net *net, struct net *peer)
 {
-   bool alloc;
+   bool alloc = false, alive = false;
int id;
 
-   if (atomic_read(&net->count) == 0)
-   return NETNSA_NSID_NOT_ASSIGNED;
spin_lock_bh(&net->nsid_lock);
-   alloc = atomic_read(&peer->count) == 0 ? false : true;
+   /* Spinlock guarantees we never hash a peer to net->netns_ids
+* after idr_destroy(&net->netns_ids) occurs in cleanup_net().
+*/
+   if (atomic_read(&net->count) == 0) {
+   id = NETNSA_NSID_NOT_ASSIGNED;
+   goto unlock;
+   }
+   /*
+* When peer is obtained from RCU lists, we may race with
+* its cleanup. Check whether it's alive, and this guarantees
+* we never hash a peer back to net->netns_ids, after it has
+* just been idr_remove()'d from there in cleanup_net().
+*/
+   if (maybe_get_net(peer))
+   alive = alloc = true;
id = __peernet2id_alloc(net, peer, &alloc);
+unlock:
spin_unlock_bh(&net->nsid_lock);
if (alloc && id >= 0)
rtnl_net_notifyid(net, RTM_NEWNSID, id);
+   if (alive)
+   put_net(peer);
return id;
 }
 EXPORT_SYMBOL_GPL(peernet2id_alloc);
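
To make the new control flow easier to follow, here is a single-file userspace model of it: check the owning net under the lock, pin the peer only if it is still alive, allocate an id only for a pinned peer, and drop the temporary reference after unlocking. Everything here is an assumption made for illustration (struct ns, the fixed-size ids[] array standing in for the IDR, the atomic_flag standing in for nsid_lock), and the RTM_NEWNSID notification is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ID_NOT_ASSIGNED (-1)
#define MAX_IDS 8

struct ns {
	atomic_int count;        /* 0 means teardown has started */
	atomic_flag lock;        /* stand-in for nsid_lock */
	struct ns *ids[MAX_IDS]; /* stand-in for the netns_ids IDR */
};

static void ns_init(struct ns *n, int count)
{
	atomic_init(&n->count, count);
	atomic_flag_clear(&n->lock);
	for (int i = 0; i < MAX_IDS; i++)
		n->ids[i] = NULL;
}

/* Take a reference only if the count is still non-zero. */
static bool ns_tryget(struct ns *n)
{
	int old = atomic_load(&n->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&n->count, &old, old + 1))
			return true;
	}
	return false;
}

static void ns_put(struct ns *n)
{
	assert(atomic_load(&n->count) > 0);
	atomic_fetch_sub(&n->count, 1);
}

/* Return the id of peer in net; take a free slot only when alloc is set. */
static int ns_lookup_or_alloc(struct ns *net, struct ns *peer, bool alloc)
{
	for (int i = 0; i < MAX_IDS; i++)
		if (net->ids[i] == peer)
			return i;
	if (!alloc)
		return ID_NOT_ASSIGNED;
	for (int i = 0; i < MAX_IDS; i++) {
		if (!net->ids[i]) {
			net->ids[i] = peer;
			return i;
		}
	}
	return ID_NOT_ASSIGNED;
}

static int peer_id_alloc(struct ns *net, struct ns *peer)
{
	bool alive = false;
	int id = ID_NOT_ASSIGNED;

	while (atomic_flag_test_and_set(&net->lock))
		; /* spin until the lock is free */
	if (atomic_load(&net->count) != 0) {
		/* A dying peer is looked up but never newly hashed. */
		alive = ns_tryget(peer);
		id = ns_lookup_or_alloc(net, peer, alive);
	}
	atomic_flag_clear(&net->lock);
	if (alive)
		ns_put(peer); /* the reference was only needed while hashing */
	return id;
}
```

In this model a peer whose count has already dropped to zero can still be looked up, but it is never added to the table, which mirrors the intent stated in the comment added by the patch.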



Re: [PATCH net-next v4 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-21 Thread Masami Hiramatsu
On Wed, 20 Dec 2017 22:12:38 -0500
Steven Rostedt  wrote:

> On Thu, 21 Dec 2017 11:36:57 +0900
> Masami Hiramatsu  wrote:
> 
> > On Wed, 20 Dec 2017 14:24:24 -0500 (EST)
> > David Miller  wrote:
> > 
> > > From: David Miller 
> > > Date: Wed, 20 Dec 2017 14:20:40 -0500 (EST)
> > >   
> > > > From: Masami Hiramatsu 
> > > > Date: Wed, 20 Dec 2017 13:14:11 +0900
> > > >   
> > > >> This series is v4 of the replacement of jprobe usage with trace
> > > >> events. This version is rebased on net-next, fixes a build warning
> > > >> and moves a temporal variable definition in a block.
> > > >> 
> > > >> Previous version is here;
> > > >> https://lkml.org/lkml/2017/12/19/153
> > > >> 
> > > >> Changes from v3:
> > > >>   All: Rebased on net-next
> > > >>   [3/6]: fixes a build warning for i386 by casting pointer unsigned
> > > >> long instead of __u64, and moves a temporal variable
> > > >>  definition in a block.  
> > > > 
> > > > Looks good, series applied to net-next, thanks.  
> > > 
> > > Actually, this doesn't even compile, so I've reverted:
> > > 
> > > [davem@dhcp-10-15-49-227 net-next]$ make -s -j16
> > > In file included from net/dccp/trace.h:105:0,
> > >  from net/dccp/proto.c:42:
> > > ./include/trace/define_trace.h:89:42: fatal error: ./trace.h: No such 
> > > file or directory
> > >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> > >   ^
> > > compilation terminated.  
> > 
> > Hi David,
> > 
> > Could you share your .config file? I would like to reproduce it.
> > When I tried with attached kconfig, I could not reproduce this issue,
> > and could run it.
> >
> 
> Hi Masami,
> 
> Are you sure you committed everything in this change set? I don't see a
> modification of the Makefile to get the trace.h file you created.
> Shouldn't there be something like:
> 
> CFLAGS_proto.o := -I$(src)
> 
> in the Makefile?

Oops, I didn't do that, but I had committed everything in this changeset.
Hmm, strange, because I don't see the same issue... First I will try to
reproduce it, and then try the above flags again.

Thank you,

-- 
Masami Hiramatsu 


[net 1/1] tipc: fix memory leak of group member when peer node is lost

2017-12-21 Thread Jon Maloy
When a group member receives a member WITHDRAW event, this might have
two reasons: either the peer member is leaving the group, or the link
to the member's node has been lost.

In the latter case we need to issue a DOWN event to the user right away,
and let function tipc_group_filter_msg() perform the deletion of the member
item. However, in this case we fail to change the state of the member
item to MBR_LEAVING, so the member item is not deleted, and we have a
memory leak.

We now distinguish better between the four sub-cases of a WITHDRAW event
and make sure that each case is handled correctly.

Signed-off-by: Jon Maloy 
---
 net/tipc/group.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/net/tipc/group.c b/net/tipc/group.c
index e5b03f0..8e12ab5 100644
--- a/net/tipc/group.c
+++ b/net/tipc/group.c
@@ -850,17 +850,26 @@ void tipc_group_member_evt(struct tipc_group *grp,
*usr_wakeup = true;
m->usr_pending = false;
node_up = tipc_node_is_up(net, node);
-
-   /* Hold back event if more messages might be expected */
-   if (m->state != MBR_LEAVING && node_up) {
-   m->event_msg = skb;
-   tipc_group_decr_active(grp, m);
-   m->state = MBR_LEAVING;
-   } else {
-   if (node_up)
+   m->event_msg = NULL;
+
+   if (node_up) {
+   /* Hold back event if a LEAVE msg should be expected */
+   if (m->state != MBR_LEAVING) {
+   m->event_msg = skb;
+   tipc_group_decr_active(grp, m);
+   m->state = MBR_LEAVING;
+   } else {
msg_set_grp_bc_seqno(hdr, m->bc_syncpt);
-   else
+   __skb_queue_tail(inputq, skb);
+   }
+   } else {
+   if (m->state != MBR_LEAVING) {
+   tipc_group_decr_active(grp, m);
+   m->state = MBR_LEAVING;
msg_set_grp_bc_seqno(hdr, m->bc_rcv_nxt);
+   } else {
+   msg_set_grp_bc_seqno(hdr, m->bc_syncpt);
+   }
__skb_queue_tail(inputq, skb);
}
list_del_init(&m->list);
-- 
2.1.4
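
The four sub-cases mentioned in the commit message can be read off the new branch structure. Below is a sketch of them as a pure decision function, assuming only node_up and the MBR_LEAVING state matter at this point. struct withdraw_action, classify_withdraw(), the seqno_src enum and the reduced state enum are illustrative names, not part of net/tipc.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced state set: only "already leaving or not" matters here. */
enum mbr_state { MBR_OTHER, MBR_LEAVING };

/* Which counter the event's broadcast sequence number is taken from. */
enum seqno_src { SEQNO_NONE, SEQNO_SYNCPT, SEQNO_RCV_NXT };

struct withdraw_action {
	bool hold_event;   /* keep the skb in m->event_msg for later */
	bool queue_event;  /* deliver the event to inputq right away */
	bool mark_leaving; /* decrement active count, set MBR_LEAVING */
	enum seqno_src seqno;
};

static struct withdraw_action classify_withdraw(bool node_up,
						enum mbr_state state)
{
	struct withdraw_action a = { false, false, false, SEQNO_NONE };
	bool leaving = (state == MBR_LEAVING);

	if (node_up && !leaving) {
		/* Peer is reachable: a LEAVE message should still arrive. */
		a.hold_event = true;
		a.mark_leaving = true;
	} else if (node_up) {
		a.seqno = SEQNO_SYNCPT;
		a.queue_event = true;
	} else if (!leaving) {
		/* Node lost: this is the case that previously leaked. */
		a.mark_leaving = true;
		a.seqno = SEQNO_RCV_NXT;
		a.queue_event = true;
	} else {
		a.seqno = SEQNO_SYNCPT;
		a.queue_event = true;
	}
	return a;
}
```

The third branch is the one the fix adds state handling for: the node is down and the member was not yet marked as leaving, so it must be marked here for tipc_group_filter_msg() to delete it later.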



Re: [PATCHv3 1/3] dt-bindings: net: Add DT bindings for Socionext Netsec

2017-12-21 Thread Andrew Lunn
> +- mdio device tree subnode: When the Netsec has a phy connected to its local
> + mdio, there must be device tree subnode with the following
> + required properties:
> +
> + - compatible: Must be "socionext,snq-mdio".

Is there a need for a compatible string? Are there different versions
of the MDIO bus hardware? If it was an independent MDIO bus driver,
then yes, you need a compatible string. But since it is embedded in
the MAC driver, there should not be a need.

Andrew


Re: [PATCHv3 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-21 Thread Andrew Lunn
On Thu, Dec 21, 2017 at 12:43:40PM +, Ard Biesheuvel wrote:
> Hi Jassi,

Hi Ard

Please trim emails when you reply. It was not easy to find your
comment, i'm assuming there was only one, and that i did not miss others.

 Andrew


Re: [PATCHv3 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-21 Thread Andrew Lunn
> +static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
> +{
> + struct mii_bus *bus;
> + int ret;
> +
> + bus = devm_mdiobus_alloc(priv->dev);
> + if (!bus)
> + return -ENOMEM;
> +
> + snprintf(bus->id, MII_BUS_ID_SIZE, "%s", dev_name(priv->dev));
> + bus->priv = priv;
> + bus->name = "SNI NETSEC MDIO";
> + bus->read = netsec_phy_read;
> + bus->write = netsec_phy_write;
> + bus->parent = priv->dev;
> + priv->mii_bus = bus;
> +
> + if (dev_of_node(priv->dev)) {
> + struct device_node *parent = dev_of_node(priv->dev);
> + struct device_node *child_node, *mdio_node = NULL;
> +
> + for_each_child_of_node(parent, child_node) {
> + if (of_device_is_compatible(child_node,
> + "socionext,snq-mdio")) {

Just use of_get_child_by_name(parent, "mdio");

 Andrew


Re: [PATCH net-next v4 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-21 Thread Masami Hiramatsu
On Wed, 20 Dec 2017 22:12:38 -0500
Steven Rostedt  wrote:

> On Thu, 21 Dec 2017 11:36:57 +0900
> Masami Hiramatsu  wrote:
> 
> > On Wed, 20 Dec 2017 14:24:24 -0500 (EST)
> > David Miller  wrote:
> > 
> > > From: David Miller 
> > > Date: Wed, 20 Dec 2017 14:20:40 -0500 (EST)
> > >   
> > > > From: Masami Hiramatsu 
> > > > Date: Wed, 20 Dec 2017 13:14:11 +0900
> > > >   
> > > >> This series is v4 of the replacement of jprobe usage with trace
> > > >> events. This version is rebased on net-next, fixes a build warning
> > > >> and moves a temporal variable definition in a block.
> > > >> 
> > > >> Previous version is here;
> > > >> https://lkml.org/lkml/2017/12/19/153
> > > >> 
> > > >> Changes from v3:
> > > >>   All: Rebased on net-next
> > > >>   [3/6]: fixes a build warning for i386 by casting pointer unsigned
> > > >> long instead of __u64, and moves a temporal variable
> > > >>  definition in a block.  
> > > > 
> > > > Looks good, series applied to net-next, thanks.  
> > > 
> > > Actually, this doesn't even compile, so I've reverted:
> > > 
> > > [davem@dhcp-10-15-49-227 net-next]$ make -s -j16
> > > In file included from net/dccp/trace.h:105:0,
> > >  from net/dccp/proto.c:42:
> > > ./include/trace/define_trace.h:89:42: fatal error: ./trace.h: No such 
> > > file or directory
> > >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> > >   ^
> > > compilation terminated.  
> > 
> > Hi David,
> > 
> > Could you share your .config file? I would like to reproduce it.
> > When I tried with attached kconfig, I could not reproduce this issue,
> > and could run it.
> >
> 
> Hi Masami,
> 
> Are you sure you committed everything in this change set? I don't see a
> modification of the Makefile to get the trace.h file you created.
> Shouldn't there be something like:
> 
> CFLAGS_proto.o := -I$(src)
> 
> in the Makefile?

You're correct. I could reproduce it with kallmodconfig.
And when I add below diff, it is resolved.
If I built with O= option, it didn't happen. 

diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 9d0383d2f277..8d4a8e901ae0 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -25,3 +25,5 @@ obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
 dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
+
+CFLAGS_proto.o := -I$(src)


I'll update [5/6].

Thank you,

-- 
Masami Hiramatsu 


Re: [PATCH net] vxlan: update skb dst pmtu on tx path

2017-12-21 Thread Xin Long
On Wed, Dec 20, 2017 at 2:38 AM, David Miller  wrote:
> From: Xin Long 
> Date: Wed, 20 Dec 2017 01:05:32 +0800
>
>> On Wed, Dec 20, 2017 at 12:12 AM, David Miller  wrote:
>>> You're going to have to find a way to fix this without
>>> invoking ->update_pmtu() on every single transmit.  That's
>>> really excessive, especially for an operation which is
>>> going to be a NOP 99% of the time.
>> understand, I couldn't find a better way,  and all iptunnels are
>> doing it in this way.
>>
>> Or is it possible to go with an unlikely here ?
>>
>> if (unlikely(skb_dst(skb) && mtu < dst_mtu(skb_dst(skb))))
>> skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
>>skb, mtu);
>>
>>
>  ...
>> how about doing it in vxlan_get_route():
>> @@ -1896,6 +1896,13 @@ static struct rtable *vxlan_get_route(struct
>> vxlan_dev *vxlan, struct net_device
>> *saddr = fl4.saddr;
>> if (use_cache)
>> dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
>> +
>> +   if (skb_dst(skb)) {
>> +   int mtu = dst_mtu(ndst) - VXLAN_HEADROOM;
>> +
>> +   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
>> +  skb, mtu);
>> +   }
>>
>>
>> This would do it only when no dst_cache and it has to do real route lookup.
>>
>> Note that even when update_pmtu is hit, mostly it will do nothing and
>> just return
>> as usually new mtu >= skb_dst(skb)'s pmtu.
>
> Ok, yeah, this is really difficult.
>
> I'll apply your patch for now, but generally speaking we have to handle this
> issue better.
Thanks Dave.

I'm still thinking about how to support ICMP packet processing for all
UDP tunnels. It's also difficult, as there is no UDP sock for TX.

Would something like the following patch be possible? I know it's pretty
bad to still update the PMTU even if no sock is found. Do you have any
suggestions?

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e4ff25c..b3d2a50 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -608,6 +608,36 @@ static inline bool __udp_is_mcast_sock(struct net
*net, struct sock *sk,
return true;
 }

+static int udp4_encap_err_handler(struct sk_buff *skb, u32 info)
+{
+   struct iphdr *iph = (struct iphdr *)skb->data;
+   struct net *net = dev_net(skb->dev);
+   int type = icmp_hdr(skb)->type;
+   int code = icmp_hdr(skb)->code;
+
+   switch (type) {
+   case ICMP_DEST_UNREACH:
+   if (code == ICMP_FRAG_NEEDED) {
+   ipv4_update_pmtu(skb, net, info, 0, 0,
+iph->protocol, 0);
+   break;
+   } else if (code == ICMP_SR_FAILED) {
+   break;
+   }
+   return -ENOENT;
+   case ICMP_TIME_EXCEEDED:
+   if (code != ICMP_EXC_TTL)
+   break;
+   return -ENOENT;
+   case ICMP_REDIRECT:
+   ipv4_redirect(skb, net, 0, 0, iph->protocol, 0);
+   default:
+   break;
+   }
+
 /*
  * This routine is called by the ICMP module when it gets some
  * sort of error condition.  If err < 0 then the socket should
@@ -635,7 +665,8 @@ void __udp4_lib_err(struct sk_buff *skb, u32 info,
struct udp_table *udptable)
   iph->saddr, uh->source, skb->dev->ifindex, 0,
   udptable, NULL);
if (!sk) {
-   __ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
+   if (udp4_encap_err_handler(skb, info))
+   __ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
return; /* No socket for error */
}
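
For reference, the MTU arithmetic behind the vxlan_get_route() proposal quoted earlier in this message can be sketched in user space. This is only an illustration, not kernel code: vxlan_pmtu_update_needed() is a hypothetical helper, and the headroom constant is assumed to match the kernel's VXLAN_HEADROOM for an IPv4 underlay. It shows why the update is a no-op in the common case, where the tunnel-adjusted MTU is not smaller than the PMTU already recorded on the inner dst.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14).
 * Assumed to match the kernel's VXLAN_HEADROOM for IPv4 underlays.
 */
#define VXLAN_HEADROOM (20 + 8 + 8 + 14)

/*
 * Hypothetical helper: decide whether the inner dst needs a PMTU update.
 * underlay_mtu is the MTU of the route carrying the encapsulated packet,
 * inner_pmtu is the PMTU currently recorded on the inner dst.
 * Returns true and stores the new value only when the MTU must shrink.
 */
static bool vxlan_pmtu_update_needed(unsigned int underlay_mtu,
				     unsigned int inner_pmtu,
				     unsigned int *new_mtu)
{
	unsigned int mtu;

	assert(new_mtu);

	/* An underlay that cannot even carry the headers is left alone here. */
	if (underlay_mtu <= VXLAN_HEADROOM)
		return false;

	mtu = underlay_mtu - VXLAN_HEADROOM;
	if (mtu >= inner_pmtu)
		return false;	/* the common case: nothing to do */

	*new_mtu = mtu;
	return true;
}
```

With a 1500-byte underlay and an inner PMTU still at 1500, the helper reports 1450; once 1450 has been recorded, later calls return false.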


[PATCH v2] FIRMWARE: bcm47xx_nvram: Replace mac address parsing

2017-12-21 Thread Andy Shevchenko
Replace sscanf() with mac_pton().

Signed-off-by: Andy Shevchenko 
---
- use negative condition to be consistent with the rest of the code
 drivers/firmware/broadcom/Kconfig |  1 +
 drivers/firmware/broadcom/bcm47xx_sprom.c | 18 +++---
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/firmware/broadcom/Kconfig 
b/drivers/firmware/broadcom/Kconfig
index 5e29f83e7b39..c041dcb7ea52 100644
--- a/drivers/firmware/broadcom/Kconfig
+++ b/drivers/firmware/broadcom/Kconfig
@@ -13,6 +13,7 @@ config BCM47XX_NVRAM
 config BCM47XX_SPROM
bool "Broadcom SPROM driver"
depends on BCM47XX_NVRAM
+   select GENERIC_NET_UTILS
help
  Broadcom devices store configuration data in SPROM. Accessing it is
  specific to the bus host type, e.g. PCI(e) devices have it mapped in
diff --git a/drivers/firmware/broadcom/bcm47xx_sprom.c 
b/drivers/firmware/broadcom/bcm47xx_sprom.c
index 62aa3cf09b4d..4787f86c8ac1 100644
--- a/drivers/firmware/broadcom/bcm47xx_sprom.c
+++ b/drivers/firmware/broadcom/bcm47xx_sprom.c
@@ -137,20 +137,6 @@ static void nvram_read_leddc(const char *prefix, const 
char *name,
*leddc_off_time = (val >> 16) & 0xff;
 }
 
-static void bcm47xx_nvram_parse_macaddr(char *buf, u8 macaddr[6])
-{
-   if (strchr(buf, ':'))
-   sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &macaddr[0],
-   &macaddr[1], &macaddr[2], &macaddr[3], &macaddr[4],
-   &macaddr[5]);
-   else if (strchr(buf, '-'))
-   sscanf(buf, "%hhx-%hhx-%hhx-%hhx-%hhx-%hhx", &macaddr[0],
-   &macaddr[1], &macaddr[2], &macaddr[3], &macaddr[4],
-   &macaddr[5]);
-   else
-   pr_warn("Can not parse mac address: %s\n", buf);
-}
-
 static void nvram_read_macaddr(const char *prefix, const char *name,
   u8 val[6], bool fallback)
 {
@@ -161,7 +147,9 @@ static void nvram_read_macaddr(const char *prefix, const 
char *name,
if (err < 0)
return;
 
-   bcm47xx_nvram_parse_macaddr(buf, val);
+   strreplace(buf, '-', ':');
+   if (!mac_pton(buf, val))
+   pr_warn("Can not parse mac address: %s\n", buf);
 }
 
 static void nvram_read_alpha2(const char *prefix, const char *name,
-- 
2.15.1
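
As a rough illustration of what the patch above does, here is a user-space sketch of the "normalise separators, then parse strictly" approach. It is not the kernel's mac_pton(): parse_mac() and hex_val() are hypothetical names, and the exact-length check is an assumption made for this sketch. One likely behavioural difference worth noting is that the old sscanf("%hhx") conversions would accept single-digit octets, whereas a strict two-digit parser of this kind rejects them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Convert one hex digit to its value; the caller has checked isxdigit(). */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Hypothetical analogue of the patched nvram_read_macaddr() path:
 * rewrite '-' separators to ':' in place, then parse "xx:xx:xx:xx:xx:xx".
 * Returns true and fills mac[] on success; mac[] is untouched on failure.
 */
static bool parse_mac(char *buf, uint8_t mac[MAC_LEN])
{
	uint8_t tmp[MAC_LEN];
	size_t i;

	assert(buf && mac);

	for (char *p = buf; *p; p++)
		if (*p == '-')
			*p = ':';

	/* This sketch accepts exactly 17 characters, nothing more or less. */
	if (strlen(buf) != 3 * MAC_LEN - 1)
		return false;

	for (i = 0; i < MAC_LEN; i++) {
		const char *s = buf + 3 * i;

		if (!isxdigit((unsigned char)s[0]) ||
		    !isxdigit((unsigned char)s[1]))
			return false;
		if (i != MAC_LEN - 1 && s[2] != ':')
			return false;
		tmp[i] = (uint8_t)(hex_val(s[0]) << 4 | hex_val(s[1]));
	}

	memcpy(mac, tmp, MAC_LEN);
	return true;
}
```

Parsing into a temporary and copying only on success keeps the caller's buffer intact when the string is malformed.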



[PATCH net-next v4.1 5/6] net: dccp: Add DCCP sendmsg trace event

2017-12-21 Thread Masami Hiramatsu
Add DCCP sendmsg trace event (dccp/dccp_probe) for
replacing dccpprobe. User can trace this event via
ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
  Changes in v4.1
   - Fix to add local directory to include for trace.h.
 Thanks Steven!
---
 net/dccp/Makefile |3 ++
 net/dccp/proto.c  |5 +++
 net/dccp/trace.h  |  105 +
 3 files changed, 113 insertions(+)
 create mode 100644 net/dccp/trace.h

diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 2e7b56097bc4..4215f13a63af 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -27,3 +27,6 @@ dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
 dccp_probe-y := probe.o
+
+# build with local directory for trace.h
+CFLAGS_proto.o := -I$(src)
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 9d43c1f40274..e57b5db495cd 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -38,6 +38,9 @@
 #include "dccp.h"
 #include "feat.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly;
 
 EXPORT_SYMBOL_GPL(dccp_statistics);
@@ -761,6 +764,8 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
int rc, size;
long timeo;
 
+   trace_dccp_probe(sk, len);
+
if (len > dp->dccps_mss_cache)
return -EMSGSIZE;
 
diff --git a/net/dccp/trace.h b/net/dccp/trace.h
new file mode 100644
index ..aa01321a6c37
--- /dev/null
+++ b/net/dccp/trace.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dccp
+
+#if !defined(_TRACE_DCCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DCCP_H
+
+#include 
+#include "dccp.h"
+#include "ccids/ccid3.h"
+#include 
+
+TRACE_EVENT(dccp_probe,
+
+   TP_PROTO(struct sock *sk, size_t size),
+
+   TP_ARGS(sk, size),
+
+   TP_STRUCT__entry(
+   /* sockaddr_in6 is always bigger than sockaddr_in */
+   __array(__u8, saddr, sizeof(struct sockaddr_in6))
+   __array(__u8, daddr, sizeof(struct sockaddr_in6))
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(__u16, size)
+   __field(__u16, tx_s)
+   __field(__u32, tx_rtt)
+   __field(__u32, tx_p)
+   __field(__u32, tx_x_calc)
+   __field(__u64, tx_x_recv)
+   __field(__u64, tx_x)
+   __field(__u32, tx_t_ipi)
+   ),
+
+   TP_fast_assign(
+   const struct inet_sock *inet = inet_sk(sk);
+   struct ccid3_hc_tx_sock *hc = NULL;
+
+   if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
+   hc = ccid3_hc_tx_sk(sk);
+
+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+   if (sk->sk_family == AF_INET) {
+   struct sockaddr_in *v4 = (void *)__entry->saddr;
+
+   v4->sin_family = AF_INET;
+   v4->sin_port = inet->inet_sport;
+   v4->sin_addr.s_addr = inet->inet_saddr;
+   v4 = (void *)__entry->daddr;
+   v4->sin_family = AF_INET;
+   v4->sin_port = inet->inet_dport;
+   v4->sin_addr.s_addr = inet->inet_daddr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (sk->sk_family == AF_INET6) {
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr;
+
+   v6->sin6_family = AF_INET6;
+   v6->sin6_port = inet->inet_sport;
+   v6->sin6_addr = inet6_sk(sk)->saddr;
+   v6 = (void *)__entry->daddr;
+   v6->sin6_family = AF_INET6;
+   v6->sin6_port = inet->inet_dport;
+   v6->sin6_addr = sk->sk_v6_daddr;
+#endif
+   }
+
+   /* For filtering use */
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+
+   __entry->size = size;
+   if (hc) {
+   __entry->tx_s = hc->tx_s;
+   __entry->tx_rtt = hc->tx_rtt;
+   __entry->tx_p = hc->tx_p;
+   __entry->tx_x_calc = hc->tx_x_calc;
+   __entry->tx_x_recv = hc->tx_x_recv >> 6;
+   __entry->tx_x = hc->tx_x >> 6;
+   __entry->tx_t_ipi = hc->tx_t_ipi;
+   } else {
+   __entry->tx_s = 0;
+   memset(&__entry->tx_rtt, 0, (void *)&__entry->tx_t_ipi -
+  (void *)&__entry->tx_rtt +
+  sizeof(__entry->tx_t_ipi));
+   }
+   ),
+
+   TP_printk("src=%pISpc dest=%pISpc size=%d tx_s=%d tx_r

Re: [pull request][for-next 00/11] Mellanox, mlx5 E-Switch updates 2017-12-19

2017-12-21 Thread David Miller
From: Saeed Mahameed 
Date: Thu, 21 Dec 2017 06:19:09 +

> On Wed, 2017-12-20 at 12:56 -0500, David Miller wrote:
>> From: Saeed Mahameed 
>> Date: Tue, 19 Dec 2017 12:33:29 -0800
>> 
>> > This patchset is based on rc4 and I see that net-next is still on
>> > rc3, i hope
>> > this is not a problem.
>> 
>> If it doesn't pull cleanly into net-next, then it would be a problem.
> 
> It does pull cleanly.

But if it is based upon rc4 I'll get tons of crap that is non-networking
related and you should never create a situation like that.

That's not "cleanly"


Re: [PATCH V4 net-next 00/17] add some features and fix some bugs for HNS3 driver

2017-12-21 Thread David Miller

I'm not looking at this patch series until you sort out your full
name properly.

Thank you.


Re: [PATCH V3 net-next 00/17] add some features and fix some bugs for HNS3 driver

2017-12-21 Thread David Miller
From: "lipeng (Y)" 
Date: Thu, 21 Dec 2017 14:57:02 +0800

> have checked with him, and will fix his name spelling to "Mingguang
> Qu".

So now all of your colleagues will use "Familyname Surname" or
"Surname Familyname" format, yet you will stick with this
"Familynamesurname" one word thing?

What we are looking for is consistency.

I understand the ordering in asian languages is opposite to western, I
see it all the time in Korea and elsewhere, it is nothing new to me.

But this desire to use a single word to contain both the surname and
the familyname is not what should be done when your asian names are
romanized.

I don't see this practice generally done by other Chinese developers.
So please use "Li Peng" or "Peng Li", whichever you prefer, but be
consistent with your colleagues and others of your culture who submit
kernel changes.

Thank you.


[PATCH v2] openvswitch: Trim off padding before L3+ netfilter processing

2017-12-21 Thread Ed Swierk
IPv4 and IPv6 packets may arrive with lower-layer padding that is not
included in the L3 length. For example, a short IPv4 packet may have
up to 6 bytes of padding following the IP payload when received on an
Ethernet device. In the normal IPv4 receive path, ip_rcv() trims the
packet to ip_hdr->tot_len before invoking netfilter hooks (including
conntrack and nat).

In the IPv6 receive path, ip6_rcv() does the same using
ipv6_hdr->payload_len. Similarly in the br_netfilter receive path,
br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3
length before invoking NF_INET_PRE_ROUTING hooks.

In the OVS conntrack receive path, ovs_ct_execute() pulls the skb to
the L3 header but does not trim it to the L3 length before calling
nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp
encounters a packet with lower-layer padding, nf_checksum() fails and
logs "nf_ct_tcp: bad TCP checksum". While extra zero bytes don't
affect the checksum, the length in the IP pseudoheader does. That
length is based on skb->len, and without trimming, it doesn't match
the length the sender used when computing the checksum.

The assumption throughout nf_conntrack and nf_nat is that skb->len
reflects the length of the L3 header and payload, so there is no need
to refer back to ip_hdr->tot_len or ipv6_hdr->payload_len.

This change brings OVS into line with other netfilter users, trimming
IPv4 and IPv6 packets prior to L3+ netfilter processing.

Signed-off-by: Ed Swierk 
---
v2:
- Trim packet in nat receive path as well as conntrack
- Free skb on error
---
 net/openvswitch/conntrack.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index b27c5c6..1bdc78f 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -703,6 +703,33 @@ static bool skb_nfct_cached(struct net *net,
return ct_executed;
 }
 
+/* Trim the skb to the L3 length. Assumes the skb is already pulled to
+ * the L3 header. The skb is freed on error.
+ */
+static int skb_trim_l3(struct sk_buff *skb)
+{
+   unsigned int nh_len;
+   int err;
+
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   nh_len = ntohs(ip_hdr(skb)->tot_len);
+   break;
+   case htons(ETH_P_IPV6):
+   nh_len = ntohs(ipv6_hdr(skb)->payload_len)
+   + sizeof(struct ipv6hdr);
+   break;
+   default:
+   nh_len = skb->len;
+   }
+
+   err = pskb_trim_rcsum(skb, nh_len);
+   if (err)
+   kfree_skb(skb);
+
+   return err;
+}
+
 #ifdef CONFIG_NF_NAT_NEEDED
 /* Modelled after nf_nat_ipv[46]_fn().
  * range is only used for new, uninitialized NAT state.
@@ -715,8 +742,12 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 {
int hooknum, nh_off, err = NF_ACCEPT;
 
+   /* The nat module expects to be working at L3. */
nh_off = skb_network_offset(skb);
skb_pull_rcsum(skb, nh_off);
+   err = skb_trim_l3(skb);
+   if (err)
+   return err;
 
/* See HOOK2MANIP(). */
if (maniptype == NF_NAT_MANIP_SRC)
@@ -,6 +1142,9 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
/* The conntrack module expects to be working at L3. */
nh_ofs = skb_network_offset(skb);
skb_pull_rcsum(skb, nh_ofs);
+   err = skb_trim_l3(skb);
+   if (err)
+   return err;
 
if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
err = handle_fragments(net, key, info->zone.id, skb);
-- 
1.9.1
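
To make the trimming rule from the commit message concrete, here is a small user-space sketch that computes the length a buffer would be trimmed to. It is an illustration only: l3_trimmed_len() is a hypothetical helper, and unlike skb_trim_l3() in the patch it picks the protocol from the IP version nibble rather than skb->protocol. It does not validate the header beyond the minimum size and does not handle IPv6 jumbograms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u
#define IPV6_HDR_LEN     40u

/* Read a big-endian 16-bit field from a byte buffer. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: buf points at the L3 header and len is the number
 * of bytes received from that point, including any lower-layer padding.
 * Returns the L3 length to trim to, len itself for unknown versions, or
 * 0 when the header is truncated or claims more bytes than were received.
 */
static size_t l3_trimmed_len(const uint8_t *buf, size_t len)
{
	size_t l3_len;

	assert(buf);
	if (len < 1)
		return 0;

	switch (buf[0] >> 4) {
	case 4:
		if (len < IPV4_MIN_HDR_LEN)
			return 0;
		l3_len = be16(buf + 2);		/* tot_len covers header + payload */
		break;
	case 6:
		if (len < IPV6_HDR_LEN)
			return 0;
		/* payload_len excludes the fixed 40-byte header */
		l3_len = (size_t)be16(buf + 4) + IPV6_HDR_LEN;
		break;
	default:
		return len;
	}

	return l3_len <= len ? l3_len : 0;
}
```

A 40-byte IPv4 packet (20-byte IP header plus 20-byte TCP header) received in a 46-byte minimum Ethernet payload is the 6-bytes-of-padding case mentioned above; the helper returns 40 for it.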



[PATCH v2 RESEND] openvswitch: Trim off padding before L3+ netfilter processing

2017-12-21 Thread Ed Swierk
IPv4 and IPv6 packets may arrive with lower-layer padding that is not
included in the L3 length. For example, a short IPv4 packet may have
up to 6 bytes of padding following the IP payload when received on an
Ethernet device. In the normal IPv4 receive path, ip_rcv() trims the
packet to ip_hdr->tot_len before invoking netfilter hooks (including
conntrack and nat).

In the IPv6 receive path, ip6_rcv() does the same using
ipv6_hdr->payload_len. Similarly in the br_netfilter receive path,
br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3
length before invoking NF_INET_PRE_ROUTING hooks.

In the OVS conntrack receive path, ovs_ct_execute() pulls the skb to
the L3 header but does not trim it to the L3 length before calling
nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp
encounters a packet with lower-layer padding, nf_checksum() fails and
logs "nf_ct_tcp: bad TCP checksum". While extra zero bytes don't
affect the checksum, the length in the IP pseudoheader does. That
length is based on skb->len, and without trimming, it doesn't match
the length the sender used when computing the checksum.

The assumption throughout nf_conntrack and nf_nat is that skb->len
reflects the length of the L3 header and payload, so there is no need
to refer back to ip_hdr->tot_len or ipv6_hdr->payload_len.

This change brings OVS into line with other netfilter users, trimming
IPv4 and IPv6 packets prior to L3+ netfilter processing.

Signed-off-by: Ed Swierk 
---
(resent with Pravin's correct address)
v2:
- Trim packet in nat receive path as well as conntrack
- Free skb on error
---
 net/openvswitch/conntrack.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index b27c5c6..1bdc78f 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -703,6 +703,33 @@ static bool skb_nfct_cached(struct net *net,
return ct_executed;
 }
 
+/* Trim the skb to the L3 length. Assumes the skb is already pulled to
+ * the L3 header. The skb is freed on error.
+ */
+static int skb_trim_l3(struct sk_buff *skb)
+{
+   unsigned int nh_len;
+   int err;
+
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   nh_len = ntohs(ip_hdr(skb)->tot_len);
+   break;
+   case htons(ETH_P_IPV6):
+   nh_len = ntohs(ipv6_hdr(skb)->payload_len)
+   + sizeof(struct ipv6hdr);
+   break;
+   default:
+   nh_len = skb->len;
+   }
+
+   err = pskb_trim_rcsum(skb, nh_len);
+   if (err)
+   kfree_skb(skb);
+
+   return err;
+}
+
 #ifdef CONFIG_NF_NAT_NEEDED
 /* Modelled after nf_nat_ipv[46]_fn().
  * range is only used for new, uninitialized NAT state.
@@ -715,8 +742,12 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
 {
int hooknum, nh_off, err = NF_ACCEPT;
 
+   /* The nat module expects to be working at L3. */
nh_off = skb_network_offset(skb);
skb_pull_rcsum(skb, nh_off);
+   err = skb_trim_l3(skb);
+   if (err)
+   return err;
 
/* See HOOK2MANIP(). */
if (maniptype == NF_NAT_MANIP_SRC)
@@ -,6 +1142,9 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
/* The conntrack module expects to be working at L3. */
nh_ofs = skb_network_offset(skb);
skb_pull_rcsum(skb, nh_ofs);
+   err = skb_trim_l3(skb);
+   if (err)
+   return err;
 
if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
err = handle_fragments(net, key, info->zone.id, skb);
-- 
1.9.1
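
The commit message's point that zero padding does not change the checksum but the pseudo-header length does can be checked with a small user-space sketch. This is not the kernel's checksum code: sum16(), csum_fold16() and tcp4_csum() are hypothetical helpers written for this illustration, and they work on raw byte buffers instead of skbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a buffer to a running sum as big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* odd byte, zero padded */
	return sum;
}

/* Fold the carries back in and return the ones' complement. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Internet checksum over an IPv4 pseudo-header plus a TCP segment.
 * l4_len is used both as the pseudo-header length and as the number of
 * bytes summed, which mirrors a receiver deriving it from the buffer size.
 * A segment carrying a correct checksum verifies to 0.
 */
static uint16_t tcp4_csum(const uint8_t saddr[4], const uint8_t daddr[4],
			  const uint8_t *seg, size_t l4_len)
{
	uint32_t sum = 0;

	assert(saddr && daddr && seg);

	sum = sum16(saddr, 4, sum);
	sum = sum16(daddr, 4, sum);
	sum += 6;			/* IPPROTO_TCP */
	sum += (uint32_t)l4_len;	/* TCP length in the pseudo-header */
	return csum_fold16(sum16(seg, l4_len, sum));
}
```

If the sender computed the checksum over a 20-byte segment and the receiver verifies over 26 bytes (20 plus 6 zero padding bytes), the data sum is unchanged but the pseudo-header length differs by 6, so verification no longer yields 0.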



Re: [bpf-next V1-RFC PATCH 10/14] tun: setup xdp_rxq_info

2017-12-21 Thread Jesper Dangaard Brouer
On Wed, 20 Dec 2017 15:48:01 +0800
Jason Wang  wrote:

> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index e367d6310353..f1df08c2c541 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -180,6 +180,7 @@ struct tun_file {
> > struct list_head next;
> > struct tun_struct *detached;
> > struct skb_array tx_array;
> > +   struct xdp_rxq_info xdp_rxq;
> >   };
> >   
> >   struct tun_flow_entry {
> > @@ -687,8 +688,10 @@ static void __tun_detach(struct tun_file *tfile, bool 
> > clean)
> > tun->dev->reg_state == NETREG_REGISTERED)
> > unregister_netdevice(tun->dev);
> > }
> > -   if (tun)
> > +   if (tun) {
> > skb_array_cleanup(&tfile->tx_array);
> > +   xdp_rxq_info_unreg(&tfile->xdp_rxq);
> > +   }
> > sock_put(&tfile->sk);
> > }
> >   }
> > @@ -728,11 +731,15 @@ static void tun_detach_all(struct net_device *dev)
> > tun_napi_del(tun, tfile);
> > /* Drop read queue */
> > tun_queue_purge(tfile);
> > +   skb_array_cleanup(&tfile->tx_array);  
> 
> Looks like this is unnecessary, skb array will be cleaned up only when 
> fd is closed otherwise there will be a double free.

Which code path runs on "fd close" and calls skb_array_cleanup()?
(Is it __tun_detach()?)

Then, I guess I don't need below xdp_rxq_info_unreg() either, right?
 
> > +   xdp_rxq_info_unreg(&tfile->xdp_rxq);
> > sock_put(&tfile->sk);

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-next v4.1 5/6] net: dccp: Add DCCP sendmsg trace event

2017-12-21 Thread David Miller

When you fix any part of a patch series, you must always repost the
entire series from scratch, not just the patch(s) that change.

Thank you.


Re: [PATCH net-next v4.1 5/6] net: dccp: Add DCCP sendmsg trace event

2017-12-21 Thread Steven Rostedt
On Thu, 21 Dec 2017 10:57:36 -0500 (EST)
David Miller  wrote:

> When you fix any part of a patch series, you must always repost the
> entire series from scratch, not just the patch(s) that change.

He probably gets that from me. I don't usually require a full series if
only a single patch changes. But I don't receive as many patch series
as you do, so I can handle that work flow without too much difficulty.

-- Steve


Re: RCU callback crashes

2017-12-21 Thread John Fastabend
On 12/20/2017 11:27 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 4:50 PM, Jakub Kicinski  wrote:
>> On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
>>> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
>>> so probably from the management interface.
>>>
>>> [  154.604041] 
>>> ==
>>> [  154.612245] BUG: KASAN: slab-out-of-bounds in 
>>> pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.620219] Read of size 8 at addr 88086bb64040 by task sshd/983
>>> [  154.627403]
>>> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 
>>> 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
>>> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 
>>> 11/08/2016
>>> [  154.647665] Call Trace:
>>> [  154.650494]  dump_stack+0xa6/0x118
>>> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
>>> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
>>> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
>>> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
>>> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.681251]  print_address_description+0x6a/0x270
>>> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.691565]  kasan_report+0x23f/0x350
>>> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
>>
>> If we trust stack decode it's:
>>
>>615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>>616  {
>>617  struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>>618  struct sk_buff *skb = NULL;
>>619  int band;
>>620
>>621  for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>622  struct skb_array *q = band2list(priv, band);
>>623
>>624  if (__skb_array_empty(q))
>>625  continue;
>>626
>>627  skb = skb_array_consume_bh(q);
>>628  }
>>629  if (likely(skb)) {
>>630  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>>631  qdisc_bstats_cpu_update(qdisc, skb);
>>632  qdisc_qstats_cpu_qlen_dec(qdisc);
>>633  }
>>634
>>635  return skb;
>>636  }
> 
> Yeah, this one is clearly a different one and it is introduced by John's
> "lockless" patchset.
> 
> I will take a look tomorrow if John doesn't.
> 

I guess this path

  dev_deactivate_many
dev_deactivate_queue
  qdisc_reset

Here we have the qdisc lock but no RCU call or sync before the reset
does a kfree_skb and cleans up list walks. So it is possible for the xmit
path to still be pushing skbs onto the array/lists. I don't think this is
the issue triggered above, but it needs to be fixed.

Also synchronize_net() uses synchronize_rcu and we also have _bh variants
involved here...

Finally looks like net_tx_action is calling into qdisc_run without
rcu_read. Either need to check is_running bit (wanted to avoid this)
or put in rcu critical section. Maybe this is what you hit.

@Jakub, does your test have traffic generator running or just control
path? My theory would be a bit odd if you didn't have traffic, but
something is kicking the dequeue so must be some traffic.

I'll come up with some fixes today.

Thanks,
John


Re: pull-request: bpf-next 2017-12-18

2017-12-21 Thread David Miller
From: David Miller 
Date: Wed, 20 Dec 2017 16:16:44 -0500 (EST)

> I think I understand how this new stuff works, I'll take a stab at
> doing the sparc64 JIT bits.

This patch should do it, please queue up for bpf-next.

But this is really overkill on sparc64.

No matter where you relocate the call destination to, the size of the
program and the code output will be identical except for the call
instruction PC relative offset field.

So at some point as a follow-up I should change this code to simply
scan the insns for the function calls and fixup the offsets, rather
than do a full set of code generation passes.

Thanks.
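
A rough user-space sketch of the follow-up idea (patching only the PC-relative field of each call) might look like the following. It relies on the SPARC CALL format: op = 01 in bits 31:30 and a signed 30-bit word displacement in bits 29:0. The helper names is_call(), call_target() and retarget_call() are hypothetical and are not part of the JIT; both addresses are assumed to be 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CALL_OP     0x40000000u	/* op = 01 in bits 31:30 */
#define CALL_OPMASK 0xc0000000u
#define DISP30_MASK 0x3fffffffu

/* Format 1 instructions (op = 01) are calls. */
static bool is_call(uint32_t insn)
{
	return (insn & CALL_OPMASK) == CALL_OP;
}

/* Decode the byte address a call located at byte address pc jumps to. */
static uint64_t call_target(uint32_t insn, uint64_t pc)
{
	uint64_t disp = insn & DISP30_MASK;

	assert(is_call(insn));
	if (disp & 0x20000000u)			/* sign-extend disp30 */
		disp |= ~(uint64_t)DISP30_MASK;
	return pc + (disp << 2);
}

/* Re-encode a call located at pc so that it targets dest. */
static uint32_t retarget_call(uint32_t insn, uint64_t pc, uint64_t dest)
{
	/* Word displacement; unsigned wrap-around handles backward calls. */
	uint64_t disp = (dest - pc) >> 2;

	assert(is_call(insn));
	return CALL_OP | (uint32_t)(disp & DISP30_MASK);
}
```

Because only the low 30 bits change, the image size and every other instruction stay identical, which is the property the message above relies on. A real fixup pass would still need to know which calls to patch and their new destinations; this sketch leaves that bookkeeping out.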


bpf: sparc64: Add JIT support for multi-function programs.

Modelled strongly upon the arm64 implementation.

Signed-off-by: David S. Miller 

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index a2f1b5e..4ee417f 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1507,11 +1507,19 @@ static void jit_fill_hole(void *area, unsigned int size)
*ptr++ = 0x91d02005; /* ta 5 */
 }
 
+struct sparc64_jit_data {
+   struct bpf_binary_header *header;
+   u8 *image;
+   struct jit_ctx ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
struct bpf_prog *tmp, *orig_prog = prog;
+   struct sparc64_jit_data *jit_data;
struct bpf_binary_header *header;
bool tmp_blinded = false;
+   bool extra_pass = false;
struct jit_ctx ctx;
u32 image_size;
u8 *image_ptr;
@@ -1531,13 +1539,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
prog = tmp;
}
 
+   jit_data = prog->aux->jit_data;
+   if (!jit_data) {
+   jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+   if (!jit_data) {
+   prog = orig_prog;
+   goto out;
+   }
+   }
+   if (jit_data->ctx.offset) {
+   ctx = jit_data->ctx;
+   image_ptr = jit_data->image;
+   header = jit_data->header;
+   extra_pass = true;
+   image_size = sizeof(u32) * ctx.idx;
+   goto skip_init_ctx;
+   }
+
memset(&ctx, 0, sizeof(ctx));
ctx.prog = prog;
 
ctx.offset = kcalloc(prog->len, sizeof(unsigned int), GFP_KERNEL);
if (ctx.offset == NULL) {
prog = orig_prog;
-   goto out;
+   goto out_off;
}
 
/* Fake pass to detect features used, and get an accurate assessment
@@ -1560,7 +1585,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
}
 
ctx.image = (u32 *)image_ptr;
-
+skip_init_ctx:
for (pass = 1; pass < 3; pass++) {
ctx.idx = 0;
 
@@ -1591,14 +1616,24 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
 
bpf_flush_icache(header, (u8 *)header + (header->pages * PAGE_SIZE));
 
-   bpf_jit_binary_lock_ro(header);
+   if (!prog->is_func || extra_pass) {
+   bpf_jit_binary_lock_ro(header);
+   } else {
+   jit_data->ctx = ctx;
+   jit_data->image = image_ptr;
+   jit_data->header = header;
+   }
 
prog->bpf_func = (void *)ctx.image;
prog->jited = 1;
prog->jited_len = image_size;
 
+   if (!prog->is_func || extra_pass) {
 out_off:
-   kfree(ctx.offset);
+   kfree(ctx.offset);
+   kfree(jit_data);
+   prog->aux->jit_data = NULL;
+   }
 out:
if (tmp_blinded)
bpf_jit_prog_release_other(prog, prog == orig_prog ?


Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is_atomic in tcpv4 gro process

2017-12-21 Thread Alexander Duyck
On Thu, Dec 21, 2017 at 1:16 AM, Yunsheng Lin  wrote:
> Hi, Alexander
>
> On 2017/12/21 0:24, Alexander Duyck wrote:
>> On Wed, Dec 20, 2017 at 1:09 AM, Yunsheng Lin  wrote:
>>> Hi, all
>>> I have some doubt about NAPI_GRO_CB(skb)->is_atomic when
>>> analyzing the tcpv4 gro process:
>>>
>>> Firstly we set NAPI_GRO_CB(skb)->is_atomic to 1 in dev_gro_receive:
>>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/core/dev.c#L4838
>>>
>>> And then in inet_gro_receive, we check the NAPI_GRO_CB(skb)->is_atomic
>>> before setting NAPI_GRO_CB(skb)->is_atomic according to IP_DF bit in the ip 
>>> header:
>>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/ipv4/af_inet.c#L1319
>>>
>>> struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff 
>>> *skb)
>>> {
>>> .
>>> for (p = *head; p; p = p->next) {
>>> 
>>>
>>> /* If the previous IP ID value was based on an atomic
>>>  * datagram we can overwrite the value and ignore it.
>>>  */
>>> if (NAPI_GRO_CB(skb)->is_atomic)  //we 
>>> check it here
>>> NAPI_GRO_CB(p)->flush_id = flush_id;
>>> else
>>> NAPI_GRO_CB(p)->flush_id |= flush_id;
>>> }
>>>
>>> NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));  
>>> //we set it here
>>> NAPI_GRO_CB(skb)->flush |= flush;
>>> skb_set_network_header(skb, off);
>>> 
>>> }
>>>
>>> My question is whether we should check the NAPI_GRO_CB(skb)->is_atomic or 
>>> NAPI_GRO_CB(p)->is_atomic?
>>> If we should check NAPI_GRO_CB(skb)->is_atomic, then maybe it is 
>>> unnecessary because it is always true.
>>> If we should check NAPI_GRO_CB(p)->is_atomic, maybe there is a bug here.
>>>
>>> So what is the logic here? I have just started analyzing GRO, so maybe
>>> I am missing something obvious here.
>>
>> The logic there is to address the multiple IP header case where there
>> are 2 or more IP headers due to things like VXLAN or GRE tunnels. So
>> what will happen is that an outer IP header will end up being sent
>> with DF not set and will clear the is_atomic value then we want to OR
>> in the next header that is applied. It defaults to assignment on
>> is_atomic because the first IP header will encounter flush_id with no
>> previous configuration occupying it.
>
> I see your point now.
>
> But for the same flow of tunnels packet, the outer and inner ip header must
> have the same fixed id or increment id?
>
> For example, if we have a flow of tunnels packet which has fixed id in outer
> header and increment id in inner header(the inner header does have DF flag 
> set):
>
> 1. For the first packet, NAPI_GRO_CB(skb)->is_atomic will be set to zero when
> inet_gro_receive is processing the inner ip header.
>
> 2. For the second packet, when inet_gro_receive is processing the outer ip 
> header
> which has a fixed id, NAPI_GRO_CB(p)->is_atomic is zero according to [1], so
> NAPI_GRO_CB(p)->flush_id will be set to 0x, then the second packet will 
> not
> be merged to first packet in tcp_gro_receive.

I'm not sure how valid your case here is. The is_atomic is only really
meant to apply to the inner-most header. In the case of TCP the
inner-most header should almost always have the DF bit set which means
the inner-most is almost always atomic.

> I thought outer ip header could have a fixed id while inner ip header could
> have a increment id. Do I miss something here?

You have it backwards. The innermost will have DF bit set so it can be
fixed, the outer-most will in many cases not since it is usually UDP
and as such it will likely need to increment.

>>
>> The part I am not sure about is if we should be using assignment for
>> is_atomic or using an "&=" to clear the bit and leave it cleared.
>
> I am not sure I understood you here. is_atomic is a bit field, why do you
> want to use "&="?

Actually that was my mind kind of wandering. It has been a while since
I looked at this code and the use of &= wouldn't be appropriate since
is_atomic should only apply to the innermost header.

Basically the only acceptable combinations for is_atomic and flush_id
are false with 0, or true with 1. We can't have a fixed outer header
value if DF is not set.
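
To make the rule above concrete, here is a minimal user-space sketch of the IP ID
check. It is a simplified model and not the kernel code: the function name
gro_id_check and its parameters are hypothetical, and it only assumes that a
header may keep a fixed ID when DF is set and the flow has been atomic so far.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the per-header IP ID check.
 *
 * flow_atomic: the flow has been treated as atomic (fixed ID) so far
 * df:          the DF bit is set in the header being checked
 * id_delta:    new IP ID minus the IP ID of the first held packet
 * count:       number of packets already merged into the held packet
 *
 * Returns 0 when the ID is acceptable. In the non-atomic case any
 * mismatch collapses to 0xFFFF (flush); in the atomic case the raw
 * delta is returned so that a later stage can decide.
 */
static uint16_t gro_id_check(bool flow_atomic, bool df,
                             uint16_t id_delta, uint16_t count)
{
    uint16_t flush_id = id_delta;

    if (!flow_atomic || !df) {
        /* Without DF the ID has to advance by one per packet. */
        flush_id ^= count;
        flush_id = flush_id ? 0xFFFF : 0;
    }
    return flush_id;
}
```

With count == 1, a header without DF whose ID did not change (delta 0) yields
0xFFFF, i.e. a flush, while the same header with an ID that advanced by one
yields 0.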

Hope that helps to clarify things.

- Alex


[PATCH net-next] ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT

2017-12-21 Thread Tobias Brunner
If SNAT modifies the source address the resulting packet might match
an IPsec policy, reinject the packet if that's the case.

The exact same thing is already done for IPv4.

Signed-off-by: Tobias Brunner 
---
 net/ipv6/ip6_output.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 176d74fb3b4d..c90f02632782 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -138,6 +138,14 @@ static int ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 		return ret;
 	}
 
+#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
+	/* Policy lookup after SNAT yielded a new policy */
+	if (skb_dst(skb)->xfrm) {
+		IPCB(skb)->flags |= IPSKB_REROUTED;
+		return dst_output(net, sk, skb);
+	}
+#endif
+
 	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
 	    dst_allfrag(skb_dst(skb)) ||
 	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
-- 
2.7.4


Re: [PATCH v2] FIRMWARE: bcm47xx_nvram: Replace mac address parsing

2017-12-21 Thread Hauke Mehrtens


On 12/21/2017 03:40 PM, Andy Shevchenko wrote:
> Replace sscanf() with mac_pton().
> 
> Signed-off-by: Andy Shevchenko 

Acked-by: Hauke Mehrtens 

The patch looks good, but I haven't tested it on my devices.

> ---
> - use negative condition to be consistent with the rest code
>  drivers/firmware/broadcom/Kconfig |  1 +
>  drivers/firmware/broadcom/bcm47xx_sprom.c | 18 +++---
>  2 files changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/firmware/broadcom/Kconfig b/drivers/firmware/broadcom/Kconfig
> index 5e29f83e7b39..c041dcb7ea52 100644
> --- a/drivers/firmware/broadcom/Kconfig
> +++ b/drivers/firmware/broadcom/Kconfig
> @@ -13,6 +13,7 @@ config BCM47XX_NVRAM
>  config BCM47XX_SPROM
>   bool "Broadcom SPROM driver"
>   depends on BCM47XX_NVRAM
> + select GENERIC_NET_UTILS
>   help
> Broadcom devices store configuration data in SPROM. Accessing it is
> specific to the bus host type, e.g. PCI(e) devices have it mapped in
> diff --git a/drivers/firmware/broadcom/bcm47xx_sprom.c b/drivers/firmware/broadcom/bcm47xx_sprom.c
> index 62aa3cf09b4d..4787f86c8ac1 100644
> --- a/drivers/firmware/broadcom/bcm47xx_sprom.c
> +++ b/drivers/firmware/broadcom/bcm47xx_sprom.c
> @@ -137,20 +137,6 @@ static void nvram_read_leddc(const char *prefix, const char *name,
>   *leddc_off_time = (val >> 16) & 0xff;
>  }
>  
> -static void bcm47xx_nvram_parse_macaddr(char *buf, u8 macaddr[6])
> -{
> - if (strchr(buf, ':'))
> - sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &macaddr[0],
> - &macaddr[1], &macaddr[2], &macaddr[3], &macaddr[4],
> - &macaddr[5]);
> - else if (strchr(buf, '-'))
> - sscanf(buf, "%hhx-%hhx-%hhx-%hhx-%hhx-%hhx", &macaddr[0],
> - &macaddr[1], &macaddr[2], &macaddr[3], &macaddr[4],
> - &macaddr[5]);
> - else
> - pr_warn("Can not parse mac address: %s\n", buf);
> -}
> -
>  static void nvram_read_macaddr(const char *prefix, const char *name,
>  u8 val[6], bool fallback)
>  {
> @@ -161,7 +147,9 @@ static void nvram_read_macaddr(const char *prefix, const char *name,
>   if (err < 0)
>   return;
>  
> - bcm47xx_nvram_parse_macaddr(buf, val);
> + strreplace(buf, '-', ':');
> + if (!mac_pton(buf, val))
> + pr_warn("Can not parse mac address: %s\n", buf);
>  }
>  
>  static void nvram_read_alpha2(const char *prefix, const char *name,
> 
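
The approach in the quoted patch (normalize '-' to ':' and then run one strict
parser) can be sketched in user space as follows. This is an illustration under
assumptions, not the kernel implementation: replace_char and parse_mac are
hypothetical stand-ins for strreplace() and mac_pton(), and the strictness shown
here (exactly two hex digits per octet, ':' separators, output untouched on
failure) is my reading of how the kernel helper behaves.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical stand-in for strreplace(): swap characters in place. */
static void replace_char(char *s, char from, char to)
{
    for (; *s; s++)
        if (*s == from)
            *s = to;
}

/*
 * Hypothetical stand-in for mac_pton(): accept only "xx:xx:xx:xx:xx:xx".
 * mac[] is written only when the whole string is valid.
 */
static bool parse_mac(const char *s, uint8_t mac[6])
{
    uint8_t tmp[6];
    int i;

    if (strlen(s) < 17)
        return false;

    for (i = 0; i < 6; i++) {
        int hi = hex_val(s[i * 3]);
        int lo = hex_val(s[i * 3 + 1]);

        if (hi < 0 || lo < 0)
            return false;
        if (i != 5 && s[i * 3 + 2] != ':')
            return false;
        tmp[i] = (uint8_t)((hi << 4) | lo);
    }
    memcpy(mac, tmp, sizeof(tmp));
    return true;
}
```

One design consequence worth checking on real devices: a strict parser of this
kind rejects single-digit octets such as "0:1:2:3:4:5", which the old sscanf()
format would have accepted.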


Re: [PATCH net] ipv6: Honor specified parameters in fibmatch lookup

2017-12-21 Thread David Miller
From: Ido Schimmel 
Date: Wed, 20 Dec 2017 12:28:25 +0200

> Currently, parameters such as oif and source address are not taken into
> account during fibmatch lookup. Example (IPv4 for reference) before
> patch:
 ...
> The problem stems from the fact that the necessary route lookup flags
> are not set based on these parameters.
> 
> Instead of duplicating the same logic for fibmatch, we can simply
> resolve the original route from its copy and dump it instead.
> 
> Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
> Signed-off-by: Ido Schimmel 

Applied and queued up for -stable, thanks.


[PATCH bpf] selftests/bpf: fix Makefile for passing LLC to the command line

2017-12-21 Thread Jakub Kicinski
From: Quentin Monnet 

Makefile has a LLC variable that is initialised to "llc", but can
theoretically be overridden from the command line ("make LLC=llc-6.0").
However, this fails because for LLVM probe check, "llc" is called
directly. Use the $(LLC) variable instead to fix this.

Fixes: 22c8852624fc ("bpf: improve selftests and add tests for meta pointer")
Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/testing/selftests/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 05fc4e2e7b3a..9316e648a880 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -39,7 +39,7 @@ $(BPFOBJ): force
 CLANG ?= clang
 LLC   ?= llc
 
-PROBE := $(shell llc -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
+PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
 
 # Let newer LLVM versions transparently probe the kernel for availability
 # of full BPF instruction set.
-- 
2.15.1



Re: RCU callback crashes

2017-12-21 Thread Jakub Kicinski
On Thu, 21 Dec 2017 08:26:56 -0800, John Fastabend wrote:
> @Jakub, does your test have traffic generator running or just control
> path? My theory would be a bit odd if you didn't have traffic, but
> something is kicking the dequeue so must be some traffic.

It was just control traffic, but it's the first time I've seen it so it
may be very unlikely to trigger...


Re: KASAN: stack-out-of-bounds Read in rds_sendmsg

2017-12-21 Thread Santosh Shilimkar

+Avinash

On 12/21/2017 1:10 AM, syzbot wrote:
syzkaller has found reproducer for the following crash on 


[..]



audit: type=1400 audit(1513847224.110:7): avc:  denied  { map } for  
pid=3157 comm="syzkaller455006" path="/root/syzkaller455006870" 
dev="sda1" ino=16481 
scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1

==
BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 
[inline]


Could you please post the discussed fix if you are ready with it?
This new report is the same as the last one, and the cmsg length check should
address it.

Regards,
Santosh


Re: [PATCH v4 5/5] flow_dissector: Parse batman-adv unicast headers

2017-12-21 Thread Willem de Bruijn
On Thu, Dec 21, 2017 at 7:24 AM, Jiri Pirko  wrote:
> Thu, Dec 21, 2017 at 10:17:42AM CET, sven.eckelm...@openmesh.com wrote:
>>The batman-adv unicast packets contain a full layer 2 frame in encapsulated
>>form. The flow dissector must therefore be able to parse the batman-adv
>>unicast header to reach the layer 2+3 information.
>>
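>>  +--------------------+
>>  | ip(v6)hdr          |
>>  +--------------------+
>>  | inner ethhdr       |
>>  +--------------------+
>>  | batadv unicast hdr |
>>  +--------------------+
>>  | outer ethhdr       |
>>  +--------------------+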
>>
>>The obtained information from the upper layer can then be used by RPS to
>>schedule the processing on separate cores. This allows better distribution
>>of multiple flows from the same neighbor to different cores.
>>
>>Signed-off-by: Sven Eckelmann 
>
> Reviewed-by: Jiri Pirko 

Acked-by: Willem de Bruijn 


Re: [bpf-next V1-RFC PATCH 01/14] xdp: base API for new XDP rx-queue info concept

2017-12-21 Thread Jesper Dangaard Brouer
On Mon, 18 Dec 2017 11:55:01 +0100
Jesper Dangaard Brouer  wrote:

> On Wed, 13 Dec 2017 19:34:40 -0700
> David Ahern  wrote:
> 
> > On 12/13/17 4:19 AM, Jesper Dangaard Brouer wrote:  
> > > +
> > > +void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
> > > +{
> > > + xdp_rxq->reg_state = REG_STATE_UNREGISTRED;
> > > +}
> > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg);
> > > +
> > > +void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
> > > +{
> > > + if (xdp_rxq->reg_state == REG_STATE_REGISTRED) {
> > > + WARN(1, "Missing unregister, handled but fix driver\n");
> > > + xdp_rxq_info_unreg(xdp_rxq);
> > > + }
> > > + memset(xdp_rxq, 0, sizeof(*xdp_rxq));
> > > + xdp_rxq->queue_index = U32_MAX;
> > > + xdp_rxq->reg_state = REG_STATE_NEW;
> > > +}
> > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_init);
> > > +
> > > +void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq)
> > > +{
> > > + WARN(!xdp_rxq->dev, "Missing net_device from driver");
> > > + WARN(xdp_rxq->queue_index == U32_MAX, "Miss queue_index from driver");
> > > + WARN(!(xdp_rxq->reg_state == REG_STATE_NEW),"API violation, miss init");
> > > + xdp_rxq->reg_state = REG_STATE_REGISTRED;
> > > +}
> > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg);
> > > 
> > 
> > Rather than WARN()'s why not make the _reg and _init functions return an
> > int that indicates an error? For example you don't want to continue if
> > the dev is expected but missing.  
> 
> Handling return-errors in the drivers complicates the driver code, as it
> involves unraveling and deallocating other RX-rings etc (that were
> already allocated) if the reg fails. (Also notice next patch will allow
> dev == NULL, if right ptype is set).
> 
> I'm not completely rejecting your idea, as this is a good optimization
> trick, which is to move validation checks to setup-time, thus allowing
> less validation checks at runtime.  I sort-of actually already did
> this, as I allow bpf to deref dev without NULL check.  I would argue
> this is good enough, as we will crash in a predictable way, as above
> WARN will point to which driver violated the API.
> 
> If people think it is valuable I can change this API to return an err?

I will take Ahern's suggestion of returning an err-code, but only from
xdp_rxq_info_reg().  And I'm going to move xdp_rxq_info_init to be an
internal function (which Saeed also implicitly suggested).
I'm working through the drivers now, and only two drivers lack a proper
error-return path for handling the case where xdp_rxq_info_reg() fails.

I've also extended xdp_rxq_info_reg() to take args dev + idx, to reduce
the code-lines (given we now also have to check return code, this got
too big).  Thus, reg is a single call with if-return-check.
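
As a rough illustration of the shape described above (a single register call
that takes dev + idx and returns an error code), here is a user-space sketch.
Everything in it is an assumption made for illustration: the _sketch suffix, the
stub struct net_device, the choice of -EINVAL, and the rule that a NULL dev is
rejected (the thread notes that a later patch may allow dev == NULL).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stub standing in for the kernel's struct net_device. */
struct net_device {
    int ifindex;
};

enum {
    REG_STATE_NEW,
    REG_STATE_REGISTERED,
    REG_STATE_UNREGISTERED,
};

struct xdp_rxq_info {
    struct net_device *dev;
    uint32_t queue_index;
    int reg_state;
};

/* Internal helper: reset the structure before (re)registration. */
static void xdp_rxq_info_init_sketch(struct xdp_rxq_info *rxq)
{
    memset(rxq, 0, sizeof(*rxq));
    rxq->reg_state = REG_STATE_NEW;
}

/*
 * Hypothetical register call: validates its arguments up front and
 * reports failure to the caller instead of only warning.
 */
static int xdp_rxq_info_reg_sketch(struct xdp_rxq_info *rxq,
                                   struct net_device *dev,
                                   uint32_t queue_index)
{
    if (!rxq || !dev)
        return -EINVAL; /* assumed error code */

    xdp_rxq_info_init_sketch(rxq);
    rxq->dev = dev;
    rxq->queue_index = queue_index;
    rxq->reg_state = REG_STATE_REGISTERED;
    return 0;
}
```

A driver would then need only one call followed by an if-return-check in its
ring setup path, which is the line-count reduction mentioned above.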


> I guess, it would be more future-proof to do this, as we (Bjørn,
> Michael, Andy) want to extend this to implement a XDP frame/mem return
> code-path.  And the register call will likely have to allocate some
> resource that could fail, which need to be handled...

I'm mostly doing it for above reason, as I'm hoping to avoid touching
every XDP driver once again.  It is a real pain.

> If we do this, we might as well (slab) alloc the xdp_rxq_info
> structure to reduce the bloat in the drivers RX-rings to a single
> pointer (and a pointer to xdp_rxq_info is what xdp_buff.rxq need).

I've dropped my idea of (slab) allocating the xdp_rxq_info structure.
I started coding this up, but realized the number of lines added per
driver got too excessive for no apparent gain. (e.g. I also needed to
take the numa-node into account in some drivers).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected

2017-12-21 Thread Andreas Hartmann

On 12/20/2017 at 11:44 PM Willem de Bruijn wrote:

On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann
 wrote:

On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:

On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:

[...]

I have been able to reproduce the hang by sending a UFO packet
between two guests running v4.13 on a host running v4.15-rc1.

The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
vhost_zerocopy_callback being called for each segment of a
segmented UFO skb. This refcount is decremented then on each
segment, but incremented only once for the entire UFO skb.

Before v4.14, these packets would be converted in skb_segment to
regular copy packets with skb_orphan_frags and the callback function
called once at this point. v4.14 added support for reference counted
zerocopy skb that can pass through skb_orphan_frags unmodified and
have their zerocopy state safely cloned with skb_zerocopy_clone.

The call to skb_zerocopy_clone must come after skb_orphan_frags
to limit cloning of this state to those skbs that can do so safely.

Please try a host with the following patch. This fixes it for me. I intend to
send it to net.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a592ca025fc4..d2d985418819 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
   SKBTX_SHARED_FRAG;
-   if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-   goto err;

 while (pos < offset + len) {
 if (i >= nfrags) {
@@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

 if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
 goto err;
+   if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+   goto err;

 *nskb_frag = *frag;
 __skb_frag_ref(nskb_frag);


This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
in the frags[] array. I will follow-up with a patch to net-next that only
checks once per skb:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466581cf4cdc..a293a33604ec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
   SKBTX_SHARED_FRAG;
-   if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
+   if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
 goto err;

 while (pos < offset + len) {
@@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

 BUG_ON(!nfrags);

+   if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+   skb_zerocopy_clone(nskb, frag_skb,
+  GFP_ATOMIC))
+   goto err;
+
 list_skb = list_skb->next;
 }

@@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 goto err;
 }

-   if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
-   goto err;
-


I'm currently testing this one.



Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
accept UFO datagrams from tuntap and packet".

At first, I tested an unpatched 4.14.7 - the problem (no more killable
qemu-process) did occur promptly on shutdown of the machine. This was
expected.

Next, I applied the above patch (the second one). Until now, I didn't
face any problem any more on shutdown of VMs. Looks promising.


Thanks for testing.

I sent the first, simpler, one to net together with another fix.

   http://patchwork.ozlabs.org/patch/851715/



If I'm using the second patch above (the more efficient one and not 
"[net,1/2] skbuff: orphan frags before zerocopy clone"), which I'm 
already testing here: Is it still necessary to apply this patch 
"[net,2/2] skbuff: skb_copy_ubufs must release uarg even without user 
frags"?



Thanks,
Andreas


Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected

2017-12-21 Thread Willem de Bruijn
On Thu, Dec 21, 2017 at 12:05 PM, Andreas Hartmann
 wrote:
> On 12/20/2017 at 11:44 PM Willem de Bruijn wrote:
>>
>> On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann
>>  wrote:
>>>
>>> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:

>>>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
>>>
>>> [...]
>
> I have been able to reproduce the hang by sending a UFO packet
> between two guests running v4.13 on a host running v4.15-rc1.
>
> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
> vhost_zerocopy_callback being called for each segment of a
> segmented UFO skb. This refcount is decremented then on each
> segment, but incremented only once for the entire UFO skb.
>
> Before v4.14, these packets would be converted in skb_segment to
> regular copy packets with skb_orphan_frags and the callback function
> called once at this point. v4.14 added support for reference counted
> zerocopy skb that can pass through skb_orphan_frags unmodified and
> have their zerocopy state safely cloned with skb_zerocopy_clone.
>
> The call to skb_zerocopy_clone must come after skb_orphan_frags
> to limit cloning of this state to those skbs that can do so safely.
>
> Please try a host with the following patch. This fixes it for me. I
> intend to
> send it to net.
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a592ca025fc4..d2d985418819 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff
> *head_skb,
>
>  skb_shinfo(nskb)->tx_flags |=
> skb_shinfo(head_skb)->tx_flags &
>SKBTX_SHARED_FRAG;
> -   if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> -   goto err;
>
>  while (pos < offset + len) {
>  if (i >= nfrags) {
> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff
> *head_skb,
>
>  if (unlikely(skb_orphan_frags(frag_skb,
> GFP_ATOMIC)))
>  goto err;
> +   if (skb_zerocopy_clone(nskb, frag_skb,
> GFP_ATOMIC))
> +   goto err;
>
>  *nskb_frag = *frag;
>  __skb_frag_ref(nskb_frag);
>
>
> This is relatively inefficient, as it calls skb_zerocopy_clone for each
> frag
> in the frags[] array. I will follow-up with a patch to net-next that
> only
> checks once per skb:
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 466581cf4cdc..a293a33604ec 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff
> *head_skb,
>
>  skb_shinfo(nskb)->tx_flags |=
> skb_shinfo(head_skb)->tx_flags &
>SKBTX_SHARED_FRAG;
> -   if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> +   if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>  goto err;
>
>  while (pos < offset + len) {
> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff
> *head_skb,
>
>  BUG_ON(!nfrags);
>
> +   if (skb_orphan_frags(frag_skb,
> GFP_ATOMIC) ||
> +   skb_zerocopy_clone(nskb, frag_skb,
> +  GFP_ATOMIC))
> +   goto err;
> +
>  list_skb = list_skb->next;
>  }
>
> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff
> *head_skb,
>  goto err;
>  }
>
> -   if (unlikely(skb_orphan_frags(frag_skb,
> GFP_ATOMIC)))
> -   goto err;
> -


>>>> I'm currently testing this one.

>>>
>>> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
>>> accept UFO datagrams from tuntap and packet".
>>>
>>> At first, I tested an unpatched 4.14.7 - the problem (no more killable
>>> qemu-process) did occur promptly on shutdown of the machine. This was
>>> expected.
>>>
>>> Next, I applied the above patch (the second one). Until now, I didn't
>>> face any problem any more on shutdown of VMs. Looks promising.
>>
>>
>> Thanks for testing.
>>
>> I sent the first, simpler, one to net together with another fix.
>>
>>http://patchwork.ozlabs.org/patch/851715/
>>
>
> If

[PATCH net-next v2] xen-netback: make copy batch size configurable

2017-12-21 Thread Joao Martins
Commit eb1723a29b9a ("xen-netback: refactor guest rx") refactored Rx
handling and as a result decreased max grant copy ops from 4352 to 64.
Before this commit it would drain the rx_queue (while there are
enough slots in the ring to put packets) then copy to all pages and write
responses on the ring. With the refactor we do almost the same albeit
the last two steps are done every COPY_BATCH_SIZE (64) copies.

For big packets, the value of 64 means copying 3 packets best case scenario
(17 copies) and worst-case only 1 packet (34 copies, i.e. if all frags
plus head cross the 4k grant boundary) which could be the case when
packets go from local backend process.

Instead of making it static to 64 grant copies, let's allow the user to
select its value (while keeping the current value as the default) by
introducing the `copy_batch_size` module parameter. This allows users to
select larger batches (e.g. for better throughput with big packets), as was
possible prior to the above-mentioned commit.
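
The packet counts quoted above follow from integer division of the batch size
by the number of grant copies one packet needs. A small sketch of that
arithmetic, where packets_per_batch is a hypothetical helper and the inputs 17
and 34 are the per-packet copy counts given in the description:

```c
/*
 * Hypothetical helper: how many whole packets fit into one batch of
 * grant copies, given the number of copies a single packet needs.
 */
static unsigned int packets_per_batch(unsigned int batch_size,
                                      unsigned int copies_per_packet)
{
    if (copies_per_packet == 0)
        return 0;
    return batch_size / copies_per_packet;
}
```

With the default of 64 this gives 3 packets at 17 copies each and 1 packet at
34 copies each; with the pre-refactor maximum of 4352 copies, the worst case
would be 128 packets per batch.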

Signed-off-by: Joao Martins 
---
Changes since v1:
 * move rx_copy.{idx,op} reallocation to separate helper
 Addressed Paul's comments:
 * rename xenvif_copy_state#size field to batch_size
 * argument `size` should be unsigned int
 * vfree is safe with NULL
 * realloc rx_copy.{idx,op} after copy op flush
---
 drivers/net/xen-netback/common.h|  7 +--
 drivers/net/xen-netback/interface.c | 16 +++-
 drivers/net/xen-netback/netback.c   |  5 +
 drivers/net/xen-netback/rx.c| 35 ++-
 4 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index a46a1e94505d..8e4eaf3a507d 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -129,8 +129,9 @@ struct xenvif_stats {
 #define COPY_BATCH_SIZE 64
 
 struct xenvif_copy_state {
-   struct gnttab_copy op[COPY_BATCH_SIZE];
-   RING_IDX idx[COPY_BATCH_SIZE];
+   struct gnttab_copy *op;
+   RING_IDX *idx;
+   unsigned int batch_size;
unsigned int num;
struct sk_buff_head *completed;
 };
@@ -358,6 +359,7 @@ irqreturn_t xenvif_ctrl_irq_fn(int irq, void *data);
 
 void xenvif_rx_action(struct xenvif_queue *queue);
 void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
+int xenvif_rx_copy_realloc(struct xenvif_queue *queue, unsigned int size);
 
 void xenvif_carrier_on(struct xenvif *vif);
 
@@ -381,6 +383,7 @@ extern unsigned int rx_drain_timeout_msecs;
 extern unsigned int rx_stall_timeout_msecs;
 extern unsigned int xenvif_max_queues;
 extern unsigned int xenvif_hash_cache_size;
+extern unsigned int xenvif_copy_batch_size;
 
 #ifdef CONFIG_DEBUG_FS
 extern struct dentry *xen_netback_dbg_root;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 78ebe494fef0..e12eb64ab0a9 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -518,6 +518,12 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 {
int err, i;
 
+   err = xenvif_rx_copy_realloc(queue, xenvif_copy_batch_size);
+   if (err) {
+   netdev_err(queue->vif->dev, "Could not alloc rx_copy\n");
+   goto err;
+   }
+
queue->credit_bytes = queue->remaining_credit = ~0UL;
queue->credit_usec  = 0UL;
timer_setup(&queue->credit_timeout, xenvif_tx_credit_callback, 0);
@@ -544,7 +550,7 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 queue->mmap_pages);
if (err) {
netdev_err(queue->vif->dev, "Could not reserve mmap_pages\n");
-   return -ENOMEM;
+   goto err;
}
 
for (i = 0; i < MAX_PENDING_REQS; i++) {
@@ -556,6 +562,11 @@ int xenvif_init_queue(struct xenvif_queue *queue)
}
 
return 0;
+
+err:
+   vfree(queue->rx_copy.op);
+   vfree(queue->rx_copy.idx);
+   return -ENOMEM;
 }
 
 void xenvif_carrier_on(struct xenvif *vif)
@@ -788,6 +799,9 @@ void xenvif_disconnect_ctrl(struct xenvif *vif)
  */
 void xenvif_deinit_queue(struct xenvif_queue *queue)
 {
+   vfree(queue->rx_copy.op);
+   vfree(queue->rx_copy.idx);
+   queue->rx_copy.batch_size = 0;
gnttab_free_pages(MAX_PENDING_REQS, queue->mmap_pages);
 }
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a27daa23c9dc..3a5e1d7ac2f4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -96,6 +96,11 @@ unsigned int xenvif_hash_cache_size = XENVIF_HASH_CACHE_SIZE_DEFAULT;
 module_param_named(hash_cache_size, xenvif_hash_cache_size, uint, 0644);
 MODULE_PARM_DESC(hash_cache_size, "Number of flows in the hash cache");
 
+/* This is the maximum batch of grant copies on Rx */
+unsigned int xenvif_copy_batch_size = COPY_BATCH_SIZE;
+module_param_named(copy_batch_size, xenvif_copy_batch_size, uint, 0644);
+MODULE_PARM_DESC(copy_


RE: [PATCH net-next v2] xen-netback: make copy batch size configurable

2017-12-21 Thread Paul Durrant
> -Original Message-
> From: Joao Martins [mailto:joao.m.mart...@oracle.com]
> Sent: 21 December 2017 17:24
> To: netdev@vger.kernel.org
> Cc: Joao Martins ; Wei Liu
> ; Paul Durrant ; xen-
> de...@lists.xenproject.org
> Subject: [PATCH net-next v2] xen-netback: make copy batch size
> configurable
> 
> Commit eb1723a29b9a ("xen-netback: refactor guest rx") refactored Rx
> handling and as a result decreased max grant copy ops from 4352 to 64.
> Before this commit it would drain the rx_queue (while there are
> enough slots in the ring to put packets) then copy to all pages and write
> responses on the ring. With the refactor we do almost the same albeit
> the last two steps are done every COPY_BATCH_SIZE (64) copies.
> 
> For big packets, the value of 64 means copying 3 packets best case scenario
> (17 copies) and worst-case only 1 packet (34 copies, i.e. if all frags
> plus head cross the 4k grant boundary) which could be the case when
> packets go from local backend process.
> 
> Instead of making it static to 64 grant copies, let's allow the user to
> select its value (while keeping the current value as the default) by
> introducing the `copy_batch_size` module parameter. This allows users to
> select larger batches (e.g. for better throughput with big packets), as was
> possible prior to the above-mentioned commit.
> 
> Signed-off-by: Joao Martins 

Reviewed-by: Paul Durrant 

> ---
> Changes since v1:
>  * move rx_copy.{idx,op} reallocation to separate helper
>  Addressed Paul's comments:
>  * rename xenvif_copy_state#size field to batch_size
>  * argument `size` should be unsigned int
>  * vfree is safe with NULL
>  * realloc rx_copy.{idx,op} after copy op flush
> ---
>  drivers/net/xen-netback/common.h|  7 +--
>  drivers/net/xen-netback/interface.c | 16 +++-
>  drivers/net/xen-netback/netback.c   |  5 +
>  drivers/net/xen-netback/rx.c| 35
> ++-
>  4 files changed, 59 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
> netback/common.h
> index a46a1e94505d..8e4eaf3a507d 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -129,8 +129,9 @@ struct xenvif_stats {
>  #define COPY_BATCH_SIZE 64
> 
>  struct xenvif_copy_state {
> - struct gnttab_copy op[COPY_BATCH_SIZE];
> - RING_IDX idx[COPY_BATCH_SIZE];
> + struct gnttab_copy *op;
> + RING_IDX *idx;
> + unsigned int batch_size;
>   unsigned int num;
>   struct sk_buff_head *completed;
>  };
> @@ -358,6 +359,7 @@ irqreturn_t xenvif_ctrl_irq_fn(int irq, void *data);
> 
>  void xenvif_rx_action(struct xenvif_queue *queue);
>  void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff
> *skb);
> +int xenvif_rx_copy_realloc(struct xenvif_queue *queue, unsigned int size);
> 
>  void xenvif_carrier_on(struct xenvif *vif);
> 
> @@ -381,6 +383,7 @@ extern unsigned int rx_drain_timeout_msecs;
>  extern unsigned int rx_stall_timeout_msecs;
>  extern unsigned int xenvif_max_queues;
>  extern unsigned int xenvif_hash_cache_size;
> +extern unsigned int xenvif_copy_batch_size;
> 
>  #ifdef CONFIG_DEBUG_FS
>  extern struct dentry *xen_netback_dbg_root;
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> index 78ebe494fef0..e12eb64ab0a9 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -518,6 +518,12 @@ int xenvif_init_queue(struct xenvif_queue *queue)
>  {
>   int err, i;
> 
> + err = xenvif_rx_copy_realloc(queue, xenvif_copy_batch_size);
> + if (err) {
> + netdev_err(queue->vif->dev, "Could not alloc rx_copy\n");
> + goto err;
> + }
> +
>   queue->credit_bytes = queue->remaining_credit = ~0UL;
>   queue->credit_usec  = 0UL;
>   timer_setup(&queue->credit_timeout, xenvif_tx_credit_callback, 0);
> @@ -544,7 +550,7 @@ int xenvif_init_queue(struct xenvif_queue *queue)
>queue->mmap_pages);
>   if (err) {
>   netdev_err(queue->vif->dev, "Could not reserve
> mmap_pages\n");
> - return -ENOMEM;
> + goto err;
>   }
> 
>   for (i = 0; i < MAX_PENDING_REQS; i++) {
> @@ -556,6 +562,11 @@ int xenvif_init_queue(struct xenvif_queue *queue)
>   }
> 
>   return 0;
> +
> +err:
> + vfree(queue->rx_copy.op);
> + vfree(queue->rx_copy.idx);
> + return -ENOMEM;
>  }
> 
>  void xenvif_carrier_on(struct xenvif *vif)
> @@ -788,6 +799,9 @@ void xenvif_disconnect_ctrl(struct xenvif *vif)
>   */
>  void xenvif_deinit_queue(struct xenvif_queue *queue)
>  {
> + vfree(queue->rx_copy.op);
> + vfree(queue->rx_copy.idx);
> + queue->rx_copy.batch_size = 0;
>   gnttab_free_pages(MAX_PENDING_REQS, queue->mmap_pages);
>  }
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
> netback/netback.c
> index a27daa23c9dc..3a5e1d7ac2f4 100644
> --- a/drivers/n

Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree

2017-12-21 Thread Jakub Kicinski
On Thu, 21 Dec 2017 12:07:42 +, Roman Gushchin wrote:
> On Wed, Dec 20, 2017 at 01:52:18PM -0800, Jakub Kicinski wrote:
> > On Wed, 20 Dec 2017 20:53:41 +, Roman Gushchin wrote:  
> > > On Wed, Dec 20, 2017 at 12:29:21PM -0800, Jakub Kicinski wrote:  
> > > Hm, why it's better? It's not only about the kernel version,
> > > IMO it's generally better to use includes from the source tree,
> > > rather than system-wide installed kernel headers.  
> > 
> > Right I agree the kernel headers are preferred.  I'm not entirely sure
> > why we don't use them, if it was OK to assume usr/ is there we wouldn't
> > need the tools/include/uapi/ contraption.  Maybe Arnaldo could explain?
> >   
> > > > I've got the point about out-of-source builds, but do we support it in general?
> > > How can I build bpftool outside of the kernel tree?
> > > I've tried a bit, but failed.  
> > 
> > This is what I do:
> > 
> > make -C tools/bpf/bpftool/ W=1 O=/tmp/builds/bpftool  
> 
> This works perfectly with my patch:
> 
> $ make -C ~/linux/tools/bpf/ W=1 O=/home/guro/build/ --trace
> <...>
> echo '  CC   '/home/guro/build/main.o;gcc -O2 -W -Wall -Wextra 
> -Wno-unused-parameter -Wshadow -D__EXPORTED_HEADERS__ 
> -I/home/guro/linux/tools/include/uapi -I/home/guro/linux/tools/include 
> -I/home/guro/linux/tools/lib/bpf -I/home/guro/linux/kernel/bpf/ 
> -I/home/guro/linux/usr/include -DNEW_DISSASSEMBLER_SIGNATURE   -c -MMD -o 
> /home/guro/build/main.o main.c
> <...>
> echo '  LINK '/home/guro/build/bpftool;gcc -O2 -W -Wall -Wextra 
> -Wno-unused-parameter -Wshadow -D__EXPORTED_HEADERS__ 
> -I/home/guro/linux/tools/include/uapi -I/home/guro/linux/tools/include 
> -I/home/guro/linux/tools/lib/bpf -I/home/guro/linux/kernel/bpf/ 
> -I/home/guro/linux/usr/include -DNEW_DISSASSEMBLER_SIGNATURE -o 
> /home/guro/build/bpftool /home/guro/build/common.o /home/guro/build/cgroup.o 
> /home/guro/build/main.o /home/guro/build/json_writer.o 
> /home/guro/build/prog.o /home/guro/build/map.o /home/guro/build/jit_disasm.o 
> /home/guro/build/disasm.o /home/guro/build/libbpf.a -lelf -lbfd -lopcodes 
> /home/guro/build/libbpf.a
>   LINK /home/guro/build/bpftool
> make[1]: Leaving directory '/home/guro/linux/tools/bpf/bpftool'
> make: Leaving directory '/home/guro/linux/tools/bpf'
> 
> $ ./build/bpftool version
> ./build/bpftool v4.15.0

Argh, sorry for the confusion: you need to build the kernel out-of-source
as well.  In my case I build the kernel and bpftool out of source, and
then usr/ in the source tree doesn't actually contain the auto-generated headers:

$ ls ~/devel/linux/usr/
gen_init_cpio.c  initramfs_data.S  Kconfig  Makefile

Only the build directory does:

$ ls /tmp/builds/usr/
built-in.o  gen_init_cpio  include  initramfs_data.cpio  initramfs_data.o  
modules.builtin  modules.order

Let me reiterate, the user space headers we need should all be already
included in -I$(srctree)/tools/include/uapi, and make kernelversion is
nice because it also adds the -rc tags.
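
As an illustration of why a numeric version code cannot carry the -rc tag, here is a minimal sketch. It assumes the conventional KERNEL_VERSION(a, b, c) packing from linux/version.h; the names DEMO_KERNEL_VERSION and demo_format_version are made up for this example and are not bpftool code.

```c
#include <stddef.h>
#include <stdio.h>

/* Same packing as the kernel's KERNEL_VERSION(a, b, c) macro (assumed here). */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: render a packed version code as "vMAJOR.MINOR.PATCH". */
static int demo_format_version(unsigned int code, char *buf, size_t len)
{
	unsigned int major = code >> 16;
	unsigned int minor = (code >> 8) & 0xff;
	unsigned int patch = code & 0xff;

	/*
	 * Only three small integers survive the packing, so a suffix
	 * such as "-rc4" has nowhere to live in the numeric code.
	 */
	return snprintf(buf, len, "v%u.%u.%u", major, minor, patch);
}
```

A string produced at build time (for example by make kernelversion, as suggested above) does not have this limitation, which is presumably why it is preferred here.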


[PATCH v2 net-next] net: dsa: lan9303: lan9303_csr_reg_wait cleanups

2017-12-21 Thread Egil Hjelmeland
Non-functional cleanups in lan9303_csr_reg_wait():
 - Change type of param 'mask' from int to u32.
 - Remove param 'value' (will probably never be used)
 - Reduced retries from 0x1000 to 25, consistent with lan9303_read_wait.
 - Removed comments

Signed-off-by: Egil Hjelmeland 

Changes v1 -> v2:
 - Removed comments
---
 drivers/net/dsa/lan9303-core.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index f412aad58253..944901f03f8b 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -249,7 +249,6 @@ static int lan9303_read(struct regmap *regmap, unsigned int 
offset, u32 *reg)
return -EIO;
 }
 
-/* Wait a while until mask & reg == value. Otherwise return timeout. */
 static int lan9303_read_wait(struct lan9303 *chip, int offset, u32 mask)
 {
int i;
@@ -541,20 +540,19 @@ lan9303_alr_cache_find_mac(struct lan9303 *chip, const u8 
*mac_addr)
return NULL;
 }
 
-/* Wait a while until mask & reg == value. Otherwise return timeout. */
-static int lan9303_csr_reg_wait(struct lan9303 *chip, int regno,
-   int mask, char value)
+static int lan9303_csr_reg_wait(struct lan9303 *chip, int regno, u32 mask)
 {
int i;
 
-   for (i = 0; i < 0x1000; i++) {
+   for (i = 0; i < 25; i++) {
u32 reg;
 
lan9303_read_switch_reg(chip, regno, &reg);
-   if ((reg & mask) == value)
+   if (!(reg & mask))
return 0;
usleep_range(1000, 2000);
}
+
return -ETIMEDOUT;
 }
 
@@ -564,8 +562,7 @@ static int lan9303_alr_make_entry_raw(struct lan9303 *chip, 
u32 dat0, u32 dat1)
lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_WR_DAT_1, dat1);
lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD,
 LAN9303_ALR_CMD_MAKE_ENTRY);
-   lan9303_csr_reg_wait(chip, LAN9303_SWE_ALR_CMD_STS, ALR_STS_MAKE_PEND,
-0);
+   lan9303_csr_reg_wait(chip, LAN9303_SWE_ALR_CMD_STS, ALR_STS_MAKE_PEND);
lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0);
 
return 0;
-- 
2.14.1
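
The bounded polling loop in lan9303_csr_reg_wait() can be tried outside the kernel with a sketch like the one below. Everything prefixed demo_ is hypothetical: the register read is replaced by a callback, the sleep between reads is left out, and the fake device only exists to exercise the loop.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessor; in the driver this would be a bus read. */
typedef uint32_t (*demo_read_fn)(void *ctx);

/* Poll until every bit in mask reads back as zero, giving up after max_tries reads. */
static int demo_wait_bits_clear(demo_read_fn read_reg, void *ctx,
				uint32_t mask, int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		if (!(read_reg(ctx) & mask))
			return 0;
		/* The driver sleeps here (usleep_range(1000, 2000)); omitted in this sketch. */
	}

	return -ETIMEDOUT;
}

/* Fake device for demonstration: a busy bit that clears after a few reads. */
struct demo_dev {
	int reads_until_idle;
};

static uint32_t demo_dev_read(void *ctx)
{
	struct demo_dev *dev = ctx;

	if (dev->reads_until_idle > 0) {
		dev->reads_until_idle--;
		return 0x1; /* busy bit still set */
	}
	return 0x0;
}
```

With 25 tries, a device that needs 3 reads to become idle succeeds, while one that needs 100 reads times out. Note that in the patch the caller ignores the return value, so a timeout is not propagated.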



Re: [PATCH 1/3] net: Fix possible race in peernet2id_alloc()

2017-12-21 Thread Eric W. Biederman
Kirill Tkhai  writes:

> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
> under net->nsid_lock does not guarantee that peer is alive:
>
> rcu_read_lock()
> peernet2id_alloc()                      ..
>   spin_lock_bh(&net->nsid_lock)         ..
>   atomic_read(&peer->count) == 1        ..
>   ..                                    put_net()
>   ..                                      cleanup_net()
>   ..                                        for_each_net(tmp)
>   ..                                          spin_lock_bh(&tmp->nsid_lock)
>   ..                                          __peernet2id(tmp, net) == -1
>   ..                                          ..
>   ..                                          ..
>   __peernet2id_alloc(alloc == true)           ..
>   ..                                          ..
> rcu_read_unlock()                             ..
> ..                                          synchronize_rcu()
> ..                                          kmem_cache_free(net)
>
> After the above situation, net::netns_id contains id pointing to freed memory,
> and any other dereferencing by the id will operate with this freed memory.
>
> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
> ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
> is a generic interface, and we had better fix it before someone really
> starts to use it in the wrong context.

So it comes down to this piece of code from ovs and just let me say ick.
if (!net_eq(net, dev_net(vport->dev))) {
int id = peernet2id_alloc(net, dev_net(vport->dev));

if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
goto nla_put_failure;
}

Without the rtnl lock dev_net can change between the test and the
call of peernet2id_alloc.

At first glance it looks like the bug is that we are running a control
path of the networking stack without the rtnl lock. So it may be that
ASSERT_RTNL() is the better fix.

Given that it would be nice to reduce the scope of the rtnl lock this
might not be a bad direction.  Let me see.

Is rtnl_notify safe without the rtnl lock?


>
> Signed-off-by: Kirill Tkhai 
> ---
>  net/core/net_namespace.c |   23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 60a71be75aea..6a4eab438221 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, 
> int id);
>   */
>  int peernet2id_alloc(struct net *net, struct net *peer)
>  {
> - bool alloc;
> + bool alloc = false, alive = false;
>   int id;

^^^ Perhaps we want "ASSERT_RTNL();" here?
>  
> - if (atomic_read(&net->count) == 0)
> - return NETNSA_NSID_NOT_ASSIGNED;

Moving this hunk is of no benefit.  The code must be called with a valid
reference to net, which means net->count is a fancy way of testing to
see if the code is in cleanup_net.  In all other cases net->count should
be non-zero and it should remain that way because our caller must
keep a reference.

>   spin_lock_bh(&net->nsid_lock);
> - alloc = atomic_read(&peer->count) == 0 ? false : true;
> + /* Spinlock guarantees we never hash a peer to net->netns_ids
> +  * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
> +  */
> + if (atomic_read(&net->count) == 0) {
> + id = NETNSA_NSID_NOT_ASSIGNED;
> + goto unlock;
> + }
> + /*
> +  * When peer is obtained from RCU lists, we may race with
> +  * its cleanup. Check whether it's alive, and this guarantees
> +  * we never hash a peer back to net->netns_ids, after it has
> +  * just been idr_remove()'d from there in cleanup_net().
> +  */
> + if (maybe_get_net(peer))
> + alive = alloc = true;

Yes this does seem reasonable.  The more obvious looking code, which
would return NETNSA_NSID_NOT_ASSIGNED if the peer has a count of 0, is
silly as it would make it appear that a peer is momentarily outside
of a network namespace when the peer is in fact moving from one network
namespace to another.

>   id = __peernet2id_alloc(net, peer, &alloc);
> +unlock:
>   spin_unlock_bh(&net->nsid_lock);
>   if (alloc && id >= 0)
>   rtnl_net_notifyid(net, RTM_NEWNSID, id);
^^
Is this safe without the rtnl lock?
> + if (alive)
> + put_net(peer);
>   return id;
>  }
>  EXPORT_SYMBOL_GPL(peernet2id_alloc);

Eric
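
The "take a reference only if the peer is still alive" step that maybe_get_net() provides in the patch above can be sketched in userspace with C11 atomics. This is an analogue for illustration only, not the kernel implementation; struct demo_obj, demo_get_unless_zero() and demo_put() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_obj {
	atomic_int count; /* 0 means teardown has begun */
};

/* Take a reference only if the object is still alive (count > 0). */
static bool demo_get_unless_zero(struct demo_obj *obj)
{
	int old = atomic_load(&obj->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&obj->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool demo_put(struct demo_obj *obj)
{
	return atomic_fetch_sub(&obj->count, 1) == 1;
}
```

The point of the compare-and-swap loop is that a plain read of the counter followed by a later increment leaves a window in which the last reference can be dropped, which is the race shown in the diagram above.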


Re: [PATCHv4 net-next 00/14] net: sched: sch: introduce extack support

2017-12-21 Thread David Miller
From: Alexander Aring 
Date: Wed, 20 Dec 2017 12:35:10 -0500

> this patch series basically adds support for extack in common qdisc handling.
> Additionally, it adds an extack pointer to common qdisc callback handling; this
> allows each qdisc implementation to set the extack message for each
> failure over netlink.

Series applied.


Re: [PATCH net v3] openvswitch: Fix pop_vlan action for double tagged frames

2017-12-21 Thread David Miller
From: Eric Garver 
Date: Wed, 20 Dec 2017 15:09:22 -0500

> skb_vlan_pop() expects skb->protocol to be a valid TPID for double
> tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
> shift the true ethertype into position for us.
> 
> Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
> Signed-off-by: Eric Garver 

Applied and queued up for -stable, thanks.


Re: [PATCH net V3] net: reevalulate autoflowlabel setting after sysctl setting

2017-12-21 Thread David Miller
From: Shaohua Li 
Date: Wed, 20 Dec 2017 12:10:21 -0800

> From: Shaohua Li 
> 
> sysctl.ipv6.auto_flowlabels is 1 by default. In our hosts, we set it to 2.
> If sockopt doesn't set autoflowlabel, outgoing packets from the hosts are
> supposed to not include flowlabel. This is true for normal packets, but
> not for reset packets.
> 
> The reason is ipv6_pinfo.autoflowlabel is set at sock creation. Later if
> we change sysctl.ipv6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
> changed, so the sock will keep the old behavior in terms of auto
> flowlabel. Reset packets suffer from this problem, because a reset
> packet is sent from a special control socket, which is created at boot
> time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
> socket will always have its ipv6_pinfo.autoflowlabel set, even after
> the user sets sysctl.ipv6.auto_flowlabels to 2, so reset packets will always
> have flowlabel. Normal socks created before the sysctl setting suffer from
> the same issue. We can't even turn off autoflowlabel unless we kill all
> socks in the hosts.
> 
> To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
> autoflowlabel setting from user, otherwise we always call
> ip6_default_np_autolabel() which has the new settings of sysctl.
> 
> Note, this changes behavior a little bit. Before commit 42240901f7c4
> (ipv6: Implement different admin modes for automatic flow labels), the
> autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
> existing connection will change autoflowlabel behavior. After that
> commit, autoflowlabel behavior is sticky in the whole life of the sock.
> With this patch, the behavior isn't sticky again.
> 
> Cc: Martin KaFai Lau 
> Cc: Eric Dumazet 
> Cc: Tom Herbert 
> Signed-off-by: Shaohua Li 

This looks a lot better, applied, thanks.
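
The decision described in the commit message (use the socket's own setting if the sockopt was used, otherwise re-read the sysctl every time) can be sketched as below. The struct and function names are hypothetical stand-ins, and the mapping of sysctl values to a default (1 and 3 on, 0 and 2 off) is an assumption made for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the per-socket and per-namespace state. */
struct demo_sock {
	bool autoflowlabel;     /* value chosen via the socket option */
	bool autoflowlabel_set; /* true once the socket option was used */
};

struct demo_netns {
	int auto_flowlabels;    /* current sysctl value */
};

/* Assumed mapping of sysctl modes: 0 and 2 default to off, 1 and 3 to on. */
static bool demo_default_autolabel(const struct demo_netns *ns)
{
	return ns->auto_flowlabels == 1 || ns->auto_flowlabels == 3;
}

/* Evaluated per packet, so a later sysctl change is picked up. */
static bool demo_autoflowlabel(const struct demo_netns *ns,
			       const struct demo_sock *sk)
{
	if (sk->autoflowlabel_set)
		return sk->autoflowlabel;
	return demo_default_autolabel(ns);
}
```

A control socket created while the sysctl was 1 stops labelling once the sysctl is changed to 2, while a socket that explicitly opted in keeps its own setting.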


Re: [PATCH v3 next-queue 00/10] ixgbe: Add ipsec offload

2017-12-21 Thread Shannon Nelson

On 12/20/2017 11:09 PM, Yanjun Zhu wrote:

On 2017/12/21 14:39, Yanjun Zhu wrote:

On 2017/12/20 7:59, Shannon Nelson wrote:

This is an implementation of the ipsec hardware offload feature for
the ixgbe driver and Intel's 10Gbe series NICs: x540, x550, 82599.

Hi, Nelson

I notice that the ipsec feature is based on x540, x550, 82599. But
this ixgbe driver will also work with 82598.

Does this ipsec feature also work with 82598?
Sorry. I mean, after these ipsec patches are applied, whether ipsec
offload is enabled or not, can this ixgbe driver still work well with 82598?


Hmm... I don't have one to test on, but I suspect the 82598 might not be 
happy with this.  I'll send a followup patch to catch this case.


Thanks!
sln




Zhu Yanjun


Thanks a lot.
Zhu Yanjun
These patches apply to net-next v4.14 as well as Jeff Kirsher's next-queue
v4.15-rc1-206-ge47375b.

The ixgbe NICs support ipsec offload for 1024 Rx and 1024 Tx Security
Associations (SAs), using up to 128 inbound IP addresses, and using the
rfc4106(gcm(aes)) encryption.  This code does not yet support IPv6,
checksum offload, or TSO in conjunction with the ipsec offload - those
will be added in the future.

This code shows improvements in both packet throughput and CPU utilization.

For example, here are some quicky numbers that show the magnitude of the
performance gain on a single run of "iperf -c " with the ipsec
offload on both ends of a point-to-point connection:

9.4 Gbps - normal case
7.6 Gbps - ipsec with offload
343 Mbps - ipsec no offload

To set up a similar test case, you first need to be sure you have a recent
version of iproute2 that supports the ipsec offload tag, probably something
from ip 4.12 or newer would be best.  I have a shell script that builds
up the appropriate commands for me, but here are the resulting commands
for all tcp traffic between 14.0.0.52 and 14.0.0.70:

For the left side (14.0.0.52):
   ip x p add dir out src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp tmpl \
      proto esp src 14.0.0.52 dst 14.0.0.70 spi 0x07 mode transport reqid 0x07
   ip x p add dir in src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp tmpl \
      proto esp dst 14.0.0.52 src 14.0.0.70 spi 0x07 mode transport reqid 0x07
   ip x s add proto esp src 14.0.0.52 dst 14.0.0.70 spi 0x07 mode transport \
      reqid 0x07 replay-window 32 \
      aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
      sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload dev eth4 dir out
   ip x s add proto esp dst 14.0.0.52 src 14.0.0.70 spi 0x07 mode transport \
      reqid 0x07 replay-window 32 \
      aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
      sel src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp offload dev eth4 dir in

For the right side (14.0.0.70):
   ip x p add dir out src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp tmpl \
      proto esp src 14.0.0.70 dst 14.0.0.52 spi 0x07 mode transport reqid 0x07
   ip x p add dir in src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp tmpl \
      proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport reqid 0x07
   ip x s add proto esp src 14.0.0.70 dst 14.0.0.52 spi 0x07 mode transport \
      reqid 0x07 replay-window 32 \
      aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
      sel src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp offload dev eth4 dir out
   ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
      reqid 0x07 replay-window 32 \
      aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
      sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload dev eth4 dir in


In both cases, the command "ip x s flush ; ip x p flush" will clean
it all out and remove the offloads.

Lastly, thanks to Alex Duyck for his early comments.

Please see the individual patches for specific update info.

v3: fixes after comments from those wonderfully pesky kbuild robots
v2: fixes after comments from Alex

Shannon Nelson (10):
   ixgbe: clean up ipsec defines
   ixgbe: add ipsec register access routines
   ixgbe: add ipsec engine start and stop routines
   ixgbe: add ipsec data structures
   ixgbe: add ipsec offload add and remove SA
   ixgbe: restore offloaded SAs after a reset
   ixgbe: process the Rx ipsec offload
   ixgbe: process the Tx ipsec offload
   ixgbe: ipsec offload stats
   ixgbe: register ipsec offload with the xfrm subsystem

  drivers/net/ethernet/intel/ixgbe/Makefile    |   1 +
  drivers/net/ethernet/intel/ixgbe/ixgbe.h |  33 +-
  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   2 +
  drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c   | 923 +++
  drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h   |  92 +++
  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c |   4 +-
  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  39 +-
  drivers/net/ethernet/intel/ixgbe/ixgbe_type.h    |  22 +-
  8 files changed, 1093 insertions(+), 

Re: [PATCH v3 1/3] net: ibm: emac: replace custom rgmii_mode_name with phy_modes

2017-12-21 Thread David Miller
From: Christian Lamparter 
Date: Wed, 20 Dec 2017 23:01:48 +0100

> phy_modes() in the common phy.h already defines the same phy mode
> names in lower case. The deleted rgmii_mode_name() is used only
> in one place and for a "notice-level" printk. Hence, it will not
> be missed.
> 
> Signed-off-by: Christian Lamparter 

Applied to net-next.


Re: [PATCH v3 2/3] net: ibm: emac: replace custom PHY_MODE_* macros

2017-12-21 Thread David Miller
From: Christian Lamparter 
Date: Wed, 20 Dec 2017 23:01:49 +0100

> The ibm_emac driver predates the PHY_INTERFACE_MODE_*
> enums by a few years.
> 
> And while the driver has been retrofitted to use the PHYLIB,
> the old definitions have stuck around to this day.
> 
> This patch replaces all occurrences of PHY_MODE_* with
> the respective equivalent PHY_INTERFACE_MODE_* enum.
> And finally, it purges the old macros for good.
> 
> Signed-off-by: Christian Lamparter 

Applied to net-next.


Re: [PATCH v3 3/3] net: ibm: emac: support RGMII-[RX|TX]ID phymode

2017-12-21 Thread David Miller
From: Christian Lamparter 
Date: Wed, 20 Dec 2017 23:01:50 +0100

> The RGMII spec allows compliance for devices that implement an internal
> delay on TXC and/or RXC inside the transmitter. This patch adds the
> necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the
> emac driver.
> 
> Signed-off-by: Christian Lamparter 

Applied to net-next.


[PATCH next-queue] ixgbe: no ipsec offload for 82598

2017-12-21 Thread Shannon Nelson
Don't try to set up ipsec offload on the oldest part of
the ixgbe family.

Suggested-by: Yanjun Zhu 
Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 424dbf7..12c7132 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -863,6 +863,9 @@ void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter)
struct ixgbe_ipsec *ipsec;
size_t size;
 
+   if (adapter->hw.mac.type == ixgbe_mac_82598EB)
+   return;
+
ipsec = kzalloc(sizeof(*ipsec), GFP_KERNEL);
if (!ipsec)
goto err1;
-- 
2.7.4



[PATCH net-next] tcp: md5: Handle RCU dereference of md5sig_info

2017-12-21 Thread Mat Martineau
Dereference tp->md5sig_info in tcp_v4_destroy_sock() the same way it is
done in the adjacent call to tcp_clear_md5_list().

Resolves this sparse warning:

net/ipv4/tcp_ipv4.c:1914:17: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_ipv4.c:1914:17:    expected struct callback_head *head
net/ipv4/tcp_ipv4.c:1914:17:    got struct callback_head [noderef] *

Signed-off-by: Mat Martineau 
---
 net/ipv4/tcp_ipv4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index dd945b114215..5d203248123e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1911,7 +1911,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
/* Clean up the MD5 key list, if any */
if (tp->md5sig_info) {
tcp_clear_md5_list(sk);
-   kfree_rcu(tp->md5sig_info, rcu);
+   kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu);
tp->md5sig_info = NULL;
}
 #endif
-- 
2.15.1



[PATCH net] tcp: Avoid preprocessor directives in tracepoint macro args

2017-12-21 Thread Mat Martineau
Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
of errors:

./include/trace/events/tcp.h:68:1: error: directive in argument list
./include/trace/events/tcp.h:75:1: error: directive in argument list
./include/trace/events/tcp.h:144:1: error: directive in argument list
./include/trace/events/tcp.h:151:1: error: directive in argument list
./include/trace/events/tcp.h:216:1: error: directive in argument list
./include/trace/events/tcp.h:223:1: error: directive in argument list
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list

Once sparse finds an error, it stops printing warnings for the file it
is checking. This masks any sparse warnings that would normally be
reported for the core TCP code.

Instead, handle the preprocessor conditionals in a couple of auxiliary
macros. This also has the benefit of reducing duplicate code.

Cc: David Ahern 
Signed-off-by: Mat Martineau 
---
 include/trace/events/tcp.h | 97 ++
 1 file changed, 37 insertions(+), 60 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 07a6cbf1..ab34c561f26b 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -25,6 +25,35 @@
tcp_state_name(TCP_CLOSING),\
tcp_state_name(TCP_NEW_SYN_RECV))
 
+#define TP_STORE_V4MAPPED(__entry, saddr, daddr)   \
+   do {\
+   struct in6_addr *pin6;  \
+   \
+   pin6 = (struct in6_addr *)__entry->saddr_v6;\
+   ipv6_addr_set_v4mapped(saddr, pin6);\
+   pin6 = (struct in6_addr *)__entry->daddr_v6;\
+   ipv6_addr_set_v4mapped(daddr, pin6);\
+   } while (0)
+
+#if IS_ENABLED(CONFIG_IPV6)
+#define TP_STORE_ADDRS(__entry, saddr, daddr, saddr6, daddr6)  \
+   do {\
+   if (sk->sk_family == AF_INET6) {\
+   struct in6_addr *pin6;  \
+   \
+   pin6 = (struct in6_addr *)__entry->saddr_v6;\
+   *pin6 = saddr6; \
+   pin6 = (struct in6_addr *)__entry->daddr_v6;\
+   *pin6 = daddr6; \
+   } else {\
+   TP_STORE_V4MAPPED(__entry, saddr, daddr);   \
+   }   \
+   } while (0)
+#else
+#define TP_STORE_ADDRS(__entry, saddr, daddr, saddr6, daddr6)  \
+   TP_STORE_V4MAPPED(__entry, saddr, daddr)
+#endif
+
 /*
  * tcp event with arguments sk and skb
  *
@@ -50,7 +79,6 @@ DECLARE_EVENT_CLASS(tcp_event_sk_skb,
 
TP_fast_assign(
struct inet_sock *inet = inet_sk(sk);
-   struct in6_addr *pin6;
__be32 *p32;
 
__entry->skbaddr = skb;
@@ -65,20 +93,8 @@ DECLARE_EVENT_CLASS(tcp_event_sk_skb,
p32 = (__be32 *) __entry->daddr;
*p32 =  inet->inet_daddr;
 
-#if IS_ENABLED(CONFIG_IPV6)
-   if (sk->sk_family == AF_INET6) {
-   pin6 = (struct in6_addr *)__entry->saddr_v6;
-   *pin6 = sk->sk_v6_rcv_saddr;
-   pin6 = (struct in6_addr *)__entry->daddr_v6;
-   *pin6 = sk->sk_v6_daddr;
-   } else
-#endif
-   {
-   pin6 = (struct in6_addr *)__entry->saddr_v6;
-   ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
-   pin6 = (struct in6_addr *)__entry->daddr_v6;
-   ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
-   }
+   TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
+ sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
),
 
TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c 
daddrv6=%pI6c",
@@ -127,7 +143,6 @@ DECLARE_EVENT_CLASS(tcp_event_sk,
 
TP_fast_assign(
struct inet_sock *inet = inet_sk(sk);
-   struct in6_addr *pin6;
__be32 *p32;
 
__entry->skaddr = sk;
@@ -141,20 +156,8 @@ DECLARE_EVENT_CLASS(tcp_event_sk,
p32 = (__be32 *) __entry->daddr;
*p32 =  inet->inet_daddr;
 
-#if IS_ENABLED(CONFIG_IPV6)
-   if (sk->sk_family == AF_INET6) {
-   pin6 = (struct in6_
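
The technique in this patch (decide the conditional once in helper macros, so that no directive appears inside another macro's argument list) can be shown in isolation. The C standard leaves a directive inside a function-like macro's arguments undefined, which is why tools such as sparse complain. In the sketch below DEMO_HAVE_V6 is a made-up stand-in for a CONFIG_ option, and DEMO_ASSIGN is a hypothetical macro that takes a block of statements in the way TP_fast_assign does.

```c
/* Stand-in for a CONFIG_ option; define DEMO_HAVE_V6 as 1 to take the other branch. */
#ifndef DEMO_HAVE_V6
#define DEMO_HAVE_V6 0
#endif

/* Decide once, outside of any macro invocation. */
#if DEMO_HAVE_V6
#define DEMO_STORE_FAMILY(dst) ((dst) = 6)
#else
#define DEMO_STORE_FAMILY(dst) ((dst) = 4)
#endif

/* A function-like macro whose argument is a block of statements. */
#define DEMO_ASSIGN(body) do { body } while (0)

static int demo_family(void)
{
	int family = 0;

	/* No #if appears between the parentheses of DEMO_ASSIGN. */
	DEMO_ASSIGN(
		DEMO_STORE_FAMILY(family);
	);
	return family;
}
```

As in the patch, the helper also removes the need to repeat the same #if/#else block at every call site.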

Re: [Patch net] net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap()

2017-12-21 Thread Cong Wang
On Thu, Dec 21, 2017 at 1:03 AM, Jiri Pirko  wrote:
> Thu, Dec 21, 2017 at 08:26:24AM CET, xiyou.wangc...@gmail.com wrote:
>>The rcu_barrier_bh() in mini_qdisc_pair_swap() is to wait for
>>flying RCU callback installed by a previous mini_qdisc_pair_swap(),
>>however we miss it on the tp_head==NULL path, which leads to that
>>the RCU callback still uses miniq_old->rcu after it is freed together
>>with qdisc in qdisc_graft(). So just add it on that path too.
>>
>>Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of 
>>tp->q for clsact fastpath ")
>
> This fixes:
> 752fbcc33405 ("net_sched: no need to free qdisc in RCU callback")
>
> Before that, the issue was not there as the qdisc struct got removed
> after a grace period.


This is nonsense. You have to read the stack trace from Jakub again
and tell me why you keep believing any RCU reader is involved.

I am pretty sure no one reported any crash between commit
752fbcc33405 and 46209401f8f6.


>
>
>>Reported-by: Jakub Kicinski 
>>Tested-by: Jakub Kicinski 
>>Cc: Jiri Pirko 
>>Cc: John Fastabend 
>>Signed-off-by: Cong Wang 
>>---
>> net/sched/sch_generic.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>>diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>>index cd1b200acae7..661c7144b53a 100644
>>--- a/net/sched/sch_generic.c
>>+++ b/net/sched/sch_generic.c
>>@@ -1040,6 +1040,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair 
>>*miniqp,
>>
>>   if (!tp_head) {
>>   RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
>>+  /* Wait for flying RCU callback before it is freed. */
>>+  rcu_barrier_bh();
>
>
>>   return;
>>   }
>>
>>@@ -1055,7 +1057,7 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair 
>>*miniqp,
>>   rcu_assign_pointer(*miniqp->p_miniq, miniq);
>>
>>   if (miniq_old)
>>-  /* This is counterpart of the rcu barrier above. We need to
>>+  /* This is counterpart of the rcu barriers above. We need to
>
> This is incorrect. Here we block in order to not use the same miniq
> again in scenario
>
> miniq1 (X)
> miniq2
> miniq1 (yet there are readers using X)
>
> This call_rcu has 0 relation to the barrier you are adding.


Seriously? It is this call_rcu still flying after we free the qdisc.
Did you seriously look into the stack trace from Jakub?


>
>
> But again, why don't we just free qdisc in call_rcu and avoid the
> barrier?


Nonsense again. Why should qdisc code be adjusted for your
miniq code? It is your own responsibility to take care of this shit.
Don't spread it out of miniq.


Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2017-12-21 Thread Antoine Tenart
Hi Marcin,

On Mon, Dec 18, 2017 at 10:17:56AM +0100, Marcin Wojtas wrote:
> 
> Marcin Wojtas (8):
>   device property: Introduce fwnode_get_mac_address()
>   device property: Introduce fwnode_get_phy_mode()
>   mdio_bus: Introduce fwnode MDIO helpers
>   net: mvmdio: add ACPI support
>   net: mvpp2: simplify maintaining enabled ports' list
>   net: mvpp2: use device_*/fwnode_* APIs instead of of_*
>   net: mvpp2: handle PHY with its fwnode
>   net: mvpp2: enable ACPI support in the driver


I tested your series on a mcbin, using the dt way. It still worked. If
it is relevant, you can add on the mvpp2 related patches:

Tested-by: Antoine Tenart 

Thanks!

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


[PATCH v5 net-next 0/7] net: ILA notification mechanism and fixes

2017-12-21 Thread Tom Herbert
This patch set adds support to get netlink notifications for ILA 
routes when a route is used.

This patch set contains:

- General infrastructure for route notifications
- The ILA route notification mechanism
- Add net to ila build_state
- Add flush command to ila_xlat
- Fix use of rhashtable for latest fixes

Route notifications will be used in conjunction with populating
ILA forwarding caches. There are three methods described in
the ILA Mapping Protocol. These are redirects, request/reply,
and push. The ILA route mechanism is relevant to the first two methods.

  - ILA router secure redirect mechanism-- This is used on an ILA
router where a notification is sent when an ILA host route is
used. The purpose of this notification is to send an
ILA redirect towards the ILA forwarding node of a source to
inform it of a direct ILA route. When the forwarding node
receives the redirect it can populate its cache so that
subsequent packets take the direct path. This is the
RECOMMENDED method.

  - Cache address resolution-- This is used to perform request/reply
address resolution on a route. As noted on netdev list, a
request/reply mechanism is susceptible to DOS attacks.
For this reason, this method is NOT RECOMMENDED as the
primary means to populate an ILA cache.

ILAMP is described in 
https://www.ietf.org/internet-drafts/draft-herbert-ila-ilamp-00.txt

Tested:

Ran ILA traffic, set up ILA notify routes and observed correct
routing message via ip monitor.

v5:
 - Fix some compiler and sparse warnings
 - Generalize route notify with RTM_NOTIFYROUTE,
   RTNLGRP_ROUTE_NOTIFY (suggested by Roopa)

v4:
 - Remove front end cache per davem feedback
 - Eliminate separate LWT type just use ILA LWT already in place

v3:
 - Removed rhashtable changes to their own patch set
 - Restructure ILA code to be more amenable to changes
 - Remove extra call back functions in resolution interface

Changes from initial RFC:

 - Added net argument to LWT build_state
 - Made resolve timeout an attribute of the LWT encap route
 - Changed ILA notifications to be regular routing messages of event
   RTM_ADDR_RESOLVE, family RTNL_FAMILY_ILA, and group
   RTNLGRP_ILA_NOTIFY

Tom Herbert (7):
  lwt: Add net to build_state argument
  rtnetlink: Add notify route message types
  ila: Fix use of rhashtable walk in ila_xlat.c
  ila: Call library function alloc_bucket_locks
  ila: Create main ila source file
  ila: Flush netlink command to clear xlat table
  ila: Route notify

 include/net/lwtunnel.h |   6 +-
 include/uapi/linux/ila.h   |   3 +
 include/uapi/linux/rtnetlink.h |   6 +
 net/core/lwt_bpf.c |   2 +-
 net/core/lwtunnel.c|   4 +-
 net/ipv4/fib_semantics.c   |  13 +-
 net/ipv4/ip_tunnel_core.c  |   4 +-
 net/ipv6/ila/Makefile  |   2 +-
 net/ipv6/ila/ila.h |  27 +++-
 net/ipv6/ila/ila_common.c  |  30 -
 net/ipv6/ila/ila_lwt.c | 275 ++
 net/ipv6/ila/ila_main.c| 121 +
 net/ipv6/ila/ila_xlat.c| 290 -
 net/ipv6/route.c   |   2 +-
 net/ipv6/seg6_iptunnel.c   |   2 +-
 net/ipv6/seg6_local.c  |   5 +-
 net/mpls/mpls_iptunnel.c   |   2 +-
 17 files changed, 511 insertions(+), 283 deletions(-)
 create mode 100644 net/ipv6/ila/ila_main.c

-- 
2.11.0



[PATCH v5 net-next 1/7] lwt: Add net to build_state argument

2017-12-21 Thread Tom Herbert
Users of LWT need to know net if they want to have per net operations
in LWT.

Acked-by: Roopa Prabhu 
Signed-off-by: Tom Herbert 
---
 include/net/lwtunnel.h|  6 +++---
 net/core/lwt_bpf.c|  2 +-
 net/core/lwtunnel.c   |  4 ++--
 net/ipv4/fib_semantics.c  | 13 -
 net/ipv4/ip_tunnel_core.c |  4 ++--
 net/ipv6/ila/ila_lwt.c|  2 +-
 net/ipv6/route.c  |  2 +-
 net/ipv6/seg6_iptunnel.c  |  2 +-
 net/ipv6/seg6_local.c |  5 +++--
 net/mpls/mpls_iptunnel.c  |  2 +-
 10 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index d747ef975cd8..da5e51e0d122 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -34,7 +34,7 @@ struct lwtunnel_state {
 };
 
 struct lwtunnel_encap_ops {
-   int (*build_state)(struct nlattr *encap,
+   int (*build_state)(struct net *net, struct nlattr *encap,
   unsigned int family, const void *cfg,
   struct lwtunnel_state **ts,
   struct netlink_ext_ack *extack);
@@ -113,7 +113,7 @@ int lwtunnel_valid_encap_type(u16 encap_type,
  struct netlink_ext_ack *extack);
 int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
   struct netlink_ext_ack *extack);
-int lwtunnel_build_state(u16 encap_type,
+int lwtunnel_build_state(struct net *net, u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
 struct lwtunnel_state **lws,
@@ -192,7 +192,7 @@ static inline int lwtunnel_valid_encap_type_attr(struct 
nlattr *attr, int len,
return 0;
 }
 
-static inline int lwtunnel_build_state(u16 encap_type,
+static inline int lwtunnel_build_state(struct net *net, u16 encap_type,
   struct nlattr *encap,
   unsigned int family, const void *cfg,
   struct lwtunnel_state **lws,
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index e7e626fb87bb..3a3ac13fcf06 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -238,7 +238,7 @@ static const struct nla_policy bpf_nl_policy[LWT_BPF_MAX + 1] = {
[LWT_BPF_XMIT_HEADROOM] = { .type = NLA_U32 },
 };
 
-static int bpf_build_state(struct nlattr *nla,
+static int bpf_build_state(struct net *net, struct nlattr *nla,
   unsigned int family, const void *cfg,
   struct lwtunnel_state **ts,
   struct netlink_ext_ack *extack)
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index 0b171756453c..b3f2f77dfe72 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -103,7 +103,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops,
 }
 EXPORT_SYMBOL_GPL(lwtunnel_encap_del_ops);
 
-int lwtunnel_build_state(u16 encap_type,
+int lwtunnel_build_state(struct net *net, u16 encap_type,
 struct nlattr *encap, unsigned int family,
 const void *cfg, struct lwtunnel_state **lws,
 struct netlink_ext_ack *extack)
@@ -124,7 +124,7 @@ int lwtunnel_build_state(u16 encap_type,
ops = rcu_dereference(lwtun_encaps[encap_type]);
if (likely(ops && ops->build_state && try_module_get(ops->owner))) {
found = true;
-   ret = ops->build_state(encap, family, cfg, lws, extack);
+   ret = ops->build_state(net, encap, family, cfg, lws, extack);
if (ret)
module_put(ops->owner);
}
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index f04d944f8abe..4979e5c6b9b8 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -523,6 +523,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
if (nla) {
struct lwtunnel_state *lwtstate;
struct nlattr *nla_entype;
+   struct net *net = cfg->fc_nlinfo.nl_net;
 
nla_entype = nla_find(attrs, attrlen,
  RTA_ENCAP_TYPE);
@@ -533,7 +534,7 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
goto err_inval;
}
 
-   ret = lwtunnel_build_state(nla_get_u16(
+   ret = lwtunnel_build_state(net, nla_get_u16(
   nla_entype),
   nla,  AF_INET, cfg,
   &lwtstate, extack);
@@ -607,7 +608,7 @@ static void fib_rebalance(struct fib_info *fi)
 
 #endif /* CONFIG_IP_ROUTE_MULTIPATH */

[PATCH v5 net-next 3/7] ila: Fix use of rhashtable walk in ila_xlat.c

2017-12-21 Thread Tom Herbert
Improve EAGAIN handling, handle the case where ila_dump_info fails
and objects would otherwise be missed in the dump, and add a skip index
to skip over ILA entries in a list on a rhashtable node that have
already been visited (by a previous call to ila_nl_dump).
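
The skip index described above can be illustrated outside the kernel.
What follows is a minimal userspace sketch, not the ILA code: struct
entry, struct dump_iter and dump_some() are hypothetical stand-ins for
struct ila_map, struct ila_dump_iter and ila_nl_dump, and a plain array
of chains replaces the rhashtable walker. It only shows how a per-node
skip count lets a dump resume in the middle of a chain after the output
buffer fills up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ila_map: entries on one table node
 * are chained through ->next.
 */
struct entry {
	int id;
	struct entry *next;
};

/* Hypothetical stand-in for struct ila_dump_iter. */
struct dump_iter {
	size_t node;	/* table node to resume at */
	int skip;	/* entries of that node already dumped */
};

/* Write up to 'room' ids into out[] and return how many were written.
 * State needed to resume is left in *it.
 */
static size_t dump_some(struct entry *const *nodes, size_t n_nodes,
			struct dump_iter *it, int *out, size_t room)
{
	size_t written = 0;

	assert(it && out);

	while (it->node < n_nodes) {
		const struct entry *e = nodes[it->node];
		int skip = it->skip;

		/* Skip over entries dumped by a previous call. */
		while (e && skip) {
			e = e->next;
			skip--;
		}

		for (; e; e = e->next) {
			if (written == room)
				return written;	/* it->skip marks the resume point */
			out[written++] = e->id;
			it->skip++;
		}

		/* Chain finished: move to the next node, nothing to skip. */
		it->node++;
		it->skip = 0;
	}

	return written;
}
```

With a chain of three entries on the first node and room for two ids per
call, the first call stops with skip == 2 and the second call resumes at
the third entry instead of repeating the first two.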

Signed-off-by: Tom Herbert 
---
 net/ipv6/ila/ila_xlat.c | 70 ++---
 1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index 44c39c5f0638..887dd5b785b5 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -474,24 +474,31 @@ static int ila_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info)
 
 struct ila_dump_iter {
struct rhashtable_iter rhiter;
+   int skip;
 };
 
 static int ila_nl_dump_start(struct netlink_callback *cb)
 {
struct net *net = sock_net(cb->skb->sk);
struct ila_net *ilan = net_generic(net, ila_net_id);
-   struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args[0];
+   struct ila_dump_iter *iter;
+   int ret;
 
-   if (!iter) {
-   iter = kmalloc(sizeof(*iter), GFP_KERNEL);
-   if (!iter)
-   return -ENOMEM;
+   iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+   if (!iter)
+   return -ENOMEM;
 
-   cb->args[0] = (long)iter;
+   ret = rhashtable_walk_init(&ilan->rhash_table, &iter->rhiter,
+  GFP_KERNEL);
+   if (ret) {
+   kfree(iter);
+   return ret;
}
 
-   return rhashtable_walk_init(&ilan->rhash_table, &iter->rhiter,
-   GFP_KERNEL);
+   iter->skip = 0;
+   cb->args[0] = (long)iter;
+
+   return ret;
 }
 
 static int ila_nl_dump_done(struct netlink_callback *cb)
@@ -509,20 +516,45 @@ static int ila_nl_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args[0];
struct rhashtable_iter *rhiter = &iter->rhiter;
+   int skip = iter->skip;
struct ila_map *ila;
int ret;
 
rhashtable_walk_start(rhiter);
 
-   for (;;) {
-   ila = rhashtable_walk_next(rhiter);
+   /* Get first entry */
+   ila = rhashtable_walk_peek(rhiter);
+
+   if (ila && !IS_ERR(ila) && skip) {
+   /* Skip over visited entries */
+
+   while (ila && skip) {
+   /* Skip over any ila entries in this list that we
+* have already dumped.
+*/
+   ila = rcu_access_pointer(ila->next);
+   skip--;
+   }
+   }
 
+   skip = 0;
+
+   for (;;) {
if (IS_ERR(ila)) {
-   if (PTR_ERR(ila) == -EAGAIN)
-   continue;
ret = PTR_ERR(ila);
-   goto done;
+   if (ret == -EAGAIN) {
+   /* Table has changed and iter has reset. Return
+* -EAGAIN to the application even if we have
+* written data to the skb. The application
+* needs to deal with this.
+*/
+
+   goto out_ret;
+   } else {
+   break;
+   }
} else if (!ila) {
+   ret = 0;
break;
}
 
@@ -531,15 +563,21 @@ static int ila_nl_dump(struct sk_buff *skb, struct netlink_callback *cb)
 cb->nlh->nlmsg_seq, NLM_F_MULTI,
 skb, ILA_CMD_GET);
if (ret)
-   goto done;
+   goto out;
 
+   skip++;
ila = rcu_access_pointer(ila->next);
}
+
+   skip = 0;
+   ila = rhashtable_walk_next(rhiter);
}
 
-   ret = skb->len;
+out:
+   iter->skip = skip;
+   ret = (skb->len ? : ret);
 
-done:
+out_ret:
rhashtable_walk_stop(rhiter);
return ret;
 }
-- 
2.11.0



[PATCH v5 net-next 2/7] rtnetlink: Add notify route message types

2017-12-21 Thread Tom Herbert
Add notify route message and notify rtnl group. This is used to send
a notification about a route. For example, this will be used with ILA
to notify a daemon to send an ILA redirect.
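
As a side note on the numbering, here is a small sketch of the
arithmetic behind the new message type. RTM_BASE (16) and the RTM_MAX
rounding formula come from include/uapi/linux/rtnetlink.h; the helper
names rtm_round_max() and rtm_slot() are made up for this illustration.
It suggests that RTM_MAX stays at 99 with or without the new type, and
that 98 lands in the third slot of its group of four.

```c
#include <assert.h>

#define RTM_BASE 16	/* first rtnetlink message type in the uapi header */

/* Same rounding as the RTM_MAX macro: round the sentinel up to a
 * multiple of four, then subtract one.
 */
static int rtm_round_max(int sentinel)
{
	assert(sentinel >= 0);
	return ((sentinel + 3) & ~3) - 1;
}

/* Position of a message type within its group of four (0..3). */
static int rtm_slot(int type)
{
	assert(type >= RTM_BASE);
	return (type - RTM_BASE) & 3;
}
```

Before the patch the sentinel __RTM_MAX is 97 (one past
RTM_NEWCACHEREPORT = 96); after it, the sentinel is 99. Both round to
the same RTM_MAX.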

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/rtnetlink.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 843e29aa3cac..ee955c7ca48a 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -150,6 +150,9 @@ enum {
RTM_NEWCACHEREPORT = 96,
 #define RTM_NEWCACHEREPORT RTM_NEWCACHEREPORT
 
+   RTM_NOTIFYROUTE = 98,
+#define RTM_NOTIFYROUTE RTM_NOTIFYROUTE
+
__RTM_MAX,
 #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -677,6 +680,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_IPV4_MROUTE_R  RTNLGRP_IPV4_MROUTE_R
RTNLGRP_IPV6_MROUTE_R,
 #define RTNLGRP_IPV6_MROUTE_R  RTNLGRP_IPV6_MROUTE_R
+   RTNLGRP_ROUTE_NOTIFY,
+#define RTNLGRP_ROUTE_NOTIFY   RTNLGRP_ROUTE_NOTIFY
__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX (__RTNLGRP_MAX - 1)
-- 
2.11.0



[PATCH v5 net-next 6/7] ila: Flush netlink command to clear xlat table

2017-12-21 Thread Tom Herbert
Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
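
The flush amounts to walking the table, unlinking each node and freeing
the whole chain of entries hanging off it. A minimal userspace sketch of
that shape follows. It assumes a plain array in place of the rhashtable
and leaves out the per-bucket spinlock and RCU; struct entry,
free_chain() and flush_table() are hypothetical names, not the functions
added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ila_map: entries that share a table
 * node are chained through ->next.
 */
struct entry {
	struct entry *next;
};

/* Free one chain and return the number of entries released. */
static size_t free_chain(struct entry *e)
{
	size_t n = 0;

	while (e) {
		struct entry *next = e->next;

		free(e);
		e = next;
		n++;
	}

	return n;
}

/* Unlink every chain from the table, then free it. */
static size_t flush_table(struct entry **nodes, size_t n_nodes)
{
	size_t freed = 0;

	assert(nodes != NULL);

	for (size_t i = 0; i < n_nodes; i++) {
		struct entry *head = nodes[i];

		nodes[i] = NULL;	/* remove from the table first */
		freed += free_chain(head);
	}

	return freed;
}
```

Unlinking before freeing mirrors the order in the patch, where the node
is removed from the table and only then handed to ila_free_node().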

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/ila.h |  1 +
 net/ipv6/ila/ila.h   |  1 +
 net/ipv6/ila/ila_main.c  |  6 +
 net/ipv6/ila/ila_xlat.c  | 62 ++--
 4 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 483b77af4eb8..db45d3e49a12 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -30,6 +30,7 @@ enum {
ILA_CMD_ADD,
ILA_CMD_DEL,
ILA_CMD_GET,
+   ILA_CMD_FLUSH,
 
__ILA_CMD_MAX,
 };
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index faba7824ea56..1f747bcbec29 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -123,6 +123,7 @@ void ila_xlat_exit_net(struct net *net);
 int ila_xlat_nl_cmd_add_mapping(struct sk_buff *skb, struct genl_info *info);
 int ila_xlat_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info);
 int ila_xlat_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info);
+int ila_xlat_nl_cmd_flush(struct sk_buff *skb, struct genl_info *info);
 int ila_xlat_nl_dump_start(struct netlink_callback *cb);
 int ila_xlat_nl_dump_done(struct netlink_callback *cb);
 int ila_xlat_nl_dump(struct sk_buff *skb, struct netlink_callback *cb);
diff --git a/net/ipv6/ila/ila_main.c b/net/ipv6/ila/ila_main.c
index f6ac6b14577e..18fac76b9520 100644
--- a/net/ipv6/ila/ila_main.c
+++ b/net/ipv6/ila/ila_main.c
@@ -27,6 +27,12 @@ static const struct genl_ops ila_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
},
{
+   .cmd = ILA_CMD_FLUSH,
+   .doit = ila_xlat_nl_cmd_flush,
+   .policy = ila_nl_policy,
+   .flags = GENL_ADMIN_PERM,
+   },
+   {
.cmd = ILA_CMD_GET,
.doit = ila_xlat_nl_cmd_get_mapping,
.start = ila_xlat_nl_dump_start,
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index d05de891dfb6..51a15ce50a64 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -164,9 +164,9 @@ static inline void ila_release(struct ila_map *ila)
kfree_rcu(ila, rcu);
 }
 
-static void ila_free_cb(void *ptr, void *arg)
+static void ila_free_node(struct ila_map *ila)
 {
-   struct ila_map *ila = (struct ila_map *)ptr, *next;
+   struct ila_map *next;
 
/* Assume rcu_readlock held */
while (ila) {
@@ -176,6 +176,11 @@ static void ila_free_cb(void *ptr, void *arg)
}
 }
 
+static void ila_free_cb(void *ptr, void *arg)
+{
+   ila_free_node((struct ila_map *)ptr);
+}
+
 static int ila_xlat_addr(struct sk_buff *skb, bool sir2ila);
 
 static unsigned int
@@ -365,6 +370,59 @@ int ila_xlat_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info)
return 0;
 }
 
+static inline spinlock_t *lock_from_ila_map(struct ila_net *ilan,
+   struct ila_map *ila)
+{
+   return ila_get_lock(ilan, ila->xp.ip.locator_match);
+}
+
+int ila_xlat_nl_cmd_flush(struct sk_buff *skb, struct genl_info *info)
+{
+   struct net *net = genl_info_net(info);
+   struct ila_net *ilan = net_generic(net, ila_net_id);
+   struct rhashtable_iter iter;
+   struct ila_map *ila;
+   spinlock_t *lock;
+   int ret;
+
+   ret = rhashtable_walk_init(&ilan->xlat.rhash_table, &iter, GFP_KERNEL);
+   if (ret)
+   return ret;
+
+   rhashtable_walk_start(&iter);
+
+   for (;;) {
+   ila = rhashtable_walk_next(&iter);
+
+   if (IS_ERR(ila)) {
+   if (PTR_ERR(ila) == -EAGAIN)
+   continue;
+   ret = PTR_ERR(ila);
+   goto done;
+   } else if (!ila) {
+   break;
+   }
+
+   lock = lock_from_ila_map(ilan, ila);
+
+   spin_lock(lock);
+
+   ret = rhashtable_remove_fast(&ilan->xlat.rhash_table,
+&ila->node, rht_params);
+   if (!ret)
+   ila_free_node(ila);
+
+   spin_unlock(lock);
+
+   if (ret)
+   break;
+   }
+
+done:
+   rhashtable_walk_stop(&iter);
+   return ret;
+}
+
 static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg)
 {
if (nla_put_u64_64bit(msg, ILA_ATTR_LOCATOR,
-- 
2.11.0



[PATCH v5 net-next 7/7] ila: Route notify

2017-12-21 Thread Tom Herbert
Implement RTM notifications for ILA routers. This adds support to
ILA LWT to send a netlink RTM message when a route is used.

The ILA notify mechanism can be used in two contexts:

  - On an ILA forwarding cache, a route prefix can be configured to
    do an ILA notification. This method is used when address
    resolution needs to be done on an address.
  - On an ILA router, an ILA host route entry may include a
    notification. The purpose of this is to get a notification
    to a userspace daemon to send an ILA redirect.

This patch also adds a routing protocol number for ILA.
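
For a rough idea of what a notification carries, the sketch below works
out the netlink payload size for each combination of the two notify
flags. It is an illustration under stated assumptions, not kernel code:
the EX_* constants and ex_* helpers are hypothetical, the 12-byte struct
rtmsg and 4-byte attribute header match the uapi definitions, and
alignment is to 4 bytes as for NLMSG_ALIGN and NLA_ALIGN. The patch
itself always reserves room for both addresses in ila_rslv_msgsize();
this sketch computes what a given flag combination actually uses.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NOTIFY_DST	1u	/* mirrors ILA_NOTIFY_DST */
#define EX_NOTIFY_SRC	2u	/* mirrors ILA_NOTIFY_SRC */

#define EX_RTMSG_SIZE	12u	/* sizeof(struct rtmsg) */
#define EX_NLA_HDRLEN	4u	/* aligned netlink attribute header */
#define EX_IN6_ADDR_LEN	16u	/* sizeof(struct in6_addr) */

/* Round up to the 4-byte netlink alignment. */
static size_t ex_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Attribute header plus payload, aligned (what nla_total_size computes). */
static size_t ex_nla_total_size(size_t payload)
{
	return ex_align4(EX_NLA_HDRLEN + payload);
}

/* Payload bytes used by one notification for the given 2-bit flag set. */
static size_t ex_notify_payload(unsigned int notify)
{
	size_t len = ex_align4(EX_RTMSG_SIZE);

	assert((notify & ~3u) == 0);	/* the field in struct ila_lwt is 2 bits */

	if (notify & EX_NOTIFY_DST)
		len += ex_nla_total_size(EX_IN6_ADDR_LEN);	/* RTA_DST */
	if (notify & EX_NOTIFY_SRC)
		len += ex_nla_total_size(EX_IN6_ADDR_LEN);	/* RTA_SRC */

	return len;
}
```

With both flags set the result equals the size reserved by
ila_rslv_msgsize(): 12 bytes of rtmsg plus two 20-byte attributes.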

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/ila.h   |   2 +
 include/uapi/linux/rtnetlink.h |   1 +
 net/ipv6/ila/ila_lwt.c | 273 -
 3 files changed, 191 insertions(+), 85 deletions(-)

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index db45d3e49a12..5675f3e71fac 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -19,6 +19,8 @@ enum {
ILA_ATTR_CSUM_MODE, /* u8 */
ILA_ATTR_IDENT_TYPE,/* u8 */
ILA_ATTR_HOOK_TYPE, /* u8 */
+   ILA_ATTR_NOTIFY_DST,/* flag */
+   ILA_ATTR_NOTIFY_SRC,/* flag */
 
__ILA_ATTR_MAX,
 };
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ee955c7ca48a..5da035cce640 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -256,6 +256,7 @@ enum {
 #define RTPROT_NTK 15  /* Netsukuku */
 #define RTPROT_DHCP 16  /* DHCP client */
 #define RTPROT_MROUTED 17  /* Multicast daemon */
+#define RTPROT_ILA 18  /* ILA route */
 #define RTPROT_BABEL   42  /* Babel daemon */
 
 /* rtm_scope
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 9f1e46a1468e..d0ddd6f2714f 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -19,10 +19,15 @@
 struct ila_lwt {
struct ila_params p;
struct dst_cache dst_cache;
+   u8 hook_type;
u32 connected : 1;
-   u32 lwt_output : 1;
+   u32 xlat : 1;
+   u32 notify : 2;
 };
 
+#define ILA_NOTIFY_DST 1
+#define ILA_NOTIFY_SRC 2
+
 static inline struct ila_lwt *ila_lwt_lwtunnel(
struct lwtunnel_state *lwt)
 {
@@ -35,6 +40,69 @@ static inline struct ila_params *ila_params_lwtunnel(
return &ila_lwt_lwtunnel(lwt)->p;
 }
 
+static size_t ila_rslv_msgsize(void)
+{
+   size_t len =
+   NLMSG_ALIGN(sizeof(struct rtmsg))
+   + nla_total_size(16) /* RTA_DST */
+   + nla_total_size(16) /* RTA_SRC */
+   ;
+
+   return len;
+}
+
+static void ila_notify(struct net *net, struct sk_buff *skb,
+  struct ila_lwt *lwt, struct rt6_info *rt)
+{
+   struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   int flags = NLM_F_MULTI;
+   struct sk_buff *nlskb;
+   struct nlmsghdr *nlh;
+   struct rtmsg *rtm;
+   int err = 0;
+
+   /* Send ILA notification to user */
+   nlskb = nlmsg_new(ila_rslv_msgsize(), GFP_ATOMIC);
+   if (!nlskb)
+   return;
+
+   nlh = nlmsg_put(nlskb, 0, 0, RTM_NOTIFYROUTE, sizeof(*rtm), flags);
+   if (!nlh) {
+   err = -EMSGSIZE;
+   goto errout;
+   }
+
+   rtm = nlmsg_data(nlh);
+   rtm->rtm_family   = AF_INET6;
+   rtm->rtm_dst_len  = 128;
+   rtm->rtm_src_len  = 0;
+   rtm->rtm_tos  = 0;
+   rtm->rtm_table= RT6_TABLE_UNSPEC;
+   rtm->rtm_type = RTN_UNICAST;
+   rtm->rtm_scope= RT_SCOPE_UNIVERSE;
+   rtm->rtm_protocol = rt->rt6i_protocol;
+
+   if (((lwt->notify & ILA_NOTIFY_DST) &&
+nla_put_in6_addr(nlskb, RTA_DST, &ip6h->daddr)) ||
+   ((lwt->notify & ILA_NOTIFY_SRC) &&
+nla_put_in6_addr(nlskb, RTA_SRC, &ip6h->saddr))) {
+   nlmsg_cancel(nlskb, nlh);
+   err = -EMSGSIZE;
+   goto errout;
+   }
+
+   nlmsg_end(nlskb, nlh);
+
+   rtnl_notify(nlskb, net, 0, RTNLGRP_ROUTE_NOTIFY, NULL, GFP_ATOMIC);
+
+   return;
+
+errout:
+   kfree_skb(nlskb);
+   WARN_ON(err == -EMSGSIZE);
+   rtnl_set_sk_err(net, RTNLGRP_ROUTE_NOTIFY, err);
+}
+
 static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
struct dst_entry *orig_dst = skb_dst(skb);
@@ -46,11 +114,14 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
if (skb->protocol != htons(ETH_P_IPV6))
goto drop;
 
-   if (ilwt->lwt_output)
+   if (ilwt->xlat)
ila_update_ipv6_locator(skb,
ila_params_lwtunnel(orig_dst->lwtstate),
true);
 
+   if (ilwt->notify)
+   ila_notify(net, skb, ilwt, rt);
+
if (rt->rt6i_flags & (RTF_GATEWAY | RTF
