from:"YOSHIFUJI Hideaki"

Re: Why are IPv6 host and anycast routes referencing lo device?

2016-11-07 Thread YOSHIFUJI Hideaki

Hi,

David Ahern wrote:
> 
> Can anyone explain why host routes and anycast routes for IPv6 are added with 
> the device set to loopback versus the device with the address:
> 
> local ::1 dev lo  proto none  metric 0  pref medium
> local 2000:1:: dev lo  proto none  metric 0  pref medium
> local 2000:1::3 dev lo  proto none  metric 0  pref medium
> local 2100:2:: dev lo  proto none  metric 0  pref medium
> local 2100:2::3 dev lo  proto none  metric 0  pref medium
> 
> 
> This behavior differs from IPv4 where host routes use the device with the 
> address:
> 
> broadcast 10.1.1.0 dev eth0  proto kernel  scope link  src 10.1.1.3
> local 10.1.1.3 dev eth0  proto kernel  scope host  src 10.1.1.3
> broadcast 10.1.1.255 dev eth0  proto kernel  scope link  src 10.1.1.3
> broadcast 10.100.2.0 dev eth2  proto kernel  scope link  src 10.100.2.3
> local 10.100.2.3 dev eth2  proto kernel  scope host  src 10.100.2.3
> broadcast 10.100.2.255 dev eth2  proto kernel  scope link  src 10.100.2.3
> 
> The use of loopback pre-dates the git history, so wondering if someone 
> recalls the reason why. We would like to change that to make it consistent 
> with IPv4 - with a sysctl to maintain backwards compatibility.

Once I tried I did not work.
You could try again to see what happens.

--yoshfuji

> 
> Thanks,
> David
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH v5 3/7] ipv6 addrconf: rtr_solicits == -1 means unlimited

2016-09-27 Thread YOSHIFUJI Hideaki



Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> This allows setting /proc/sys/net/ipv6/conf/*/router_solicitations
> to -1 meaning an unlimited number of retransmits.
> 

We could say "< 0 means infinite" and we can reduce changes here.

--yoshfuji

> Signed-off-by: Maciej Żenczykowski 
> ---
>  net/ipv6/addrconf.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 8bd2d06eefe7..1e59c0034916 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3687,7 +3687,7 @@ static void addrconf_rs_timer(unsigned long data)
>   if (idev->if_flags & IF_RA_RCVD)
>   goto out;
>  
> - if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
> + if (idev->rs_probes++ < idev->cnf.rtr_solicits || 
> idev->cnf.rtr_solicits == -1) {
>   write_unlock(&idev->lock);
>   if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
>   ndisc_send_rs(dev, &lladdr,
> @@ -3949,7 +3949,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr 
> *ifp)
>   send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp);
>   send_rs = send_mld &&
> ipv6_accept_ra(ifp->idev) &&
> -   ifp->idev->cnf.rtr_solicits > 0 &&
> +   ifp->idev->cnf.rtr_solicits != 0 &&
> (dev->flags&IFF_LOOPBACK) == 0;
>   read_unlock_bh(&ifp->idev->lock);
>  
> @@ -5099,7 +5099,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, 
> struct in6_addr *token)
>   return -EINVAL;
>   if (!ipv6_accept_ra(idev))
>   return -EINVAL;
> - if (idev->cnf.rtr_solicits <= 0)
> + if (idev->cnf.rtr_solicits == 0)
>   return -EINVAL;
>  
>   write_lock_bh(&idev->lock);
> @@ -5699,6 +5699,7 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct 
> ctl_table *ctl,
>   return ret;
>  }
>  
> +static const int minus_one = -1;
>  static const int one = 1;
>  static const int two_five_five = 255;
>  
> @@ -5759,7 +5760,8 @@ static const struct ctl_table addrconf_sysctl[] = {
>   .data   = &ipv6_devconf.rtr_solicits,
>   .maxlen = sizeof(int),
>   .mode   = 0644,
> - .proc_handler   = proc_dointvec,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = (void *)&minus_one,
>   },
>   {
>   .procname   = "router_solicitation_interval",
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH v5 2/7] ipv6 addrconf: remove addrconf_sysctl_hop_limit()

2016-09-27 Thread YOSHIFUJI Hideaki

Hi,

Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> replace with extra1/2 magic
> 
> Signed-off-by: Maciej Żenczykowski 
> ---
>  net/ipv6/addrconf.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 11fa1a5564d4..8bd2d06eefe7 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -5467,20 +5467,6 @@ int addrconf_sysctl_forward(struct ctl_table *ctl, int 
> write,
>  }
>  
>  static
> -int addrconf_sysctl_hop_limit(struct ctl_table *ctl, int write,
> -  void __user *buffer, size_t *lenp, loff_t 
> *ppos)
> -{
> - struct ctl_table lctl;
> - int min_hl = 1, max_hl = 255;
> -
> - lctl = *ctl;
> - lctl.extra1 = &min_hl;
> - lctl.extra2 = &max_hl;
> -
> - return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
> -}
> -
> -static
>  int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
>   void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> @@ -5713,6 +5699,9 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct 
> ctl_table *ctl,
>   return ret;
>  }
>  
> +static const int one = 1;
> +static const int two_five_five = 255;
> +
>  static const struct ctl_table addrconf_sysctl[] = {
>   {
>   .procname   = "forwarding",
> @@ -5726,7 +5715,9 @@ static const struct ctl_table addrconf_sysctl[] = {
>   .data   = &ipv6_devconf.hop_limit,
>   .maxlen = sizeof(int),
>   .mode   = 0644,
> - .proc_handler   = addrconf_sysctl_hop_limit,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = (void *)&one,
> + .extra2 = (void *)&two_five_five,
>   },
>   {
>   .procname   = "mtu",
> 

Please submit this in a different series of patches
(like 1/7).

--yoshfuji

Re: [PATCH v5 1/7] ipv6 addrconf: enable use of proc_dointvec_minmax in addrconf_sysctl

2016-09-27 Thread YOSHIFUJI Hideaki

Hi,

Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> Signed-off-by: Maciej Żenczykowski 
> ---
>  net/ipv6/addrconf.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 2f1f5d439788..11fa1a5564d4 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -6044,8 +6044,14 @@ static int __addrconf_sysctl_register(struct net *net, 
> char *dev_name,
>  
>   for (i = 0; table[i].data; i++) {
>   table[i].data += (char *)p - (char *)&ipv6_devconf;
> - table[i].extra1 = idev; /* embedded; no ref */
> - table[i].extra2 = net;
> + /* If one of these is already set, then it is not safe to
> +  * overwrite either of them: this makes proc_dointvec_minmax
> +  * usable.
> +  */
> + if (!table[i].extra1 && !table[i].extra2) {
> + table[i].extra1 = idev; /* embedded; no ref */
> + table[i].extra2 = net;
> + }
>   }
>  
>   snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
> 

This seems nothing to do with the RFC7559 changes.
Why don't you submit this as a separate patch?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: icmpv6: issue with routing table entries from link local addresses

2016-09-12 Thread YOSHIFUJI Hideaki

Sowmini Varadhan wrote:
> On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
>  wrote:
>> Hello,
>>
>> On 12.09.2016 16:27, Andreas Hübner wrote:
> 
>>>
>>> I have the following setup:
>>>   - 2 directly connected hosts (A+B), both have only link local addresses
>>> configured (interface on both hosts is eth0)
>:
>>>   fe80::/64 dev eth1  proto kernel  metric 256
>>>   fe80::/64 dev eth0  proto kernel  metric 256
>:
>>> The issue I currently have is, that the echo reply that host B should
>>> generate is never sent back to host A. If I change the order of the
>>> routing table entries on host B, everything works fine.
>>> (host A is connected on eth0)
>:
>> This shouldn't be the case. We certainly carry over the ifindex of the
>> received packet into the routing lookup of the outgoing packet, thus the
>> appropriate rule, with outgoing ifindex should be selected.
> 
> Like Hannes,  I too would first check "is B sending out the echo-resp? on
> which interface?".
> 
> But a couple of unexpected things I noticed in linux: the link-local
> prefix should have a prefixlen of /10 according to
> http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> but "ip -6 route show"  lists this as a /64..

Do not be confused; link-local address for ethernet is described by
IPv6 over FOO document (e.g., RFC2464 for Ethernet).  The address
(fe80::/64 for Ethernet, for example) is defined inside the link-local
scope unicast address space (/10).

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: Minimum MTU Mess

2016-09-11 Thread YOSHIFUJI Hideaki



Jarod Wilson wrote:
> On Tue, Sep 06, 2016 at 04:55:29PM -0700, David Miller wrote:
>> From: Jarod Wilson 
>> Date: Fri, 2 Sep 2016 13:07:42 -0400
>>
>>> In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or
>>> variations thereof, under drivers/net/ is kind of crazy.
>>
>> Agreed, we can have a default and let the different cases provide
>> overrides.
>>
>> Mostly what to do here is a function of the hardware though.
> 
> So I've been tinkering with this some, and it looks like having both
> centralized min and max checking could be useful here. I'm hacking away at
> drivers now, but the basis of all this would potentially look about like
> the patch below, and each device would have to set dev->m{in,ax}_mtu one
> way or another. Drivers using alloc_etherdev and/or ether_setup would get
> the "default" values, and then they can be overridden. Probably need
> something to make sure dev->max_mtu isn't set to 0 though...
> 
> Possibly on the right track here, or might there be a better way to
> approach this?
> 
> diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
> index 117d02e..864d6f2 100644
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -35,6 +35,8 @@
>  #define ETH_FRAME_LEN1514/* Max. octets in frame sans 
> FCS */
>  #define ETH_FCS_LEN  4   /* Octets in the FCS */
>  
> +#define ETH_MIN_MTU  68  /* Min IPv4 MTU per RFC791  */
> +
>  /*
>   *   These are the defined Ethernet Protocol ID's.
>   */

Why don't we disable IPv4 if the MTU is lower than this value
as we do for IPv6?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH net 1/2] ipv6: add missing netconf notif when 'all' is updated

2016-08-29 Thread YOSHIFUJI Hideaki



Nicolas Dichtel wrote:
> The 'default' value was not advertised.
> 
> Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding 
> status")
> Signed-off-by: Nicolas Dichtel 
> ---
>  net/ipv6/addrconf.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index f418d2eaeddd..299f0656e87f 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -778,7 +778,14 @@ static int addrconf_fixup_forwarding(struct ctl_table 
> *table, int *p, int newf)
>   }
>  
>   if (p == &net->ipv6.devconf_all->forwarding) {
> + int old_dftl = net->ipv6.devconf_dflt->forwarding;

dflt, not dftl?

> +
>   net->ipv6.devconf_dflt->forwarding = newf;
> + if ((!newf) ^ (!old_dftl))
> + inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
> +  NETCONFA_IFINDEX_DEFAULT,
> +  net->ipv6.devconf_dflt);
> +
>   addrconf_forward_change(net, newf);
>   if ((!newf) ^ (!old))
>   inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-15 Thread YOSHIFUJI Hideaki



Lorenzo Colitti wrote:
> On Wed, Aug 10, 2016 at 7:44 AM, YOSHIFUJI Hideaki
>  wrote:
>>
>>>> I could see a point of view that says when bound_if is in play sending
>>>> to destinations on/via other interfaces--by any mechanism--should
>>>> effectively get ENETUNREACH (or something).
>>>
>>> VRF uses this capability to send on an enslaved interface. ie., socket is 
>>> bound to VRF device to limit packets to that L3 domain and then uses 
>>> PKTINFO to force a packet out a particular interface.
>>
>> We could extend our code to allow enslave devices, maybe.
> 
> So something like this, then?
> 
> static inline bool inet_check_bound_oif(const struct sock *sk, int oif)
> {
> if (!oif || !sk->sk_bound_dev_if || oif == sk->sk_bound_dev_if)
> return true;
> 
> #ifdef CONFIG_NET_L3_MASTER_DEV
> return l3mdev_master_ifindex_by_index(sock_net(sk), oif) ==
> sk->sk_bound_dev_if;
> #endif
> 
> return false;
> }
> 
> and then in the various sendmsg functions:
> 
> if (!inet_check_bound_oif(sk, oif))
> return -EINVAL;
> 

Yes, something like that.

--yoshfuji

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-09 Thread YOSHIFUJI Hideaki



David Ahern wrote:
> On 8/9/16 1:01 AM, Erik Kline wrote:
>> On 9 August 2016 at 14:20, David Miller  wrote:
>>> From: Lorenzo Colitti 
>>> Date: Tue, 9 Aug 2016 10:00:25 +0900
>>>
 Note that pretty much every sendmsg codepath allows other data to take
 precedence over sk_bound_dev_if:

 - udpv6_sendmsg: if sin6_scope_id specified on a scoped address
 - rawv6_sendmsg: if sin6_scope_id specified on a scoped address
 - l2tp_ip6_sendmsg: if sin6_scope_id specified on a scoped address
 - ip_cmsg_send: if IP_PKTINFO or IPV6_PKTINFO specified

 What should I do about those? -EINVAL? Ignore the conflicting data? Leave 
 as is?
>>>
>>> That's a good point, I guess this needs some more thought.
>>
>> I could see a point of view that says when bound_if is in play sending
>> to destinations on/via other interfaces--by any mechanism--should
>> effectively get ENETUNREACH (or something).
> 
> VRF uses this capability to send on an enslaved interface. ie., socket is 
> bound to VRF device to limit packets to that L3 domain and then uses PKTINFO 
> to force a packet out a particular interface.
> 

We could extend our code to allow enslave devices, maybe.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-09 Thread YOSHIFUJI Hideaki



Erik Kline wrote:
> On 9 August 2016 at 14:20, David Miller  wrote:
>> From: Lorenzo Colitti 
>> Date: Tue, 9 Aug 2016 10:00:25 +0900
>>
>>> Note that pretty much every sendmsg codepath allows other data to take
>>> precedence over sk_bound_dev_if:
>>>
>>> - udpv6_sendmsg: if sin6_scope_id specified on a scoped address
>>> - rawv6_sendmsg: if sin6_scope_id specified on a scoped address
>>> - l2tp_ip6_sendmsg: if sin6_scope_id specified on a scoped address
>>> - ip_cmsg_send: if IP_PKTINFO or IPV6_PKTINFO specified
>>>
>>> What should I do about those? -EINVAL? Ignore the conflicting data? Leave 
>>> as is?
>>
>> That's a good point, I guess this needs some more thought.
> 
> I could see a point of view that says when bound_if is in play sending
> to destinations on/via other interfaces--by any mechanism--should
> effectively get ENETUNREACH (or something).

+1

> 
> That does seem like I would involve changing some existing behavior, though.
> 

The use of sin6_scope_id and SO_BINDTODEVICE with different interfaces
is incorrect and should be rejected.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH] net: neigh: disallow state transition DELAY->STALE in neigh_update()

2016-07-24 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Chunhui He wrote:
> Hi,
> 
> On Fri, 22 Jul 2016 10:20:01 +0300 (EEST), Julian Anastasov  
> wrote:
>>
>>  Hello,
>>
>> On Thu, 21 Jul 2016, Chunhui He wrote:
>>
>>> If neigh entry was CONNECTED and address is not changed, and if new state is
>>> STALE, entry state will not change. Because DELAY is not in CONNECTED, it's
>>> possible to change state from DELAY to STALE.
>>>
>>> That is bad. Consider a host in IPv4 nerwork, a neigh entry in STALE state
>>> is referenced to send packets, so goes to DELAY state. If the entry is not
>>> confirmed by upper layer, it goes to PROBE state, and sends ARP request.
>>> The neigh host sends ARP reply, then the entry goes to REACHABLE state.
>>> But the entry state may be reseted to STALE by broadcast ARP packets, before
>>> the entry goes to PROBE state. So it's possible that the entry will never go
>>> to REACHABLE state, without external confirmation.
>>>
>>> In my case, the gateway refuses to send unicast packets to me, before it 
>>> sees
>>> my ARP request. So it's critical to enter REACHABLE state by sending ARP
>>> request, but not by external confirmation.
>>>
>>> This fixes neigh_update() not to change to STALE if old state is CONNECTED 
>>> or
>>> DELAY.
>>>
>>> Signed-off-by: Chunhui He 
>>> ---
>>>  net/core/neighbour.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>>> index 510cd62..29429eb 100644
>>> --- a/net/core/neighbour.c
>>> +++ b/net/core/neighbour.c
>>> @@ -1152,7 +1152,7 @@ int neigh_update(struct neighbour *neigh, const u8 
>>> *lladdr, u8 new,
>>> } else {
>>> if (lladdr == neigh->ha && new == NUD_STALE &&
>>> ((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) ||
>>> -(old & NUD_CONNECTED))
>>> +(old & (NUD_CONNECTED | NUD_DELAY)))
>>> )
>>> new = old;
>>> }
>>
>>  You change looks correct to me. But this place
>> has more problems. There is no good reason to set NUD_STALE
>> for any state that is NUD_VALID if address is not changed.
>> This matches perfectly the comment above this code:
>> NUD_STALE should change a NUD_VALID state only when
>> address changes. It also means that IPv6 does not need
>> to provide NEIGH_UPDATE_F_WEAK_OVERRIDE anymore when
>> NEIGH_UPDATE_F_OVERRIDE is also present.
>>
> 
> The NEIGH_UPDATE_F_WEAK_OVERRIDE is confusing to me, so I choose not to deal
> with the flag.

IPv6 depends on WEAK_OVERRIDE.  Please do not change.

> 
>>  By this way the state machine can continue with
>> the resolving: NUD_STALE -> NUD_DELAY (traffic) ->
>> NUD_PROBE (retries) -> NUD_REACHABLE (unicast reply)
>> while the address is not changed. Your change covers only
>> NUD_DELAY, not NUD_PROBE, so it is better to allow more
>> retries to send. We should not give up until success (NUD_REACHABLE).
>>
> 
> I have thought about this.
> The origin code allows NUD_DELAY -> NUD_STALE and NUD_PROBE -> NUD_STALE.
> This part was imported to kernel since v2.1.79, I don't know clearly why it
> allows that.
> 
> My analysis:
> (1) As shown in my previous mail, NUD_DELAY -> NUD_STALE may cause "dead 
> loop",
> so it should be fixed.
> 
> (2) But NUD_PROBE -> NUD_STALE is acceptable, because in NUD_PROBE, ARP 
> request
> has been sent, it is sufficient to break the "dead loop".
> More attempts are accomplished by the following sequence:
> NUD_STALE --> NUD_DELAY -(sent req)-> NUD_PROBE -(reset by neigh_update())->
> NUD_STALE --> NUD_DELAY -(send req again)-> ... -->
> NUD_REACHABLE
> 
> 
> But I also agree your change.
> 
>>  Second problem: NEIGH_UPDATE_F_WEAK_OVERRIDE has no
>> priority over NEIGH_UPDATE_F_ADMIN. For example, now I can not
>> change from NUD_PERMANENT to NUD_STALE:
>>
>> # ip neigh add 192.168.168.111 lladdr 00:11:22:33:44:55 nud perm dev wlan0
>> # ip neigh show to 192.168.168.111
>> 192.168.168.111 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
>> # ip neigh change 192.168.168.111 lladdr 00:11:22:33:44:55 nud stale dev 
>> wlan0
>> # ip neigh show to 192.168.168.111
>> 192.168.168.111 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
>>
>>  IMHO, here is how this place should look:
>>
>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>> index 5cdc62a..2b1cb91 100644
>> --- a/net/core/neighbour.c
>> +++ b/net/core/neighbour.c
>> @@ -1151,10 +1151,8 @@ int neigh_update(struct neighbour *neigh, const u8 
>> *lladdr, u8 new,
>>  goto out;
>>  } else {
>>  if (lladdr == neigh->ha && new == NUD_STALE &&
>> -((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) ||
>> - (old & NUD_CONNECTED))
>> -)
>> -new = old;
>> +!(flags & NEIGH_UPDATE_F_ADMIN))
>> +goto out;
>>  }
>>

Re: [PATCH] sit: correct IP protocol used in ipip6_err

2016-06-16 Thread YOSHIFUJI Hideaki

Hi, Simon,

Simon Horman wrote:
> On Thu, Jun 16, 2016 at 05:06:19PM +0900, Simon Horman wrote:
>> Since 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
>> ipip6_err() may be called for packets whose IP protocol is
>> IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.
>>
>> In the case of IPPROTO_IPIP packets the correct protocol value is not
>> passed to ipv4_update_pmtu() or ipv4_redirect().
>>
>> This patch resolves this problem by using the IP protocol of the packet
>> rather than a hard-coded value. This appears to be consistent
>> with the usage of the protocol of a packet by icmp_socket_deliver()
>> the caller of ipip6_err().
>>
>> I was able to exercise the redirect case by using a setup where an ICMP
>> redirect was received for the destination of the encapsulated packet.
>> However, it appears that although incorrect the protocol field is not used
>> in this case and thus no problem manifests.  On inspection it does not
>> appear that a problem will manifest in the fragmentation needed/update pmtu
>> case either.
>>
>> In short I believe this is a cosmetic fix. None the less, the use of
>> IPPROTO_IPV6 seems wrong and confusing.
>>
>> Reviewed-by: Dinan Gunawardena 
>> Signed-off-by: Simon Horman 
> 
> Apologies for not making this more obvious, this is a "net-next" patch.

Acked-by: YOSHIFUJI Hideaki 

BTW, we should have similar fix in -net, -stable etc. as well, no?

> 
>> ---
>>  net/ipv6/sit.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
>> index d9f2bd6ef72d..f4356bb13f4b 100644
>> --- a/net/ipv6/sit.c
>> +++ b/net/ipv6/sit.c
>> @@ -560,13 +560,13 @@ static int ipip6_err(struct sk_buff *skb, u32 info)
>>  
>>  if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED) {
>>  ipv4_update_pmtu(skb, dev_net(skb->dev), info,
>> - t->parms.link, 0, IPPROTO_IPV6, 0);
>> + t->parms.link, 0, iph->protocol, 0);
>>  err = 0;
>>  goto out;
>>  }
>>  if (type == ICMP_REDIRECT) {
>>  ipv4_redirect(skb, dev_net(skb->dev), t->parms.link, 0,
>> -  IPPROTO_IPV6, 0);
>> +  iph->protocol, 0);
>>  err = 0;
>>  goto out;
>>  }
>> -- 
>> 2.1.4
>>

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCHv3 net-next 10/12] 6lowpan: introduce 6lowpan-nd

2016-06-15 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch introduce different 6lowpan handling for receive and transmit
> NS/NA messages for the ipv6 neighbour discovery. The first use-case is
> for supporting 802.15.4 short addresses inside the option fields and
> handling for RFC6775 6CO option field as userspace option.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Reviewed-by: Stefan Schmidt 
> Signed-off-by: Alexander Aring 


Acked-by: YOSHIFUJI Hideaki 


> ---
>  include/net/ndisc.h |  18 ++--
>  net/6lowpan/6lowpan_i.h |   4 +
>  net/6lowpan/Makefile|   2 +-
>  net/6lowpan/core.c  |   4 +-
>  net/6lowpan/ndisc.c | 234 
> 
>  5 files changed, 254 insertions(+), 8 deletions(-)
>  create mode 100644 net/6lowpan/ndisc.c
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 3f0f41d..be1fe228 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -35,6 +35,7 @@ enum {
>   ND_OPT_ROUTE_INFO = 24, /* RFC4191 */
>   ND_OPT_RDNSS = 25,  /* RFC5006 */
>   ND_OPT_DNSSL = 31,  /* RFC6106 */
> + ND_OPT_6CO = 34,/* RFC6775 */
>   __ND_OPT_MAX
>  };
>  
> @@ -109,14 +110,19 @@ struct ndisc_options {
>  #endif
>   struct nd_opt_hdr *nd_useropts;
>   struct nd_opt_hdr *nd_useropts_end;
> +#if IS_ENABLED(CONFIG_IEEE802154_6LOWPAN)
> + struct nd_opt_hdr *nd_802154_opt_array[ND_OPT_TARGET_LL_ADDR + 1];
> +#endif
>  };
>  
> -#define nd_opts_src_lladdr   nd_opt_array[ND_OPT_SOURCE_LL_ADDR]
> -#define nd_opts_tgt_lladdr   nd_opt_array[ND_OPT_TARGET_LL_ADDR]
> -#define nd_opts_pi   nd_opt_array[ND_OPT_PREFIX_INFO]
> -#define nd_opts_pi_end   nd_opt_array[__ND_OPT_PREFIX_INFO_END]
> -#define nd_opts_rh   nd_opt_array[ND_OPT_REDIRECT_HDR]
> -#define nd_opts_mtu  nd_opt_array[ND_OPT_MTU]
> +#define nd_opts_src_lladdr   nd_opt_array[ND_OPT_SOURCE_LL_ADDR]
> +#define nd_opts_tgt_lladdr   nd_opt_array[ND_OPT_TARGET_LL_ADDR]
> +#define nd_opts_pi   nd_opt_array[ND_OPT_PREFIX_INFO]
> +#define nd_opts_pi_end   
> nd_opt_array[__ND_OPT_PREFIX_INFO_END]
> +#define nd_opts_rh   nd_opt_array[ND_OPT_REDIRECT_HDR]
> +#define nd_opts_mtu  nd_opt_array[ND_OPT_MTU]
> +#define nd_802154_opts_src_lladdr
> nd_802154_opt_array[ND_OPT_SOURCE_LL_ADDR]
> +#define nd_802154_opts_tgt_lladdr
> nd_802154_opt_array[ND_OPT_TARGET_LL_ADDR]
>  
>  #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
>  
> diff --git a/net/6lowpan/6lowpan_i.h b/net/6lowpan/6lowpan_i.h
> index 97ecc27..a67caee 100644
> --- a/net/6lowpan/6lowpan_i.h
> +++ b/net/6lowpan/6lowpan_i.h
> @@ -12,6 +12,10 @@ static inline bool lowpan_is_ll(const struct net_device 
> *dev,
>   return lowpan_dev(dev)->lltype == lltype;
>  }
>  
> +extern const struct ndisc_ops lowpan_ndisc_ops;
> +
> +int addrconf_ifid_802154_6lowpan(u8 *eui, struct net_device *dev);
> +
>  #ifdef CONFIG_6LOWPAN_DEBUGFS
>  int lowpan_dev_debugfs_init(struct net_device *dev);
>  void lowpan_dev_debugfs_exit(struct net_device *dev);
> diff --git a/net/6lowpan/Makefile b/net/6lowpan/Makefile
> index e44f3bf..12d131a 100644
> --- a/net/6lowpan/Makefile
> +++ b/net/6lowpan/Makefile
> @@ -1,6 +1,6 @@
>  obj-$(CONFIG_6LOWPAN) += 6lowpan.o
>  
> -6lowpan-y := core.o iphc.o nhc.o
> +6lowpan-y := core.o iphc.o nhc.o ndisc.o
>  6lowpan-$(CONFIG_6LOWPAN_DEBUGFS) += debugfs.o
>  
>  #rfc6282 nhcs
> diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
> index 1c7a42b..5945f7e 100644
> --- a/net/6lowpan/core.c
> +++ b/net/6lowpan/core.c
> @@ -34,6 +34,8 @@ int lowpan_register_netdevice(struct net_device *dev,
>   for (i = 0; i < LOWPAN_IPHC_CTX_TABLE_SIZE; i++)
>   lowpan_dev(dev)->ctx.table[i].id = i;
>  
> + dev->ndisc_ops = &lowpan_ndisc_ops;
> +
>   ret = register_netdevice(dev);
>   if (ret < 0)
>   return ret;
> @@ -73,7 +75,7 @@ void lowpan_unregister_netdev(struct net_device *dev)
>  }
>  EXPORT_SYMBOL(lowpan_unregister_netdev);
>  
> -static int addrconf_ifid_802154_6lowpan(u8 *eui, struct net_device *dev)
> +int addrconf_ifid_802154_6lowpan(u8 *eui, struct net_device *dev)
>  {
>   struct wpan_dev *wpan_dev = 
> lowpan_802154_dev(dev)->wdev->ieee802154_ptr;
>  
> diff --git a/net/6lowpan/ndisc.c b/net/6lowpan/ndisc.c
> new file mode 100644
> index 000..ae1d419
> --- /dev/null
> +++ b/net/6lowpan/ndisc.c
> @@ -0,0 +1,234 @@
> +/* This progr

Re: [PATCHv3 net-next 09/12] ipv6: export several functions

2016-06-15 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch exports some neighbour discovery functions which can be used
> by 6lowpan neighbour discovery ops functionality then.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/net/addrconf.h |  7 +++
>  include/net/ndisc.h| 12 
>  net/ipv6/addrconf.c| 15 +++
>  net/ipv6/ndisc.c   | 14 +++---
>  4 files changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index b1774eb..9826d3a 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -97,6 +97,13 @@ void addrconf_leave_solict(struct inet6_dev *idev, const 
> struct in6_addr *addr);
>  void addrconf_add_linklocal(struct inet6_dev *idev,
>   const struct in6_addr *addr, u32 flags);
>  
> +int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
> +  const struct prefix_info *pinfo,
> +  struct inet6_dev *in6_dev,
> +  const struct in6_addr *addr, int addr_type,
> +  u32 addr_flags, bool sllao, bool tokenized,
> +  __u32 valid_lft, u32 prefered_lft);
> +
>  static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
>  {
>   if (dev->addr_len != ETH_ALEN)
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index a5e2767..3f0f41d 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -53,6 +53,15 @@ enum {
>  
>  #include 
>  
> +/* Set to 3 to get tracing... */
> +#define ND_DEBUG 1
> +
> +#define ND_PRINTK(val, level, fmt, ...)  \
> +do { \
> + if (val <= ND_DEBUG)\
> + net_##level##_ratelimited(fmt, ##__VA_ARGS__);  \
> +} while (0)
> +
>  struct ctl_table;
>  struct inet6_dev;
>  struct net_device;
> @@ -115,6 +124,9 @@ struct ndisc_options *ndisc_parse_options(const struct 
> net_device *dev,
> u8 *opt, int opt_len,
> struct ndisc_options *ndopts);
>  
> +void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,
> +   int data_len, int pad);
> +
>  #define NDISC_OPS_REDIRECT_DATA_SPACE2
>  
>  /*
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 2d678c0..9c7d660 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2333,14 +2333,12 @@ static bool is_addr_mode_generate_stable(struct 
> inet6_dev *idev)
>  idev->addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM;
>  }
>  
> -static int addrconf_prefix_rcv_add_addr(struct net *net,
> - struct net_device *dev,
> - const struct prefix_info *pinfo,
> - struct inet6_dev *in6_dev,
> - const struct in6_addr *addr,
> - int addr_type, u32 addr_flags,
> - bool sllao, bool tokenized,
> - __u32 valid_lft, u32 prefered_lft)
> +int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
> +  const struct prefix_info *pinfo,
> +  struct inet6_dev *in6_dev,
> +  const struct in6_addr *addr, int addr_type,
> +  u32 addr_flags, bool sllao, bool tokenized,
> +  __u32 valid_lft, u32 prefered_lft)
>  {
>   struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1);
>   int create = 0, update_lft = 0;
> @@ -2430,6 +2428,7 @@ static int addrconf_prefix_rcv_add_addr(struct net *net,
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(addrconf_prefix_rcv_add_addr);
>  
>  void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool 
> sllao)
>  {
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 2f4afd1..fe65cdc 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -73,15 +73,6 @@
>  #include 
>  #include 
>  
> -/* Set to 3 to get tracing... */
> -#define ND_DEBUG 1
> -
> -#define ND_PRINTK(val, level, fmt, ...)  \
> -do { \
> - if (val <= ND_DEBUG)

Re: [PATCHv3 net-next 08/12] ipv6: introduce neighbour discovery ops

2016-06-15 Thread YOSHIFUJI Hideaki/吉藤英明



Alexander Aring wrote:
> This patch introduces neighbour discovery ops callback structure. The
> idea is to separate the handling for 6LoWPAN into the 6lowpan module.
> 
> These callback offers 6lowpan different handling, such as 802.15.4 short
> address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
> over 6LoWPANs).
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/linux/netdevice.h |   5 ++
>  include/net/ndisc.h   | 197 
> +-
>  net/ipv6/addrconf.c   |  13 ++-
>  net/ipv6/ndisc.c  | 101 
>  net/ipv6/route.c  |   8 +-
>  5 files changed, 284 insertions(+), 40 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 36e43bd..890158e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1456,6 +1456,8 @@ enum netdev_priv_flags {
>   *   @netdev_ops:Includes several pointers to callbacks,
>   *   if one wants to override the ndo_*() functions
>   *   @ethtool_ops:   Management operations
> + *   @ndisc_ops: Includes callbacks for different IPv6 neighbour
> + *   discovery handling. Necessary for e.g. 6LoWPAN.
>   *   @header_ops:Includes callbacks for creating,parsing,caching,etc
>   *   of Layer 2 headers.
>   *
> @@ -1672,6 +1674,9 @@ struct net_device {
>  #ifdef CONFIG_NET_L3_MASTER_DEV
>   const struct l3mdev_ops *l3mdev_ops;
>  #endif
> +#if IS_ENABLED(CONFIG_IPV6)
> + const struct ndisc_ops *ndisc_ops;
> +#endif
>  
>   const struct header_ops *header_ops;
>  
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index c8962ad..a5e2767 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -58,6 +58,7 @@ struct inet6_dev;
>  struct net_device;
>  struct net_proto_family;
>  struct sk_buff;
> +struct prefix_info;
>  
>  extern struct neigh_table nd_tbl;
>  
> @@ -110,9 +111,182 @@ struct ndisc_options {
>  
>  #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
>  
> -struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
> +struct ndisc_options *ndisc_parse_options(const struct net_device *dev,
> +   u8 *opt, int opt_len,
> struct ndisc_options *ndopts);
>  
> +#define NDISC_OPS_REDIRECT_DATA_SPACE2
> +
> +/*
> + * This structure defines the hooks for IPv6 neighbour discovery.
> + * The following hooks can be defined; unless noted otherwise, they are
> + * optional and can be filled with a null pointer.
> + *
> + * int (*is_useropt)(u8 nd_opt_type):
> + * This function is called when IPv6 decide RA userspace options. if
> + * this function returns 1 then the option given by nd_opt_type will
> + * be handled as userspace option additional to the IPv6 options.
> + *
> + * int (*parse_options)(const struct net_device *dev,
> + *   struct nd_opt_hdr *nd_opt,
> + *   struct ndisc_options *ndopts):
> + * This function is called while parsing ndisc ops and put each position
> + * as pointer into ndopts. If this function return unequal 0, then this
> + * function took care about the ndisc option, if 0 then the IPv6 ndisc
> + * option parser will take care about that option.
> + *
> + * void (*update)(const struct net_device *dev, struct neighbour *n,
> + * u32 flags, u8 icmp6_type,
> + * const struct ndisc_options *ndopts):
> + * This function is called when IPv6 ndisc updates the neighbour cache
> + * entry. Additional options which can be updated may be previously
> + * parsed by parse_opts callback and accessible over ndopts parameter.
> + *
> + * int (*opt_addr_space)(const struct net_device *dev, u8 icmp6_type,
> + *struct neighbour *neigh, u8 *ha_buf,
> + *u8 **ha):
> + * This function is called when the necessary option space will be
> + * calculated before allocating a skb. The parameters neigh, ha_buf
> + * abd ha are available on NDISC_REDIRECT messages only.
> + *
> + * void (*fill_addr_option)(const struct net_device *dev,
> + *   struct sk_buff *skb, u8 icmp6_type,
> + *   const u8 *ha):
> + * This function is called when the skb will finally fill the option
> + * fields inside skb. NOTE: this callback should fill the option
> + * fields to the skb which are previously indicated b

Re: [PATCHv3 net-next 07/12] addrconf: put prefix address add in an own function

2016-06-15 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch moves the functionality to add a RA PIO prefix generated
> address in an own function. This move prepares to add a hook for
> adding a second address for a second link-layer address. E.g. short
> address for 802.15.4 6LoWPAN.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Reviewed-by: Stefan Schmidt 
> Signed-off-by: Alexander Aring 


Acked-by: YOSHIFUJI Hideaki 


> ---
>  net/ipv6/addrconf.c | 203 
> 
>  1 file changed, 109 insertions(+), 94 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index beaad49..0ca31e1 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2333,12 +2333,110 @@ static bool is_addr_mode_generate_stable(struct 
> inet6_dev *idev)
>  idev->addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM;
>  }
>  
> +static int addrconf_prefix_rcv_add_addr(struct net *net,
> + struct net_device *dev,
> + const struct prefix_info *pinfo,
> + struct inet6_dev *in6_dev,
> + const struct in6_addr *addr,
> + int addr_type, u32 addr_flags,
> + bool sllao, bool tokenized,
> + __u32 valid_lft, u32 prefered_lft)
> +{
> + struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1);
> + int create = 0, update_lft = 0;
> +
> + if (!ifp && valid_lft) {
> + int max_addresses = in6_dev->cnf.max_addresses;
> +
> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
> + if (in6_dev->cnf.optimistic_dad &&
> + !net->ipv6.devconf_all->forwarding && sllao)
> + addr_flags |= IFA_F_OPTIMISTIC;
> +#endif
> +
> + /* Do not allow to create too much of autoconfigured
> +  * addresses; this would be too easy way to crash kernel.
> +  */
> + if (!max_addresses ||
> + ipv6_count_addresses(in6_dev) < max_addresses)
> + ifp = ipv6_add_addr(in6_dev, addr, NULL,
> + pinfo->prefix_len,
> + addr_type&IPV6_ADDR_SCOPE_MASK,
> + addr_flags, valid_lft,
> + prefered_lft);
> +
> + if (IS_ERR_OR_NULL(ifp))
> + return -1;
> +
> + update_lft = 0;
> + create = 1;
> + spin_lock_bh(&ifp->lock);
> + ifp->flags |= IFA_F_MANAGETEMPADDR;
> + ifp->cstamp = jiffies;
> + ifp->tokenized = tokenized;
> + spin_unlock_bh(&ifp->lock);
> + addrconf_dad_start(ifp);
> + }
> +
> + if (ifp) {
> + u32 flags;
> + unsigned long now;
> + u32 stored_lft;
> +
> + /* update lifetime (RFC2462 5.5.3 e) */
> + spin_lock_bh(&ifp->lock);
> + now = jiffies;
> + if (ifp->valid_lft > (now - ifp->tstamp) / HZ)
> + stored_lft = ifp->valid_lft - (now - ifp->tstamp) / HZ;
> + else
> + stored_lft = 0;
> + if (!update_lft && !create && stored_lft) {
> + const u32 minimum_lft = min_t(u32,
> + stored_lft, MIN_VALID_LIFETIME);
> + valid_lft = max(valid_lft, minimum_lft);
> +
> + /* RFC4862 Section 5.5.3e:
> +  * "Note that the preferred lifetime of the
> +  *  corresponding address is always reset to
> +  *  the Preferred Lifetime in the received
> +  *  Prefix Information option, regardless of
> +  *  whether the valid lifetime is also reset or
> +  *  ignored."
> +  *
> +  * So we should always update prefered_lft here.
> +  */
> + update_lft = 1;
> + }
> +
> + if (update_lft) {
> + ifp->valid_lft = valid_lft;
> + ifp->prefered_lft = prefered_lft;
> + ifp->tstamp = now;
> + flags = ifp->flags;
> +

Re: [PATCHv3 net-next 06/12] ndisc: add __ndisc_fill_addr_option function

2016-06-15 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch adds __ndisc_fill_addr_option as low-level function for
> ndisc_fill_addr_option which doesn't depend on net_device parameter.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 


Acked-by: YOSHIFUJI Hideaki 

> ---
>  net/ipv6/ndisc.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index c245895..a7b9468 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -150,11 +150,10 @@ struct neigh_table nd_tbl = {
>  };
>  EXPORT_SYMBOL_GPL(nd_tbl);
>  
> -static void ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data)
> +static void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void 
> *data,
> +  int data_len, int pad)
>  {
> - int pad   = ndisc_addr_option_pad(skb->dev->type);
> - int data_len = skb->dev->addr_len;
> - int space = ndisc_opt_addr_space(skb->dev);
> + int space = __ndisc_opt_addr_space(data_len, pad);
>   u8 *opt = skb_put(skb, space);
>  
>   opt[0] = type;
> @@ -172,6 +171,13 @@ static void ndisc_fill_addr_option(struct sk_buff *skb, 
> int type, void *data)
>   memset(opt, 0, space);
>  }
>  
> +static inline void ndisc_fill_addr_option(struct sk_buff *skb, int type,
> +   void *data)
> +{
> + __ndisc_fill_addr_option(skb, type, data, skb->dev->addr_len,
> +  ndisc_addr_option_pad(skb->dev->type));
> +}
> +
>  static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur,
>   struct nd_opt_hdr *end)
>  {
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCHv3 net-next 05/12] ndisc: add __ndisc_opt_addr_data function

2016-06-15 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch adds __ndisc_opt_addr_data as low-level function for
> ndisc_opt_addr_data which doesn't depend on net_device parameter.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/net/ndisc.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 4cee826..c8962ad 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -138,17 +138,23 @@ static inline int ndisc_opt_addr_space(struct 
> net_device *dev)
> ndisc_addr_option_pad(dev->type));
>  }
>  
> -static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> -   struct net_device *dev)
> +static inline u8 *__ndisc_opt_addr_data(struct nd_opt_hdr *p,
> + unsigned char addr_len, int prepad)
>  {
>   u8 *lladdr = (u8 *)(p + 1);
>   int lladdrlen = p->nd_opt_len << 3;
> - int prepad = ndisc_addr_option_pad(dev->type);
> - if (lladdrlen != ndisc_opt_addr_space(dev))
> + if (lladdrlen != __ndisc_opt_addr_space(addr_len, prepad))
>   return NULL;
>   return lladdr + prepad;
>  }
>  
> +static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> +   struct net_device *dev)
> +{
> + return __ndisc_opt_addr_data(p, dev->addr_len,
> +  ndisc_addr_option_pad(dev->type));
> +}
> +
>  static inline u32 ndisc_hashfn(const void *pkey, const struct net_device 
> *dev, __u32 *hash_rnd)
>  {
>   const u32 *p32 = pkey;
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCHv3 net-next 04/12] ndisc: add __ndisc_opt_addr_space function

2016-06-15 Thread YOSHIFUJI Hideaki

Alexander Aring wrote:
> This patch adds __ndisc_opt_addr_space as low-level function for
> ndisc_opt_addr_space which doesn't depend on net_device parameter.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/net/ndisc.h | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 2d8edaa..4cee826 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -127,10 +127,15 @@ static inline int ndisc_addr_option_pad(unsigned short 
> type)
>   }
>  }
>  
> +static inline int __ndisc_opt_addr_space(unsigned char addr_len, int pad)
> +{
> + return NDISC_OPT_SPACE(addr_len + pad);
> +}
> +
>  static inline int ndisc_opt_addr_space(struct net_device *dev)
>  {
> - return NDISC_OPT_SPACE(dev->addr_len +
> -ndisc_addr_option_pad(dev->type));
> + return __ndisc_opt_addr_space(dev->addr_len,
> +   ndisc_addr_option_pad(dev->type));
>  }
>  
>  static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCHv3 net-next 01/12] 6lowpan: add private neighbour data

2016-06-15 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch will introduce a 6lowpan neighbour private data. Like the
> interface private data we handle private data for generic 6lowpan and
> for link-layer specific 6lowpan.
> 
> The current first use case if to save the short address for a 802.15.4
> 6lowpan neighbour.
> 
> Cc: David S. Miller 
> Reviewed-by: Stefan Schmidt 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/linux/netdevice.h |  3 +--
>  include/net/6lowpan.h | 10 ++
>  net/ieee802154/6lowpan/core.c | 12 
>  3 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index d101e4d..36e43bd 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1483,8 +1483,7 @@ enum netdev_priv_flags {
>   *   @perm_addr: Permanent hw address
>   *   @addr_assign_type:  Hw address assignment type
>   *   @addr_len:  Hardware address length
> - *   @neigh_priv_len;Used in neigh_alloc(),
> - *   initialized only in atm/clip.c
> + *   @neigh_priv_len:Used in neigh_alloc()
>   *   @dev_id:Used to differentiate devices that share
>   *   the same link layer address
>   *   @dev_port:  Used to differentiate devices that share
> diff --git a/include/net/6lowpan.h b/include/net/6lowpan.h
> index da84cf9..2d9b9d3 100644
> --- a/include/net/6lowpan.h
> +++ b/include/net/6lowpan.h
> @@ -141,6 +141,16 @@ struct lowpan_dev {
>   u8 priv[0] __aligned(sizeof(void *));
>  };
>  
> +struct lowpan_802154_neigh {
> + __le16 short_addr;
> +};
> +
> +static inline
> +struct lowpan_802154_neigh *lowpan_802154_neigh(void *neigh_priv)
> +{
> + return neigh_priv;
> +}
> +
>  static inline
>  struct lowpan_dev *lowpan_dev(const struct net_device *dev)
>  {
> diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
> index 4e2b308..8c004a0 100644
> --- a/net/ieee802154/6lowpan/core.c
> +++ b/net/ieee802154/6lowpan/core.c
> @@ -81,11 +81,21 @@ static int lowpan_stop(struct net_device *dev)
>   return 0;
>  }
>  
> +static int lowpan_neigh_construct(struct neighbour *n)
> +{
> + struct lowpan_802154_neigh *neigh = 
> lowpan_802154_neigh(neighbour_priv(n));
> +
> + /* default no short_addr is available for a neighbour */
> + neigh->short_addr = cpu_to_le16(IEEE802154_ADDR_SHORT_UNSPEC);
> + return 0;
> +}
> +
>  static const struct net_device_ops lowpan_netdev_ops = {
>   .ndo_init   = lowpan_dev_init,
>   .ndo_start_xmit = lowpan_xmit,
>   .ndo_open   = lowpan_open,
>   .ndo_stop   = lowpan_stop,
> + .ndo_neigh_construct= lowpan_neigh_construct,
>  };
>  
>  static void lowpan_setup(struct net_device *ldev)
> @@ -150,6 +160,8 @@ static int lowpan_newlink(struct net *src_net, struct 
> net_device *ldev,
>   wdev->needed_headroom;
>   ldev->needed_tailroom = wdev->needed_tailroom;
>  
> + ldev->neigh_priv_len = sizeof(struct lowpan_802154_neigh);
> +
>   ret = lowpan_register_netdevice(ldev, LOWPAN_LLTYPE_IEEE802154);
>   if (ret < 0) {
>   dev_put(wdev);
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCHv3 net-next 00/12] 6lowpan: introduce 6lowpan-nd

2016-06-15 Thread YOSHIFUJI Hideaki

Hi,

Alexander Aring wrote:
> Alexander Aring (12):
>   6lowpan: add private neighbour data
>   6lowpan: add 802.15.4 short addr slaac
>   6lowpan: remove ipv6 module request
>   ndisc: add __ndisc_opt_addr_space function
>   ndisc: add __ndisc_opt_addr_data function
>   ndisc: add __ndisc_fill_addr_option function
>   addrconf: put prefix address add in an own function
>   ipv6: introduce neighbour discovery ops
>   ipv6: export several functions
>   6lowpan: introduce 6lowpan-nd
>   6lowpan: add support for getting short address
>   6lowpan: add support for 802.15.4 short addr handling
> 
>  include/linux/netdevice.h |   8 +-
>  include/net/6lowpan.h |  16 +++
>  include/net/addrconf.h|  10 ++
>  include/net/ndisc.h   | 248 
> +++---
>  net/6lowpan/6lowpan_i.h   |   4 +
>  net/6lowpan/Makefile  |   2 +-
>  net/6lowpan/core.c|  50 -
>  net/6lowpan/debugfs.c |  39 +++
>  net/6lowpan/iphc.c| 167 +++-
>  net/6lowpan/ndisc.c   | 234 +++
>  net/ieee802154/6lowpan/core.c |  12 ++
>  net/ieee802154/6lowpan/tx.c   | 113 +--
>  net/ipv6/addrconf.c   | 218 +
>  net/ipv6/ndisc.c  | 123 +
>  net/ipv6/route.c  |   8 +-
>  15 files changed, 1004 insertions(+), 248 deletions(-)
>  create mode 100644 net/6lowpan/ndisc.c
> 

Looks good to me.

Acked-by: YOSHIFUJI Hideaki 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-06-08 Thread YOSHIFUJI Hideaki

Hi

Please do not use "top-posting" here.

Alan Davey wrote:
> Hi David
> 
> Thank you for your email.  I understand raw IP sockets have worked this way 
> for a long time, but I think that there is real benefit in this patch for 
> little risk.  Please take a look at the following and let me know what you 
> think.
> 
> -  The current behaviour is counter-intuitive (fragmentation takes place in 
> all other cases) and therefore different to what everyone expects.

I do NEVER expect fragmentation occurs with IP_HDRINCL, at least.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH] ping: ICMP error replies while errno < 0 is a hard error

2016-06-07 Thread YOSHIFUJI Hideaki

Hi,

Jason A. Donenfeld wrote:
> There are some odd conditions in which a device will return an error for
> sendto, while at the same time an ICMP error response is generated. In
> this case, with the current code, the packet is retransmitted in a
> flood, which is not what anybody wants. In fact, when these two
> conditions occur, we want to treat the packet as a "hard local error"
> and do the ordinary pause as before. The reasoning is that if we did
> receive an ICMP error message, then the packet was transmitted, and so
> it should be accounted for. But, since we also received an errno, this
> should be reported to the user, as something is wonky with their network
> interface.
> 
> Signed-off-by: Jason A. Donenfeld 

Applied. Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 1/6] start gitignore files

2016-06-02 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> On 02 Jun 2016 11:06, YOSHIFUJI Hideaki wrote:
>> Mike Frysinger wrote:
>>> Signed-off-by: Mike Frysinger 
>>> ---
>>>  .gitignore | 22 ++
>>>  doc/.gitignore |  2 ++
>>>  2 files changed, 24 insertions(+)
>>>  create mode 100644 .gitignore
>>>  create mode 100644 doc/.gitignore
>>>
>>
>> files under ninfod/ as well, maybe?
> 
> perhaps ... i've never seen this dir before.  it has generated files
> committed to the git repo though which kind on conflicts with using
> gitignore.
> 
> maybe merge this and then wait for someone who actually builds/tests
> ninfod to submit an update ?
> -mike
> 

Okay, applied.  Thanks.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils v2] ping6: allow disabling of openssl/libgcrypt support

2016-06-02 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> Signed-off-by: Mike Frysinger 
> ---
>  Makefile |  5 -
>  iputils_md5dig.h |  4 +++-
>  ping6.c  | 19 ++-
>  3 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index b6cf512..8b9e2aa 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -36,7 +36,7 @@ ARPING_DEFAULT_DEVICE=
>  
>  # Libgcrypt (for MD5) for ping6 [yes|no|static]
>  USE_GCRYPT=yes
> -# Crypto library for ping6 [shared|static]
> +# Crypto library for ping6 [shared|static|no]
>  USE_CRYPTO=shared
>  # Resolv library for ping6 [yes|static]
>  USE_RESOLV=yes
> @@ -66,7 +66,10 @@ ifneq ($(USE_GCRYPT),no)
>   LIB_CRYPTO = $(call FUNC_LIB,$(USE_GCRYPT),$(LDFLAG_GCRYPT))
>   DEF_CRYPTO = -DUSE_GCRYPT
>  else
> +ifneq ($(USE_CRYPTO),no)
>   LIB_CRYPTO = $(call FUNC_LIB,$(USE_CRYPTO),$(LDFLAG_CRYPTO))
> + DEF_CRYPTO = -DUSE_OPENSSL
> +endif
>  endif
>  
>  # USE_RESOLV: LIB_RESOLV
> diff --git a/iputils_md5dig.h b/iputils_md5dig.h
> index 4cec866..d6c4d46 100644
> --- a/iputils_md5dig.h
> +++ b/iputils_md5dig.h
> @@ -5,8 +5,10 @@
>  # include 
>  # include 
>  # define IPUTILS_MD5DIG_LEN  16
> -#else
> +# define USE_CRYPTO
> +#elif defined(USE_OPENSSL)
>  # include 
> +# define USE_CRYPTO
>  #endif
>  
>  #ifdef USE_GCRYPT

Please define ENABLE_NIQUERY (or something else) for
USE_GCRYPT || USE_CRYPTO case and use it in sources.

> diff --git a/ping6.c b/ping6.c
> index 6d1a6db..db7ec4a 100644
> --- a/ping6.c
> +++ b/ping6.c


> @@ -891,6 +899,7 @@ int main(int argc, char *argv[])
>   }
>  #endif
>  
> +#ifdef USE_CRYPTO
>   if (niquery_is_enabled()) {
>   niquery_init_nonce();
>  

Make niquery_is_enabled() returns 0 without ENABLE_NIQUERY
and reduce #ifdefs.

Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 5/6] tftpd: fix syslog setup

2016-06-02 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> On 02 Jun 2016 11:10, YOSHIFUJI Hideaki wrote:
>> Mike Frysinger wrote:
>>> Commit d81a44625b04d487c895473aa77af13420b7afdd added support for checking
>>> the set*id calls, but would call syslog() before it had called openlog().
>>> Move the call up earlier to fix that.
>>
>> Please describe what it really fixes.
> 
> i already did: you need to call openlog before calling syslog, but the
> code here very clearly doesn't.  if you read the commit i referenced,
> and the all <25 lines of context here, it should be pretty obvious the
> code is wrong.
> -mike
> 

OK, applied. Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils v2] ping: always accept . delimiter with -i number parsing

2016-06-02 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> Always allow #.# format for the -i flag even when the current locale
> uses a different separator.  Locale de_DE which uses #,# normally.
> 
> Simple testcase:
> $ make USE_IDN=1
> $ LANG=de_DE.UTF8 ./ping -i 0.5 localhost
> $ LANG=de_DE.UTF8 ./ping -i 0,5 localhost
> 
> Reported-by: Sergey Fionov 
> Signed-off-by: Mike Frysinger 
> ---

Applied, thanks.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 2/6] doc: fix parallel build of html/man pages

2016-06-01 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> The use of the same tempdir prevents building of these files in parallel.
> So build all of them in unique tempdirs so we can do them in parallel.
> 
> Signed-off-by: Mike Frysinger 

Applied.  Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 4/6] fix handling of CFLAGS

2016-06-01 Thread YOSHIFUJI Hideaki

Mike Frysinger wrote:
> This defaults CFLAGS to -O3 without clobbering settings people have set
> up in the environment already.
> 
> Signed-off-by: Mike Frysinger 

Fixed in different way. Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 5/6] tftpd: fix syslog setup

2016-06-01 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> Commit d81a44625b04d487c895473aa77af13420b7afdd added support for checking
> the set*id calls, but would call syslog() before it had called openlog().
> Move the call up earlier to fix that.

Please describe what it really fixes.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 6/6] ping: fix -i number parsing in locales

2016-06-01 Thread YOSHIFUJI Hideaki



Mike Frysinger wrote:
> Always use #.# format for the -i flag even when the current locale uses
> a different separator.  Locale de_DE which uses #,# normally.
> 
> Simple testcase:
> $ make USE_IDN=1
> $ LANG=de_DE.UTF8 ./ping -i 0.5 localhost
> 
> Reported-by: Sergey Fionov 
> Signed-off-by: Mike Frysinger 
> ---
>  ping_common.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/ping_common.c b/ping_common.c
> index 62f53a6..0a37e09 100644
> --- a/ping_common.c
> +++ b/ping_common.c
> @@ -269,9 +269,17 @@ void common_options(int ch)
>   double dbl;
>   char *ep;
>  
> +#ifdef USE_IDN
> + setlocale(LC_ALL, "C");
> +#endif
> +
>   errno = 0;
>   dbl = strtod(optarg, &ep);
>  
> +#ifdef USE_IDN
> + setlocale(LC_ALL, "");
> +#endif
> +
>   if (errno || *ep != '\0' ||
>   !finite(dbl) || dbl < 0.0 || dbl >= (double)INT_MAX / 1000 
> - 1.0) {
>   fprintf(stderr, "ping: bad timing interval\n");
> 

Please make it accept both.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 1/6] start gitignore files

2016-06-01 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> Signed-off-by: Mike Frysinger 
> ---
>  .gitignore | 22 ++
>  doc/.gitignore |  2 ++
>  2 files changed, 24 insertions(+)
>  create mode 100644 .gitignore
>  create mode 100644 doc/.gitignore
> 

files under ninfod/ as well, maybe?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH iputils 3/6] ping6: allow disabling of openssl support

2016-06-01 Thread YOSHIFUJI Hideaki

Hi,

Mike Frysinger wrote:
> Signed-off-by: Mike Frysinger 
> ---
>  Makefile |  5 -
>  iputils_md5dig.h |  4 +++-
>  ping6.c  | 10 ++
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
:

> diff --git a/ping6.c b/ping6.c
> index 6d1a6db..cd140e2 100644
> --- a/ping6.c
> +++ b/ping6.c
> @@ -324,6 +324,7 @@ static void niquery_init_nonce(void)
>  #if !PING6_NONCE_MEMORY
>  static int niquery_nonce(__u8 *nonce, int fill)
>  {
> +# ifdef USE_CRYPTO
>   static __u8 digest[MD5_DIGEST_LENGTH];
>   static int seq = -1;
>  
> @@ -346,6 +347,10 @@ static int niquery_nonce(__u8 *nonce, int fill)
>   return -1;
>   return ntohsp((__u16 *)nonce);
>   }
> +# else
> + fprintf(stderr, "ping6: function not available; crypto disabled\n");
> + exit(3);
> +# endif
>  }
>  #endif
>  
> @@ -500,6 +505,7 @@ static int niquery_option_subject_addr_handler(int index, 
> const char *arg)
>  
>  static int niquery_option_subject_name_handler(int index, const char *arg)
>  {
> +#ifdef USE_CRYPTO
>   static char nigroup_buf[INET6_ADDRSTRLEN + 1 + IFNAMSIZ];
>   unsigned char *dnptrs[2], **dpp, **lastdnptr;
>   int n;
> @@ -625,6 +631,10 @@ errexit:
>   free(idn);
>   free(name);
>   exit(1);
> +#else
> + fprintf(stderr, "ping6: function not available; crypto disabled\n");
> + exit(3);
> +#endif
>  }
>  
>  int niquery_option_help_handler(int index, const char *arg)
> 

NAK. If you really build ping without crypto libraries,
you should disable Node Ifnforamtion Query support completely.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: IPv6 extension header privileges

2016-05-26 Thread YOSHIFUJI Hideaki

Hi,

Tom Herbert wrote:
> Hi,
> 
> In ipv6_sockglue.c I noticed:
> 
> /* hop-by-hop / destination options are privileged option */
> retv = -EPERM;
> if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
>break;
> 
> Can anyone provide that rationale as to why these are privileged ops?

It is better to disallow by default for security.
FreeBSD does this in the same way.
We may have sysctl bitmaps, of course.

--yoshfuji

> 
> Thanks,
> Tom
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [RFC 08/12] ipv6: introduce neighbour discovery ops

2016-05-24 Thread YOSHIFUJI Hideaki

Hi,

Alexander Aring wrote:
> This patch introduces neighbour discovery ops callback structure. The
> idea is to separate the handling for 6LoWPAN into the 6lowpan module.
> 
> These callback offers 6lowpan different handling, such as 802.15.4 short
> address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
> over 6LoWPANs).
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 
> ---
>  include/linux/netdevice.h |   5 ++
>  include/net/ndisc.h   | 176 
> +-
>  net/ipv6/addrconf.c   |   9 ++-
>  net/ipv6/ndisc.c  | 119 +--
>  net/ipv6/route.c  |  14 ++--
>  5 files changed, 275 insertions(+), 48 deletions(-)
> 

> @@ -205,6 +376,9 @@ void ndisc_send_redirect(struct sk_buff *skb, const 
> struct in6_addr *target);
>  int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device 
> *dev,
>int dir);
>  
> +void ndisc_neigh_update(const struct net_device *dev, struct neighbour 
> *neigh,
> + const u8 *lladdr, u8 new, u32 flags, u8 icmp6_type,
> + struct ndisc_options *ndopts);
>  

I prefer ndisc_update().


>  /*
>   *   IGMP
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 393cdbf..4506cac 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2531,7 +2531,7 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 
> *opt, int len, bool sllao)
>  
>   if (pinfo->autoconf && in6_dev->cnf.autoconf) {
>   struct in6_addr addr;
> - bool tokenized = false;
> + bool tokenized = false, dev_addr_generated = false;
>  
>   if (pinfo->prefix_len == 64) {
>   memcpy(&addr, &pinfo->prefix, 8);
> @@ -2551,6 +2551,8 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 
> *opt, int len, bool sllao)
>  ipv6_inherit_eui64(addr.s6_addr + 8, 
> in6_dev)) {
>   in6_dev_put(in6_dev);
>   return;
> + } else {
> + dev_addr_generated = true;
>   }
>   goto ok;
>   }
> @@ -2564,6 +2566,11 @@ ok:
>addr_type, addr_flags, sllao,
>tokenized, valid_lft,
>prefered_lft);
> + ndisc_ops_prefix_rcv_add_addr(net, dev, pinfo, in6_dev, &addr,
> +   addr_type, addr_flags, sllao,
> +   tokenized, valid_lft,
> +   prefered_lft,
> +   dev_addr_generated);
>   }
>   inet6_prefix_notify(RTM_NEWPREFIX, in6_dev, pinfo);
>   in6_dev_put(in6_dev);
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index d794d64..99fd53c 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -191,24 +191,28 @@ static struct nd_opt_hdr *ndisc_next_option(struct 
> nd_opt_hdr *cur,
>   return cur <= end && cur->nd_opt_type == type ? cur : NULL;
>  }
>  
> -static inline int ndisc_is_useropt(struct nd_opt_hdr *opt)
> +static inline int ndisc_is_useropt(const struct net_device *dev,
> +struct nd_opt_hdr *opt)
>  {
>   return opt->nd_opt_type == ND_OPT_RDNSS ||
> - opt->nd_opt_type == ND_OPT_DNSSL;
> + opt->nd_opt_type == ND_OPT_DNSSL ||
> + ndisc_ops_is_useropt(dev, opt->nd_opt_type);
>  }
>  
> -static struct nd_opt_hdr *ndisc_next_useropt(struct nd_opt_hdr *cur,
> +static struct nd_opt_hdr *ndisc_next_useropt(const struct net_device *dev,
> +  struct nd_opt_hdr *cur,
>struct nd_opt_hdr *end)
>  {
>   if (!cur || !end || cur >= end)
>   return NULL;
>   do {
>   cur = ((void *)cur) + (cur->nd_opt_len << 3);
> - } while (cur < end && !ndisc_is_useropt(cur));
> - return cur <= end && ndisc_is_useropt(cur) ? cur : NULL;
> + } while (cur < end && !ndisc_is_useropt(dev, cur));
> + return cur <= end && ndisc_is_useropt(dev, cur) ? cur : NULL;
>  }
>  
> -struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
> +struct ndisc_options *ndisc_parse_options(const struct net_device *dev,
> +   u8 *opt, int opt_len,
> struct ndisc_options *ndopts)
>  {
>   struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)opt;
> @@ -223,6 +227,8 @@ struct ndisc_options *ndisc_parse_options(u8 *opt, int 
> opt_len,
>   l = nd_opt->nd_opt_len << 3;
>   if (opt_len < l || l == 0)
>   return NULL;
> +

Re: [RFC 04/12] ndisc: get rid off dev parameter in ndisc_opt_addr_space

2016-05-24 Thread YOSHIFUJI Hideaki



Alexander Aring wrote:
> This patch removes the net_device parameter from ndisc_opt_addr_space
> function. This can be useful for calling such functionality which
> doesn't depends on dev parameter. For current existing functionality
> which depends on dev parameter, we introduce ndisc_dev_opt_addr_space to have
> an easy replacement for the ndisc_opt_addr_space function.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 
> ---
>  include/net/ndisc.h | 13 +
>  net/ipv6/ndisc.c| 12 ++--
>  2 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 2d8edaa..dbc8d01 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -127,10 +127,15 @@ static inline int ndisc_addr_option_pad(unsigned short 
> type)
>   }
>  }
>  
> -static inline int ndisc_opt_addr_space(struct net_device *dev)
> +static inline int ndisc_opt_addr_space(unsigned char addr_len, int pad)
>  {
> - return NDISC_OPT_SPACE(dev->addr_len +
> -ndisc_addr_option_pad(dev->type));
> + return NDISC_OPT_SPACE(addr_len + pad);
> +}
> +
> +static inline int ndisc_dev_opt_addr_space(const struct net_device *dev)
> +{
> + return ndisc_opt_addr_space(dev->addr_len,
> + ndisc_addr_option_pad(dev->type));
>  }
>  

I prefer not to change existing functions such as ndisc_opt_addr_space(),
and name new function __ndisc_opt_addr_space() etc.

Plus, my original thought (when I implement these functions) was to
have per-net_device ndisc_opt_addr_spece(), ndisc_opt_adr_data() etc.

What do you think of that?

>  static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> @@ -139,7 +144,7 @@ static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr 
> *p,
>   u8 *lladdr = (u8 *)(p + 1);
>   int lladdrlen = p->nd_opt_len << 3;
>   int prepad = ndisc_addr_option_pad(dev->type);
> - if (lladdrlen != ndisc_opt_addr_space(dev))
> + if (lladdrlen != ndisc_opt_addr_space(dev->addr_len, prepad))
>   return NULL;
>   return lladdr + prepad;
>  }
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index c245895..2241f06 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -152,9 +152,9 @@ EXPORT_SYMBOL_GPL(nd_tbl);
>  
>  static void ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data)
>  {
> - int pad   = ndisc_addr_option_pad(skb->dev->type);
> + int pad = ndisc_addr_option_pad(skb->dev->type);
>   int data_len = skb->dev->addr_len;
> - int space = ndisc_opt_addr_space(skb->dev);
> + int space = ndisc_opt_addr_space(data_len, pad);
>   u8 *opt = skb_put(skb, space);
>  
>   opt[0] = type;
> @@ -509,7 +509,7 @@ void ndisc_send_na(struct net_device *dev, const struct 
> in6_addr *daddr,
>   if (!dev->addr_len)
>   inc_opt = 0;
>   if (inc_opt)
> - optlen += ndisc_opt_addr_space(dev);
> + optlen += ndisc_dev_opt_addr_space(dev);
>  
>   skb = ndisc_alloc_skb(dev, sizeof(*msg) + optlen);
>   if (!skb)
> @@ -574,7 +574,7 @@ void ndisc_send_ns(struct net_device *dev, const struct 
> in6_addr *solicit,
>   if (ipv6_addr_any(saddr))
>   inc_opt = false;
>   if (inc_opt)
> - optlen += ndisc_opt_addr_space(dev);
> + optlen += ndisc_dev_opt_addr_space(dev);
>  
>   skb = ndisc_alloc_skb(dev, sizeof(*msg) + optlen);
>   if (!skb)
> @@ -626,7 +626,7 @@ void ndisc_send_rs(struct net_device *dev, const struct 
> in6_addr *saddr,
>   }
>  #endif
>   if (send_sllao)
> - optlen += ndisc_opt_addr_space(dev);
> + optlen += ndisc_dev_opt_addr_space(dev);
>  
>   skb = ndisc_alloc_skb(dev, sizeof(*msg) + optlen);
>   if (!skb)
> @@ -1563,7 +1563,7 @@ void ndisc_send_redirect(struct sk_buff *skb, const 
> struct in6_addr *target)
>   memcpy(ha_buf, neigh->ha, dev->addr_len);
>   read_unlock_bh(&neigh->lock);
>   ha = ha_buf;
> - optlen += ndisc_opt_addr_space(dev);
> + optlen += ndisc_dev_opt_addr_space(dev);
>   } else
>   read_unlock_bh(&neigh->lock);
>  
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [RFC 00/12] 6lowpan: introduce 6lowpan-nd

2016-05-24 Thread YOSHIFUJI Hideaki

Hi.

Alexander Aring wrote:
> Hi,
> 
> this patch series introduces the ndisc ops callback structure to add different
> handling for IPv6 neighbour discovery cache functionality. It implements at 
> first
> the two following use-cases:
> 
>  - 6CO handling as userspace option (For all 6LoWPAN layers, BTLE/802.15.4) 
> [0]
>  - short address handling for 802.15.4 6LoWPAN only [1]
> 
> Since my last patch series, I completely changed the whole ndisc_ops callback
> structure to not replace the whole ndisc functionality at recv/send level of
> NS/NA/RS/RA which I send in my previous patch-series "6lowpan: introduce basic
> 6lowpan-nd". I changed it now to add different handling in a very low-level 
> way
> of ndisc functionality.

Thank you for your work!  It looks much better now.
Some comments will follow.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCHv2 bluetooth-next 00/10] 6lowpan: introduce basic 6lowpan-nd

2016-05-12 Thread YOSHIFUJI Hideaki

Hi,

Marcel Holtmann wrote:
> Hi Dave,
:
>> include/linux/netdevice.h |   6 +-
>> include/net/6lowpan.h |  24 ++
>> include/net/addrconf.h|   3 +
>> include/net/ndisc.h   | 124 -
>> net/6lowpan/6lowpan_i.h   |   2 +
>> net/6lowpan/Makefile  |   2 +-
>> net/6lowpan/core.c|  50 +++-
>> net/6lowpan/iphc.c| 167 +--
>> net/6lowpan/ndisc.c   | 633 
>> ++
>> net/bluetooth/6lowpan.c   |   2 +
>> net/ieee802154/6lowpan/core.c |  12 +
>> net/ieee802154/6lowpan/tx.c   | 107 ---
>> net/ipv6/addrconf.c   |   7 +-
>> net/ipv6/ndisc.c  | 132 +
>> net/ipv6/route.c  |   4 +-
>> 15 files changed, 1117 insertions(+), 158 deletions(-)
>> create mode 100644 net/6lowpan/ndisc.c
> 
> is there a chance that we get input into this patch set? I wonder also if it 
> would be acceptable to take this through bluetooth-next or should it better 
> go straight into net-next?

The core idea of introducing ndisc_ops is okay, but I do think this
series of patches should be refactored; we should not make another
"copy" of core of ndisc logic.  We can introduce "ops" for each
option, for example.

Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH v2 2/3] net: ipv4: tcp_probe: Replace timespec with timespec64

2016-02-28 Thread YOSHIFUJI Hideaki



Deepa Dinamani wrote:
> TCP probe log timestamps use struct timespec which is
> not y2038 safe. Even though timespec might be good enough here
> as it is used to represent delta time, the plan is to get rid
> of all uses of timespec in the kernel.
> Replace with struct timespec64 which is y2038 safe.
> 
> Prints still use unsigned long format and type.
> 
> Signed-off-by: Deepa Dinamani 

Acked-by: YOSHIFUJI Hideaki 

> Reviewed-by: Arnd Bergmann 
> Cc: "David S. Miller" 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> ---
>  net/ipv4/tcp_probe.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
> index ebf5ff5..f6c50af 100644
> --- a/net/ipv4/tcp_probe.c
> +++ b/net/ipv4/tcp_probe.c
> @@ -187,13 +187,13 @@ static int tcpprobe_sprint(char *tbuf, int n)
>  {
>   const struct tcp_log *p
>   = tcp_probe.log + tcp_probe.tail;
> - struct timespec tv
> - = ktime_to_timespec(ktime_sub(p->tstamp, tcp_probe.start));
> + struct timespec64 ts
> + = ktime_to_timespec64(ktime_sub(p->tstamp, tcp_probe.start));
>  
>   return scnprintf(tbuf, n,
>   "%lu.%09lu %pISpc %pISpc %d %#x %#x %u %u %u %u %u\n",
> - (unsigned long)tv.tv_sec,
> - (unsigned long)tv.tv_nsec,
> + (unsigned long)ts.tv_sec,
> + (unsigned long)ts.tv_nsec,
>   &p->src, &p->dst, p->length, p->snd_nxt, p->snd_una,
>   p->snd_cwnd, p->ssthresh, p->snd_wnd, p->srtt, 
> p->rcv_wnd);
>  }
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH v2 1/3] net: ipv4: Convert IP network timestamps to be y2038 safe

2016-02-28 Thread YOSHIFUJI Hideaki



Deepa Dinamani wrote:
> ICMP timestamp messages and IP source route options require
> timestamps to be in milliseconds modulo 24 hours from
> midnight UT format.
> 
> Add inet_current_timestamp() function to support this. The function
> returns the required timestamp in network byte order.
> 
> Timestamp calculation is also changed to call ktime_get_real_ts64()
> which uses struct timespec64. struct timespec64 is y2038 safe.
> Previously it called getnstimeofday() which uses struct timespec.
> struct timespec is not y2038 safe.
> 
> Signed-off-by: Deepa Dinamani 

Acked-by: YOSHIFUJI Hideaki 

--yoshfuji

> Cc: "David S. Miller" 
> Cc: Alexey Kuznetsov 
> Cc: Hideaki YOSHIFUJI 
> Cc: James Morris 
> Cc: Patrick McHardy 
> ---
>  include/net/ip.h  |  2 ++
>  net/ipv4/af_inet.c| 26 ++
>  net/ipv4/icmp.c   |  5 +
>  net/ipv4/ip_options.c | 14 ++
>  4 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 1a98f1c..5d3a9eb 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -240,6 +240,8 @@ static inline int inet_is_local_reserved_port(struct net 
> *net, int port)
>  }
>  #endif
>  
> +__be32 inet_current_timestamp(void);
> +
>  /* From inetpeer.c */
>  extern int inet_peer_threshold;
>  extern int inet_peer_minttl;
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index eade66d..408e2b3 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1386,6 +1386,32 @@ out:
>   return pp;
>  }
>  
> +#define SECONDS_PER_DAY  86400
> +
> +/* inet_current_timestamp - Return IP network timestamp
> + *
> + * Return milliseconds since midnight in network byte order.
> + */
> +__be32 inet_current_timestamp(void)
> +{
> + u32 secs;
> + u32 msecs;
> + struct timespec64 ts;
> +
> + ktime_get_real_ts64(&ts);
> +
> + /* Get secs since midnight. */
> + (void)div_u64_rem(ts.tv_sec, SECONDS_PER_DAY, &secs);
> + /* Convert to msecs. */
> + msecs = secs * MSEC_PER_SEC;
> + /* Convert nsec to msec. */
> + msecs += (u32)ts.tv_nsec / NSEC_PER_MSEC;
> +
> + /* Convert to network byte order. */
> + return htons(msecs);
> +}
> +EXPORT_SYMBOL(inet_current_timestamp);
> +
>  int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int 
> *addr_len)
>  {
>   if (sk->sk_family == AF_INET)
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 36e2697..6333489 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -931,7 +931,6 @@ static bool icmp_echo(struct sk_buff *skb)
>   */
>  static bool icmp_timestamp(struct sk_buff *skb)
>  {
> - struct timespec tv;
>   struct icmp_bxm icmp_param;
>   /*
>*  Too short.
> @@ -942,9 +941,7 @@ static bool icmp_timestamp(struct sk_buff *skb)
>   /*
>*  Fill in the current time as ms since midnight UT:
>*/
> - getnstimeofday(&tv);
> - icmp_param.data.times[1] = htonl((tv.tv_sec % 86400) * MSEC_PER_SEC +
> -  tv.tv_nsec / NSEC_PER_MSEC);
> + icmp_param.data.times[1] = inet_current_timestamp();
>   icmp_param.data.times[2] = icmp_param.data.times[1];
>   if (skb_copy_bits(skb, 0, &icmp_param.data.times[0], 4))
>   BUG();
> diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
> index bd24679..4d158ff 100644
> --- a/net/ipv4/ip_options.c
> +++ b/net/ipv4/ip_options.c
> @@ -58,10 +58,9 @@ void ip_options_build(struct sk_buff *skb, struct 
> ip_options *opt,
>   if (opt->ts_needaddr)
>   ip_rt_get_source(iph+opt->ts+iph[opt->ts+2]-9, skb, rt);
>   if (opt->ts_needtime) {
> - struct timespec tv;
>   __be32 midtime;
> - getnstimeofday(&tv);
> - midtime = htonl((tv.tv_sec % 86400) * MSEC_PER_SEC + 
> tv.tv_nsec / NSEC_PER_MSEC);
> +
> + midtime = inet_current_timestamp();
>   memcpy(iph+opt->ts+iph[opt->ts+2]-5, &midtime, 4);
>   }
>   return;
> @@ -415,11 +414,10 @@ int ip_options_compile(struct net *net,
>   break;
>   }
>   if (timeptr) {
> - struct timespec tv;
> - u32  midtime;
> - getnstimeofday(&tv);
> -

Re: [PATCH 1/4] kernel: time: Add current_nw_timestamp() for network timestamps

2016-02-25 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Deepa Dinamani wrote:
>>>  include/linux/ip.h |  2 ++
>>>  include/linux/time64.h |  3 +++
>>>  kernel/time/time.c | 26 ++
>>>  3 files changed, 31 insertions(+)
>>>
>> Since net/ipv4/* are the only users, it is enough to put
>> it in under net/ipv4/.
> 
> time.c hosts functions that are used by individual subsystems like
> current_fs_time() used by filesystems
> (sometimes used by other subsystems also).
> 
> The network timestamp function is used for both source route ip option
> and timestamp icmp messages.
> So it makes it difficult for it to be owned by a single layer.
> This is the reason it was chosen to include here.
> 
> Another option is to include it in the lowest layer its used:
> af_inet.c. Is this what you were suggesting?
> 

Yes, that's right.

--yoshfuji

> -Deepa
>

Re: [PATCH 1/4] kernel: time: Add current_nw_timestamp() for network timestamps

2016-02-25 Thread YOSHIFUJI Hideaki

Hi,

Deepa Dinamani wrote:
> ICMP timestamp messages and IP source route options require
> timestamps to be in milliseconds modulo 24 hours from
> midnight UTC format.
> 
> Add a time function to support this. The function returns the
> required timestamp in network byte order.
> 
> The function also uses y2038 safe time functions and data
> structures.
> 
> Signed-off-by: Deepa Dinamani 
> Cc: John Stultz 
> Cc: Thomas Gleixner 
> Cc: "David S. Miller" 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> ---
>  include/linux/ip.h |  2 ++
>  include/linux/time64.h |  3 +++
>  kernel/time/time.c | 26 ++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/include/linux/ip.h b/include/linux/ip.h
> index 492bc65..edf923e 100644
> --- a/include/linux/ip.h
> +++ b/include/linux/ip.h
> @@ -34,4 +34,6 @@ static inline struct iphdr *ipip_hdr(const struct sk_buff 
> *skb)
>  {
>   return (struct iphdr *)skb_transport_header(skb);
>  }
> +
> +extern __be32 current_nw_timestamp(void);
>  #endif   /* _LINUX_IP_H */
> diff --git a/include/linux/time64.h b/include/linux/time64.h
> index 367d5af..5b5db3b 100644
> --- a/include/linux/time64.h
> +++ b/include/linux/time64.h
> @@ -35,6 +35,9 @@ struct itimerspec64 {
>  #define NSEC_PER_SEC 10L
>  #define FSEC_PER_SEC 1000LL
>  
> +#define SECONDS_PER_DAY  86400
> +#define MSEC_PER_DAY (SECONDS_PER_DAY * MSEC_PER_SEC)
> +
>  /* Located here for timespec[64]_valid_strict */
>  #define TIME64_MAX   ((s64)~((u64)1 << 63))
>  #define KTIME_MAX((s64)~((u64)1 << 63))
> diff --git a/kernel/time/time.c b/kernel/time/time.c
> index 86751c6..6df15df 100644
> --- a/kernel/time/time.c
> +++ b/kernel/time/time.c
> @@ -245,6 +245,32 @@ struct timespec current_fs_time(struct super_block *sb)
>  EXPORT_SYMBOL(current_fs_time);
>  
>  /*
> + * current_nw_timestamp - Return network time
> + *
> + * Return milliseconds since midnight in network byte order.
> + */
> +__be32 current_nw_timestamp(void)
> +{
> + u64 now;
> + u32 secs;
> + u32 msecs;
> + struct timespec64 ts;
> +
> + ktime_get_ts64(&ts);
> +
> + /* Get secs since midnight. */
> + now = div_u64_rem(ts.tv_sec, SECONDS_PER_DAY, &secs);
> + /* Convert to msecs. */
> + msecs = secs * MSEC_PER_SEC;
> + /* Convert nsec to msec. */
> + msecs += (u32)ts.tv_nsec/NSEC_PER_MSEC;
> +
> + /* Convert to network byte order. */
> + return htons(msecs);
> +}
> +EXPORT_SYMBOL(current_nw_timestamp);
> +
> +/*
>   * Convert jiffies to milliseconds and back.
>   *
>   * Avoid unnecessary multiplications/divisions in the
> 

Since net/ipv4/* are the only users, it is enough to put
it in under net/ipv4/.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH 3/3] gianfar: fix endianness for hardware timestamp

2016-02-24 Thread YOSHIFUJI Hideaki

Hi,

Yangbo Lu wrote:
>> -Original Message-
>> From: Richard Cochran [mailto:richardcoch...@gmail.com]
>> Sent: Monday, February 22, 2016 4:48 PM
>> To: Yangbo Lu
>> Cc: netdev@vger.kernel.org; Claudiu Manoil
>> Subject: Re: [PATCH 3/3] gianfar: fix endianness for hardware timestamp
>>
>> On Mon, Feb 22, 2016 at 02:49:33PM +0800, Yangbo Lu wrote:
>>> @@ -2708,6 +2708,7 @@ static void gfar_clean_tx_ring(struct
>> gfar_priv_tx_q *tx_queue)
>>> struct skb_shared_hwtstamps shhwtstamps;
>>> u64 *ns = (u64 *)(((uintptr_t)skb->data + 0x10) &
>>>   ~0x7UL);
>>> +   *ns = be64_to_cpu(*ns);
>>>
>>> memset(&shhwtstamps, 0, sizeof(shhwtstamps));
>>> shhwtstamps.hwtstamp = ns_to_ktime(*ns);
>>
>> There is no point in modifying the buffer data in place.
>>
>> Instead, do this:
>>
>>  memset(&shhwtstamps, 0, sizeof(shhwtstamps));
>>  shhwtstamps.hwtstamp = ns_to_ktime(be64_to_cpu(*ns));
>>
>> or this:
>>
>>  u64 ns, *ptr = (u64 *)(((uintptr_t)skb->data + 0x10) &
>> ~0x7UL);
>>  ns = be64_to_cpu(*ptr);
>>
>>  memset(&shhwtstamps, 0, sizeof(shhwtstamps));
>>  shhwtstamps.hwtstamp = ns_to_ktime(ns);
>>
> 
> [Lu Yangbo-B47093] I will modify codes according your suggestion.
> Thank you so much!

You may want to use PTR_ALIGN() and be64_to_cpup() here.

--yoshfuji

>  
>>> @@ -3037,6 +3038,7 @@ static void gfar_process_frame(struct net_device
>> *ndev, struct sk_buff *skb)
>>> if (priv->hwts_rx_en) {
>>> struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
>>> u64 *ns = (u64 *) skb->data;
>>> +   *ns = be64_to_cpu(*ns);
>>>
>>> memset(shhwtstamps, 0, sizeof(*shhwtstamps));
>>> shhwtstamps->hwtstamp = ns_to_ktime(*ns);
>>
>> Same here.
>>
>> Thanks,
>> Richard

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH] net: ipv6: Make address flushing on ifdown optional

2016-02-16 Thread YOSHIFUJI Hideaki

Hi,

David Ahern wrote:
> On 2/16/16 1:45 AM, YOSHIFUJI Hideaki wrote:
>>> diff --git a/Documentation/networking/ip-sysctl.txt 
>>> b/Documentation/networking/ip-sysctl.txt
>>> index 24ce97f42d35..7ddbbb67f0db 100644
>>> --- a/Documentation/networking/ip-sysctl.txt
>>> +++ b/Documentation/networking/ip-sysctl.txt
>>> @@ -1563,6 +1563,12 @@ temp_prefered_lft - INTEGER
>>> Preferred lifetime (in seconds) for temporary addresses.
>>> Default: 86400 (1 day)
>>>   
>>> +keep_addr_on_down - BOOLEAN
>>> +   Keep all IPv6 addresses on an interface down event. If set static
>>> +   global addresses with no expiration time are not flushed.
>>> +
>>> +   Default: disabled
>>> +
>>
>> How about this:
>> 1: enabled
>> 0: system default
>>-1: disabled
>> so that an iterface can override system-wide config?
> 
> It is my understanding that the 'all' settings override the individual
> interface settings. From Documentation/networking/ip-sysctl.txt +1346:
> 
> conf/all/*:
> Change all the interface-specific settings.

Well, document is not correct.
1) Some of "all" variables set all interface specific settings.
2) Some of "all" variables override interface specific settings.
3) Some provide "fall-back" values; such an interface specific
   setting overrides the corresponding "all" variable.
   (Note: "default" variables are values per-interface settings
   are initialized to.)
4) Others are ignored (the exists but no-ops).

> 
> 
> -8<-
> 
> 
>>> diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
>>> index 1c8b6820b694..01ba6a286a4b 100644
>>> --- a/include/net/if_inet6.h
>>> +++ b/include/net/if_inet6.h
>>> @@ -72,6 +72,7 @@ struct inet6_ifaddr {
>>> int regen_count;
>>>   
>>> booltokenized;
>>> +   booluser_managed;
>>
>> Can't we use IFA_F_PERMANENT?
> 
> I think so. Will fix.
> 
> 
> -8<-
> 
>>> @@ -3356,7 +3413,9 @@ static int addrconf_ifdown(struct net_device *dev, 
>>> int how)
>>>   {
>>> struct net *net = dev_net(dev);
>>> struct inet6_dev *idev;
>>> -   struct inet6_ifaddr *ifa;
>>> +   struct inet6_ifaddr *ifa, *tmp;
>>> +   struct list_head del_list;
>>> +   int keep_addr;
>>> int state, i;
>>>   
>>> ASSERT_RTNL();
>>> @@ -3383,6 +3442,10 @@ static int addrconf_ifdown(struct net_device *dev, 
>>> int how)
>>>   
>>> }
>>>   
>>> +   keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
>>> +   if (!keep_addr)
>>> +   keep_addr = idev->cnf.keep_addr_on_down;
>>> +
>>> /* Step 2: clear hash table */
>>> for (i = 0; i < IN6_ADDR_HSIZE; i++) {
>>> struct hlist_head *h = &inet6_addr_lst[i];
> 
> So what I have here is if the system-wide setting says keep the address
> it is kept. Else if the individual interface setting is enabled the
> address is kept.
> 
> 
> 

Other admin may want to enable it system-wide with some exceptions.

And well, you could just check per-interface configuration; 4 above.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH] net: ipv6: Make address flushing on ifdown optional

2016-02-16 Thread YOSHIFUJI Hideaki

Hi,

David Ahern wrote:
> Currently, all ipv6 addresses are flushed when the interface is configured
> down, including global, static addresses:
> 
> $ ip -6 addr add dev eth1 2000:11:1:1::1/64
> $ ip addr show dev eth1
> 3: eth1:  mtu 1500 qdisc noop state DOWN group 
> default qlen 1000
> link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
> inet6 2000:11:1:1::1/64 scope global tentative
>valid_lft forever preferred_lft forever
> $ ip link set dev eth1 up
> $ ip link set dev eth1 down
> $ ip addr show dev eth1
> 3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group 
> default qlen 1000
> link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
> 
> Add a new sysctl to make this behavior optional. The new setting defaults to
> flush all addresses to maintain backwards compatibility. When the set global
> addresses with no expire times are not flushed on an admin down:
> 
> $ echo 1 > /proc/sys/net/ipv6/conf/eth1/keep_addr_on_down
> $ ip -6 addr add dev eth1 2000:11:1:1::1/64
> $ ip addr show dev eth1
> 3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group 
> default qlen 1000
> link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
> inet6 2000:11:1:1::1/64 scope global tentative
>valid_lft forever preferred_lft forever
> $ ip link set dev eth1 up
> $ ip link set dev eth1 down
> $ ip addr show dev eth1
> 3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group 
> default qlen 1000
> link/ether 02:04:11:22:33:01 brd ff:ff:ff:ff:ff:ff
> inet6 2000:11:1:1::1/64 scope global
>valid_lft forever preferred_lft forever
> inet6 fe80::4:11ff:fe22:3301/64 scope link
>valid_lft forever preferred_lft forever
> 
> Signed-off-by: David Ahern 
> ---
> Dave: per the discussion at netconf tossing this out again. While the
>   failure semantics are not ideal it only occurs on GFP_ATOMIC
>   memory failures.
:
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 24ce97f42d35..7ddbbb67f0db 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1563,6 +1563,12 @@ temp_prefered_lft - INTEGER
>   Preferred lifetime (in seconds) for temporary addresses.
>   Default: 86400 (1 day)
>  
> +keep_addr_on_down - BOOLEAN
> + Keep all IPv6 addresses on an interface down event. If set static
> + global addresses with no expiration time are not flushed.
> +
> + Default: disabled
> +

How about this:
   1: enabled
   0: system default
  -1: disabled
so that an iterface can override system-wide config?

>  max_desync_factor - INTEGER
>   Maximum value for DESYNC_FACTOR, which is a random value
>   that ensures that clients don't synchronize with each
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 4b2267e1b7c3..7edc14fb66b6 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -62,6 +62,7 @@ struct ipv6_devconf {
>   struct in6_addr secret;
>   } stable_secret;
>   __s32   use_oif_addrs_only;
> + __s32   keep_addr_on_down;
>   void*sysctl;
>  };
>  
> diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
> index 1c8b6820b694..01ba6a286a4b 100644
> --- a/include/net/if_inet6.h
> +++ b/include/net/if_inet6.h
> @@ -72,6 +72,7 @@ struct inet6_ifaddr {
>   int regen_count;
>  
>   booltokenized;
> + booluser_managed;

Can't we use IFA_F_PERMANENT?

> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index ec117b65d5a5..395876060f50 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -176,6 +176,7 @@ enum {
>   DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
>   DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
>   DEVCONF_DROP_UNSOLICITED_NA,
> + DEVCONF_KEEP_ADDR_ON_DOWN,
>   DEVCONF_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index ac0ba9e4e06b..0bcb0f538e54 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -216,6 +216,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   },
>   .use_oif_addrs_only = 0,
>   .ignore_routes_with_linkdown = 0,
> + .keep_addr_on_down  = 0,
>  };
>  
>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> @@ -260,6 +261,7 @@ static struct ipv6_devconf ipv6_devconf_dflt 
> __read_mostly = {
>   },
>   .use_oif_addrs_only = 0,
>   .ignore_routes_with_linkdown = 0,
> + .keep_addr_on_down  = 0,
>  };
>  
>  /* Check if a valid qdisc is available */
> @@ -962,6 +964,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct 
> in6_addr *addr,
>   ifa->prefered_lft = prefered_lft;
>   ifa->cstamp = ifa->tstamp = jiffies;
>   ifa->tokenized = false;
> + ifa->user_managed = false;
>  
>

Re: [PATCH net v2 1/2] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

2016-01-31 Thread YOSHIFUJI Hideaki



Hannes Frederic Sowa wrote:
> On 29.01.2016 12:30, Paolo Abeni wrote:
>> The current implementation of ip6_dst_lookup_tail basically
>> ignore the egress ifindex match: if the saddr is set,
>> ip6_route_output() purposefully ignores flowi6_oif, due
>> to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
>> flag if saddr set"), if the saddr is 'any' the first route lookup
>> in ip6_dst_lookup_tail fails, but upon failure a second lookup will
>> be performed with saddr set, thus ignoring the ifindex constraint.
>>
>> This commit adds an output route lookup function variant, which
>> allows the caller to specify lookup flags, and modify
>> ip6_dst_lookup_tail() to enforce the ifindex match on the second
>> lookup via said helper.
>>
>> ip6_route_output() becames now a static inline function build on
>> top of ip6_route_output_flags(); as a side effect, out-of-tree
>> modules need now a GPL license to access the output route lookup
>> functionality.
>>
>> Signed-off-by: Paolo Abeni 
>> -- 
>> v1 -> v2 move the ip6_route_output() implementation into the header
> 
> Acked-by: Hannes Frederic Sowa 
> 
Acked-by: YOSHIFUJI Hideaki 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH net v2 2/2] ipv6/udp: use sticky pktinfo egress ifindex on connect()

2016-01-31 Thread YOSHIFUJI Hideaki



Hannes Frederic Sowa wrote:
> On 29.01.2016 12:30, Paolo Abeni wrote:
>> Currently, the egress interface index specified via IPV6_PKTINFO
>> is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
>> can be subverted when the user space application calls connect()
>> before sendmsg().
>> Fix it by initializing properly flowi6_oif in connect() before
>> performing the route lookup.
>>
>> Signed-off-by: Paolo Abeni 
> 
> Acked-by: Hannes Frederic Sowa 
> 
> 

Acked-by: YOSHIFUJI Hideaki 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

Re: [PATCH 1/1] net/ipv6: add sysctl option accept_ra_min_hop_limit

2016-01-05 Thread YOSHIFUJI Hideaki

Hi, Machida-san.

Yuki Machida wrote:
> Please apply the following patch to v4.1.x.
> 
> By ommit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")

s/ommit/commit/

Futher comment below.

:
> Signed-off-by: Hangbin Liu 
> Acked-by: YOSHIFUJI Hideaki 
> Signed-off-by: David S. Miller 
> ---
>  Documentation/networking/ip-sysctl.txt |  8 
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   | 16 +++-
>  5 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 071fb18..07fad3d 100644
:
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -171,6 +171,7 @@ enum {
>   DEVCONF_USE_OPTIMISTIC,
>   DEVCONF_ACCEPT_RA_MTU,
>   DEVCONF_STABLE_SECRET,

You have to add a hole for DEVCONF_USE_OIF_ADDRS_ONLY.

--yoshfuji

> + DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
>   DEVCONF_MAX
>  };
>  

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ipv6/addrlabel: fix ip6addrlbl_get()

2015-12-22 Thread YOSHIFUJI Hideaki

Cong Wang wrote:
> On Mon, Dec 21, 2015 at 1:54 AM, Andrey Ryabinin
>  wrote:
>> ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
>> ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
>> ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
>>
>> Fix this by inverting ip6addrlbl_hold() check.
>>
>> Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address 
>> selection policy table.")
>> Signed-off-by: Andrey Ryabinin 
> 
> Good catch!
> 
> Reviewed-by: Cong Wang 
Acked-by: YOSHIFUJI Hideaki 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFCv3 bluetooth-next 3/4] ipv6: add ipv6_addr_prefix_copy

2015-12-01 Thread YOSHIFUJI Hideaki/吉藤英明

Hannes Frederic Sowa wrote:
> 
> 
> On Sun, Nov 29, 2015, at 12:34, Alexander Aring wrote:
>> This patch adds a static inline function ipv6_addr_prefix_copy which
>> copies a ipv6 address prefix(argument pfx) into the ipv6 address prefix.
>> The prefix len is given by plen as bits. This function mainly based on
>> ipv6_addr_prefix which copies one address prefix from address into a new
>> ipv6 address destination and zero all other address bits.
>>
>> The difference is that ipv6_addr_prefix_copy don't get a prefix from an
>> ipv6 address, it sets a prefix to an ipv6 address with keeping other
>> address bits. The use case is for context based address compression
>> inside 6LoWPAN IPHC header which keeping ipv6 prefixes inside a context
>> table to lookup address-bits without sending them.
>>
>> Cc: David S. Miller 
>> Cc: Alexey Kuznetsov 
>> Cc: James Morris 
>> Cc: Hideaki YOSHIFUJI 
>> Cc: Patrick McHardy 
>> Signed-off-by: Alexander Aring 
>> ---
>>  include/net/ipv6.h | 15 +++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
>> index e1a10b0..cd3881e6 100644
>> --- a/include/net/ipv6.h
>> +++ b/include/net/ipv6.h
>> @@ -382,6 +382,21 @@ static inline void ipv6_addr_prefix(struct in6_addr
>> *pfx,
>>  pfx->s6_addr[o] = addr->s6_addr[o] & (0xff00 >> b);
>>  }
>>  
>> +static inline void ipv6_addr_prefix_copy(struct in6_addr *addr,
>> +const struct in6_addr *pfx,
>> +int plen)
>> +{
>> +   /* caller must guarantee 0 <= plen <= 128 */
>> +   int o = plen >> 3,
>> +   b = plen & 0x7;
>> +
>> +   memcpy(addr->s6_addr, pfx, o);
>> +   if (b != 0) {
>> +   addr->s6_addr[o] &= ~(0xff00 >> b);
>> +   addr->s6_addr[o] |= (pfx->s6_addr[o] & (0xff00 >> b));
>> +   }
>> +}
>> +
> 
> Acked-by: Hannes Frederic Sowa 
Acked-by: YOSHIFUJI Hideaki 


> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
吉藤英明 
ミラクル・リナックス株式会社 技術本部 サポート部
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] ipv6: use a random ifid for headerless devices

2015-11-30 Thread YOSHIFUJI Hideaki

Hi,

Bjørn Mork wrote:
> 吉藤英明  writes:
> 
 +static int addrconf_ifid_random(u8 *eui, struct net_device *dev)
>>> +{
>>> +   get_random_bytes(eui, 8);
>>> +   eui[0] |= 0x02;
>>> +   return 0;
>>> +}
>>> +
>>
>> Since random identifier is locally assigned, drop the global bit
>> instead if setting it.
> 
> Yes, definitely. Thanks.  I'm considering reusing __ipv6_regen_rndid()
> which already does this correctly, and also avoids some locally assigned
> addresses with special meanings.
> 
> Another issue with the initial RFC is that every prefix will have a new
> random ifid, which isn't necessarily what the users expect.  I wonder if
> it would be acceptable to abuse the rndid field for storing a "permanent"
> random ifid?

Well, I think we should introduce ifid in inet6_dev.
After that we could use it from other ifid methods.

--yoshfuji

> 
> 
> Bjørn
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net] ipv6: kill sk_dst_lock

2015-11-30 Thread YOSHIFUJI Hideaki

Hi,

Eric Dumazet wrote:
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 2bfb2ad2fab1..877f682989b8 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -133,27 +133,18 @@ void rt6_clean_tohost(struct net *net, struct in6_addr 
> *gateway);
>  /*
>   *   Store a destination cache entry in a socket
>   */
> -static inline void __ip6_dst_store(struct sock *sk, struct dst_entry *dst,
> -const struct in6_addr *daddr,
> -const struct in6_addr *saddr)
> +static inline void ip6_dst_store(struct sock *sk, struct dst_entry *dst,
> +  const struct in6_addr *daddr,
> +  const struct in6_addr *saddr)
>  {
>   struct ipv6_pinfo *np = inet6_sk(sk);
> - struct rt6_info *rt = (struct rt6_info *) dst;
>  
> + np->dst_cookie = rt6_get_cookie((struct rt6_info *)dst);
>   sk_setup_caps(sk, dst);
>   np->daddr_cache = daddr;
>  #ifdef CONFIG_IPV6_SUBTREES
>   np->saddr_cache = saddr;
>  #endif
> - np->dst_cookie = rt6_get_cookie(rt);
> -}
> -

I believe you do not have to change function inside, right?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 1/2] geneve: implement support for IPv6-based tunnels

2015-10-22 Thread YOSHIFUJI Hideaki

Hi,

John W. Linville wrote:
> NOTE: Link-local IPv6 addresses for remote endpoints are not supported,
> since the driver currently has no capacity for binding a geneve
> interface to a specific link.
> 
> Signed-off-by: John W. Linville 
> ---
> v5:
> - wrap declaration of sock6 in geneve_dev with IS_ENABLED(CONFIG_IPV6)
> - remove superfluous '!!' when assigning geneve->collect_md to bool
> - use skb_scrub_packet in IPv4 tx path as well 
> - check for NULL ip_tunnel_info pointer in geneve[6]_xmit_skb
> - use ipv6_addr_equal for comparing IPv6 addresses
> - more use of IS_ENABLED(CONFIG_IPV6) for preserving build integrity
> - reject link-local ipv6 address for remote tunnel endpoint
:

> @@ -870,15 +1147,35 @@ static int geneve_newlink(struct net *net, struct 
> net_device *dev,
:
> +
> + if (ipv6_addr_type(&remote.sin6.sin6_addr) &
> + IPV6_ADDR_LINKLOCAL)
> + netdev_dbg(dev, "link-local remote is unsupported\n");
> + return -EINVAL;
> + }
> +

This always returns -EINVAL; {} is missing.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/2] geneve: implement support for IPv6-based tunnels

2015-10-20 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

John W. Linville wrote:
> Signed-off-by: John W. Linville 
> ---
> v4:
> - treat mode field of ip_tunnel_info as flags
> - add a missing IS_ENABLED(CONFIG_IPV6) to geneve_rx
> - remove unneeded flags field in geneve_dev
> - NULL-check parameter for __geneve_sock_release
> - check remote socket family for AF_UNSPEC in geneve_configure
> - rename geneve_get_{rt,dst} as geneve_get_{v4_rt,v6_dst}
> - refactor some error handling in the xmit paths
> 
> v3:
> - declare geneve_remote_unspec as static
> 
> v2:
> - do not require remote address for tx on metadata tunnels
> - pass correct sockaddr family to udp_tun_rx_dst in geneve_rx
> - accommodate both ipv4 and ipv6 sockets open on same tunnel
> - move declaration of geneve_get_dst for aesthetic purposes
> 
>  drivers/net/geneve.c | 459 
> +++
>  include/uapi/linux/if_link.h |   1 +
>  2 files changed, 377 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 8f5c02eed47d..217b472ab9e7 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -46,16 +46,25 @@ struct geneve_net {
>  
>  static int geneve_net_id;
>  
> +union geneve_addr {
> + struct sockaddr_in sin;
> + struct sockaddr_in6 sin6;
> + struct sockaddr sa;
> +};
> +
> +static union geneve_addr geneve_remote_unspec = { .sa.sa_family = AF_UNSPEC, 
> };
> +
>  /* Pseudo network device */
>  struct geneve_dev {
>   struct hlist_node  hlist;   /* vni hash table */
>   struct net *net;/* netns for packet i/o */
>   struct net_device  *dev;/* netdev for geneve tunnel */
> - struct geneve_sock *sock;   /* socket used for geneve tunnel */
> + struct geneve_sock *sock4;  /* IPv4 socket used for geneve tunnel */
> + struct geneve_sock *sock6;  /* IPv6 socket used for geneve tunnel */
>   u8 vni[3];  /* virtual network ID for tunnel */
>   u8 ttl; /* TTL override */
>   u8 tos; /* TOS override */
> - struct sockaddr_in remote;  /* IPv4 address for link partner */
> + union geneve_addr  remote;  /* IP address for link partner */
>   struct list_head   next;/* geneve's per namespace list */
>   __be16 dst_port;
>   bool   collect_md;
> @@ -103,11 +112,32 @@ static struct geneve_dev *geneve_lookup(struct 
> geneve_sock *gs,
>   vni_list_head = &gs->vni_list[hash];
>   hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
>   if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
> - addr == geneve->remote.sin_addr.s_addr)
> + addr == geneve->remote.sin.sin_addr.s_addr)
> + return geneve;
> + }
> + return NULL;
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
> +  struct in6_addr addr6, u8 vni[])
> +{
> + struct hlist_head *vni_list_head;
> + struct geneve_dev *geneve;
> + __u32 hash;
> +
> + /* Find the device for this VNI */
> + hash = geneve_net_vni_hash(vni);
> + vni_list_head = &gs->vni_list[hash];
> + hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
> + if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
> + !memcmp(&addr6, &geneve->remote.sin6.sin6_addr,
> + sizeof(addr6)))

Please use ipv6_addr_equal().
How do you handle link-local addresses here?

>   return geneve;
>   }
>   return NULL;
>  }
> +#endif
>  
>  static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
>  {
> @@ -121,24 +151,49 @@ static void geneve_rx(struct geneve_sock *gs, struct 
> sk_buff *skb)
>   struct metadata_dst *tun_dst = NULL;
>   struct geneve_dev *geneve = NULL;
>   struct pcpu_sw_netstats *stats;
> - struct iphdr *iph;
> - u8 *vni;
> + struct iphdr *iph = NULL;
>   __be32 addr;
> - int err;
> + static u8 zero_vni[3];
> + u8 *vni;
> + int err = 0;
> + sa_family_t sa_family;
> +#if IS_ENABLED(CONFIG_IPV6)
> + struct ipv6hdr *ip6h = NULL;
> + struct in6_addr addr6;
> + static struct in6_addr zero_addr6;
> +#endif
>  
> - iph = ip_hdr(skb); /* outer IP header... */
> + sa_family = gs->sock->sk->sk_family;
>  
> - if (gs->collect_md) {
> - static u8 zero_vni[3];
> + if (sa_family == AF_INET) {
> + iph = ip_hdr(skb); /* outer IP header... */
>  
> - vni = zero_vni;
> - addr = 0;
> - } else {
> - vni = gnvh->vni;
> - addr = iph->saddr;
> - }
> + if (gs->collect_md) {
> + vni = zero_vni;
> + addr = 0;
> + } else {
> + vni = gnvh->vni;
> +
> + addr = iph->sad

Re: [PATCH net-next] Revert "net/ipv6: add sysctl option accept_ra_min_hop_limit"

2015-09-10 Thread YOSHIFUJI Hideaki



Sabrina Dubroca wrote:
> 2015-09-10, 14:52:45 +0900, YOSHIFUJI Hideaki wrote:
>> Sabrina Dubroca wrote:
>>> 2015-09-02, 16:11:10 -0700, David Miller wrote:
>>>> The only thing I would entertain is potentially an adjustment of the
>>>> default, working in concert with the TAHI folks to make sure their
>>>> tests still pass with any new default.
>>>
>>> Would you agree with a default of 64, as Florian suggested?
>>
>> 1 was chosen to restore our behavior before introduction of current
>> hoplimit check.  I am not in favor of changing that value.
> 
> But our old behavior had a security issue, which is why the >= current
> check was introduced.

We have the knob to "protect" ourselves now but it has drawbacks no to
accept lower values than specified.  We can never have ultimate default
for everybody.  The knob might "mitigate" the issue but once we have
any rouge routers on our L2, we lose anyway.  So, I do want to keep it
as-is not to change our traditional behavior.

> 
> 
>> Plus, 64 is too restrictive and 32 would be enough for global
>> internet, IMHO.
> 
> I guess I could live with that, if 32 is indeed enough for everybody.
> 
> 
>>> Can we still modify the behavior of this sysctl? It's already been in
>>> Linus's tree for a while, but if we can, I would rather restrict the
>>> values we let the user write to accept_ra_min_hop_limit, as anything
>>> outside [0..255] does not really make sense.
>>
>> [1..256], maybe, but it is not harmful to set outside the range.
>> 0 is always ignored. If it is set to 256 or more, the option is
>> completely ignored.
> 
> Not harmful, but maybe slightly misleading, and requires the "< 256"
> check when processing a RA.

The "Cur Hop Limit" field is 8bit long...

> 
> 
>>> Allowing an RA to update the hop limit if
>>>
>>>current hop limit < RA.hop_limit < accept_ra_min_hop_limit
>>>
>>> might also be desirable, but I'm not so sure about this case.
>>>
>>>
>>
>> It might be... byt I don't think it is a good idea since it becomes
>> more complex.
> 
> A bit more complex, yes.  But I don't think this should hold us back
> if it results in better behavior.
> 
> 
> Thanks,
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next] Revert "net/ipv6: add sysctl option accept_ra_min_hop_limit"

2015-09-09 Thread YOSHIFUJI Hideaki

Sabrina Dubroca wrote:
> 2015-09-02, 16:11:10 -0700, David Miller wrote:
>> From: Sabrina Dubroca 
>> Date: Wed, 2 Sep 2015 11:43:01 +0200
>>
>>> This reverts commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4.
>>>
>>> There are several issues with this patch.
>>> It completely cancels the security changes introduced by 6fd99094de2b
>>> ("ipv6: Don't reduce hop limit for an interface").
>>> The current default value (min hop limit = 1) can result in the same
>>> denial of service that 6fd99094de2b prevents, but it is hard to define
>>> a correct and sane default value.
>>> More generally, it is yet another IPv6 sysctl, and we already have too
>>> many.
>>>
>>> This was introduced to satisfy a TAHI test case which, in my opinion, is
>>> too strict, turning the RFC's "SHOULD" into a "MUST":
>>>
>>> If the received Cur Hop Limit value is non-zero, the host
>>> SHOULD set its CurHopLimit variable to the received value.
>>>
>>> The behavior of this sysctl is wrong in multiple ways.  Some are
>>> fixable, but let's not rush this commit into mainline, and revert this
>>> while we still can, then we can come up with a better solution.
>>>
>>> Signed-off-by: Sabrina Dubroca 
>>
>> I don't agree with this revert.
>>
>> If you look at the original commit, the quoted RFC recommends adding
>> a configurable method to protect against this.
>>
>> And that's exactly what the commit you are trying to revert is doing.
>>
>> The only thing I would entertain is potentially an adjustment of the
>> default, working in concert with the TAHI folks to make sure their
>> tests still pass with any new default.
> 
> Would you agree with a default of 64, as Florian suggested?

1 was chosen to restore our behavior before introduction of current
hoplimit check.  I am not in favor of changing that value.

Plus, 64 is too restrictive and 32 would be enough for global
internet, IMHO.

> 
> 
> Can we still modify the behavior of this sysctl? It's already been in
> Linus's tree for a while, but if we can, I would rather restrict the
> values we let the user write to accept_ra_min_hop_limit, as anything
> outside [0..255] does not really make sense.

[1..256], maybe, but it is not harmful to set outside the range.
0 is always ignored. If it is set to 256 or more, the option is
completely ignored.

> 
> Allowing an RA to update the hop limit if
> 
>current hop limit < RA.hop_limit < accept_ra_min_hop_limit
> 
> might also be desirable, but I'm not so sure about this case.
> 
> 

It might be... byt I don't think it is a good idea since it becomes
more complex.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 net-next 13/13] ipv6: route: per route IP tunnel metadata via lightweight tunnel

2015-08-19 Thread YOSHIFUJI Hideaki

Jiri Benc wrote:
> On Wed, 19 Aug 2015 19:27:22 +0900, YOSHIFUJI Hideaki wrote:
>>> You're right generally. But this one should be okay and I did this
>>> deliberately: the patch adding LWTUNNEL_ENCAP_ILA was merged two days
>>> ago, is in net-next only, is not used by anything in user space yet.
>>> And I think it's better to have LWTUNNEL_ENCAP_IP and
>>> LWTUNNEL_ENCAP_IP6 without anything in between.
>>
>> I do think you should have some descriptions.
> 
> Sorry, I meant to put this into the description but forget to add it
> after the rebase on top of ILA (as the patchset conflicted with the ILA
> work and was developed in parallel).
>
> Are you okay with inserting LWTUNNEL_ENCAP_IP6 before
> LWTUNNEL_ENCAP_ILA? If so, I'll resend with the explanation added.

Well, I think we should always avoid adding new entries into the
middle of enums because it will make bisecting more complex or more
difficult for example even if it *seems* that we have no users yet and
the risk is not so high.

Dave?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 net-next 13/13] ipv6: route: per route IP tunnel metadata via lightweight tunnel

2015-08-19 Thread YOSHIFUJI Hideaki

Jiri Benc wrote:
> On Wed, 19 Aug 2015 19:17:21 +0900, YOSHIFUJI Hideaki wrote:
>> Jiri Benc wrote:
>>> Allow specification of per route IP tunnel instructions also for IPv6.
>>> This complements commit 3093fbe7ff4b ("route: Per route IP tunnel metadata
>>> via lightweight tunnel").
>>>
>>> Signed-off-by: Jiri Benc 
>>> ---
>>>  include/uapi/linux/lwtunnel.h |  16 +++
>>>  net/ipv4/ip_tunnel_core.c | 102 
>>> ++
>>>  2 files changed, 118 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
>>> index aa84ca396bcb..32a149571417 100644
>>> --- a/include/uapi/linux/lwtunnel.h
>>> +++ b/include/uapi/linux/lwtunnel.h
>>> @@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
>>> LWTUNNEL_ENCAP_NONE,
>>> LWTUNNEL_ENCAP_MPLS,
>>> LWTUNNEL_ENCAP_IP,
>>> +   LWTUNNEL_ENCAP_IP6,
>>> LWTUNNEL_ENCAP_ILA,
>>> __LWTUNNEL_ENCAP_MAX,
>>>  };
>>
>> Please do not add new one in the middle of enums.
> 
> You're right generally. But this one should be okay and I did this
> deliberately: the patch adding LWTUNNEL_ENCAP_ILA was merged two days
> ago, is in net-next only, is not used by anything in user space yet.
> And I think it's better to have LWTUNNEL_ENCAP_IP and
> LWTUNNEL_ENCAP_IP6 without anything in between.

I do think you should have some descriptions.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 net-next 13/13] ipv6: route: per route IP tunnel metadata via lightweight tunnel

2015-08-19 Thread YOSHIFUJI Hideaki

Jiri Benc wrote:
> Allow specification of per route IP tunnel instructions also for IPv6.
> This complements commit 3093fbe7ff4b ("route: Per route IP tunnel metadata
> via lightweight tunnel").
> 
> Signed-off-by: Jiri Benc 
> ---
>  include/uapi/linux/lwtunnel.h |  16 +++
>  net/ipv4/ip_tunnel_core.c | 102 
> ++
>  2 files changed, 118 insertions(+)
> 
> diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
> index aa84ca396bcb..32a149571417 100644
> --- a/include/uapi/linux/lwtunnel.h
> +++ b/include/uapi/linux/lwtunnel.h
> @@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
>   LWTUNNEL_ENCAP_NONE,
>   LWTUNNEL_ENCAP_MPLS,
>   LWTUNNEL_ENCAP_IP,
> + LWTUNNEL_ENCAP_IP6,
>   LWTUNNEL_ENCAP_ILA,
>   __LWTUNNEL_ENCAP_MAX,
>  };

Please do not add new one in the middle of enums.

--yoshfuji
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] net: ipv4: increase dhcp inter device timeout

2015-08-07 Thread YOSHIFUJI Hideaki

Hi,

Mugunthan V N wrote:
> When a system has multiple ethernet devices and during DHCP
> request (for using NFS), the system waits only for HZ/2 which is
> 500mS before switching to another interface for DHCP.
> 
> There are some routers (Ex: Trendnet routers) which responds to
> DHCP request at about 560mS. When the system has only one
> ethernet interface there is no issue as the timeout is 2S and the
> dev xid doesn't change and only retries.
> 
> But when the system has multiple Ethernet like DRA74x with CPSW
> in dual EMAC mode, the DHCP response is dropped as the dev xid
> changes while shifting to the next device. So changing inter
> device timeout to HZ (which is 1S).
> 
> Signed-off-by: Mugunthan V N 
> ---
>  net/ipv4/ipconfig.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
> index 8e7328c..bdb8cb5 100644
> --- a/net/ipv4/ipconfig.c
> +++ b/net/ipv4/ipconfig.c
> @@ -94,7 +94,7 @@
>  /* Define the timeout for waiting for a DHCP/BOOTP/RARP reply */
>  #define CONF_OPEN_RETRIES2   /* (Re)open devices twice */
>  #define CONF_SEND_RETRIES6   /* Send six requests per open */
> -#define CONF_INTER_TIMEOUT   (HZ/2)  /* Inter-device timeout: 1/2 second */
> +#define CONF_INTER_TIMEOUT   (HZ)/* Inter-device timeout: 1/2 second */

You should update comment as well at least.

--yoshfuji

>  #define CONF_BASE_TIMEOUT(HZ*2)  /* Initial timeout: 2 seconds */
>  #define CONF_TIMEOUT_RANDOM  (HZ)/* Maximum amount of randomization */
>  #define CONF_TIMEOUT_MULT*7/4/* Rate of timeout growth */
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [[PATCH net-next v5] net/ipv6: add sysctl option accept_ra_min_hop_limit

2015-07-29 Thread YOSHIFUJI Hideaki

Hi,

Hangbin Liu wrote:
> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
> disabled accept hop limit from RA if it is smaller than the current hop
> limit for security stuff. But this behavior kind of break the RFC definition.
> 
> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>and Retrans Timer) may contain a value denoting that it is
>unspecified.  In such cases, the parameter should be ignored and the
>host should continue using whatever value it is already using.
> 
>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>its CurHopLimit variable to the received value.
> 
> So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
> hop limit value they can accept from RA. And set default to 1 to meet RFC
> standards.
> 
> Signed-off-by: Hangbin Liu 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  Documentation/networking/ip-sysctl.txt |  8 
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   | 16 +++-
>  5 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 1a5ab21b..00d26d9 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1340,6 +1340,14 @@ accept_ra_from_local - BOOLEAN
>  disabled if accept_ra_from_local is disabled
> on a specific interface.
>  
> +accept_ra_min_hop_limit - INTEGER
> + Minimum hop limit Information in Router Advertisement.
> +
> + Hop limit Information in Router Advertisement less than this
> + variable shall be ignored.
> +
> + Default: 1
> +
>  accept_ra_pinfo - BOOLEAN
>   Learn Prefix Information in Router Advertisement.
>  
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 06ed637..cb9dcad 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -29,6 +29,7 @@ struct ipv6_devconf {
>   __s32   max_desync_factor;
>   __s32   max_addresses;
>   __s32   accept_ra_defrtr;
> + __s32   accept_ra_min_hop_limit;
>   __s32   accept_ra_pinfo;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   __s32   accept_ra_rtr_pref;
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 641a146..80f3b74 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -172,6 +172,7 @@ enum {
>   DEVCONF_ACCEPT_RA_MTU,
>   DEVCONF_STABLE_SECRET,
>   DEVCONF_USE_OIF_ADDRS_ONLY,
> + DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
>   DEVCONF_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index eb0c6a3..53e3a9d 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -195,6 +195,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   .max_addresses  = IPV6_MAX_ADDRESSES,
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
> + .accept_ra_min_hop_limit= 1,
>   .accept_ra_pinfo= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
> @@ -237,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf_dflt 
> __read_mostly = {
>   .max_addresses  = IPV6_MAX_ADDRESSES,
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
> + .accept_ra_min_hop_limit= 1,
>   .accept_ra_pinfo= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
> @@ -4588,6 +4590,7 @@ static inline void ipv6_store_devconf(struct 
> ipv6_devconf *cnf,
>   array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
>   array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
>   array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
> + array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
>   array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
> @@ -5485,6 +5488,13 @@ static struct addrconf_sysctl_table
>   .proc_handler   = proc_dointvec,
>   },
>   {
> + .procname   = "accept_ra_min_hop_limit",
> + .data   = &ipv6_devconf.accept_ra_min_hop_limit,
> + .maxlen = sizeof(int),
> + .m

Re: [PATCH net-next v4] net/ipv6: add sysctl option accept_ra_min_hop_limit

2015-07-29 Thread YOSHIFUJI Hideaki

Hi,

Hangbin Liu wrote:
> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
> disabled accept hop limit from RA if it is higher than the current hop

higher?

> limit for security stuff. But this behavior kind of break the RFC definition.
> 
> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>and Retrans Timer) may contain a value denoting that it is
>unspecified.  In such cases, the parameter should be ignored and the
>host should continue using whatever value it is already using.
> 
>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>its CurHopLimit variable to the received value.
> 
> So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
> hop limit value they can accept from RA. And set default to 1 to meet RFC
> standards.
> 
> Signed-off-by: Hangbin Liu 
> ---
>  Documentation/networking/ip-sysctl.txt |  8 
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   | 14 +-
>  5 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 1a5ab21b..00d26d9 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1340,6 +1340,14 @@ accept_ra_from_local - BOOLEAN
>  disabled if accept_ra_from_local is disabled
> on a specific interface.
>  
> +accept_ra_min_hop_limit - INTEGER
> + Minimum hop limit Information in Router Advertisement.
> +
> + Hop limit Information in Router Advertisement less than this
> + variable shall be ignored.
> +
> + Default: 1
> +
>  accept_ra_pinfo - BOOLEAN
>   Learn Prefix Information in Router Advertisement.
>  
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 06ed637..cb9dcad 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -29,6 +29,7 @@ struct ipv6_devconf {
>   __s32   max_desync_factor;
>   __s32   max_addresses;
>   __s32   accept_ra_defrtr;
> + __s32   accept_ra_min_hop_limit;
>   __s32   accept_ra_pinfo;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   __s32   accept_ra_rtr_pref;
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 641a146..80f3b74 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -172,6 +172,7 @@ enum {
>   DEVCONF_ACCEPT_RA_MTU,
>   DEVCONF_STABLE_SECRET,
>   DEVCONF_USE_OIF_ADDRS_ONLY,
> + DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
>   DEVCONF_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index eb0c6a3..53e3a9d 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -195,6 +195,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   .max_addresses  = IPV6_MAX_ADDRESSES,
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
> + .accept_ra_min_hop_limit= 1,
>   .accept_ra_pinfo= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
> @@ -237,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf_dflt 
> __read_mostly = {
>   .max_addresses  = IPV6_MAX_ADDRESSES,
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
> + .accept_ra_min_hop_limit= 1,
>   .accept_ra_pinfo= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
> @@ -4588,6 +4590,7 @@ static inline void ipv6_store_devconf(struct 
> ipv6_devconf *cnf,
>   array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
>   array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
>   array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
> + array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
>   array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
> @@ -5485,6 +5488,13 @@ static struct addrconf_sysctl_table
>   .proc_handler   = proc_dointvec,
>   },
>   {
> + .procname   = "accept_ra_min_hop_limit",
> + .data   = &ipv6_devconf.accept_ra_min_hop_limit,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec,
> + },
> + {
>   .procname   = "accept_ra_pinfo",
>   .data   = &ipv6_devconf.accept_ra_pinfo,
>   .maxlen = sizeof(int),
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 0a05b35..5619e64 100644
> --- a/

Re: [PATCHv3] net/ipv6: add sysctl option accept_ra_min_hop_limit

2015-07-29 Thread YOSHIFUJI Hideaki

Hi,

Hangbin Liu wrote:
>>> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>>> index 0a05b35..acda056 100644
>>> --- a/net/ipv6/ndisc.c
>>> +++ b/net/ipv6/ndisc.c
>>> @@ -1226,13 +1226,11 @@ static void ndisc_router_discovery(struct sk_buff 
>>> *skb)
>>>   if (rt)
>>>   rt6_set_expires(rt, jiffies + (HZ * lifetime));
>>>   if (ra_msg->icmph.icmp6_hop_limit) {
>>> - /* Only set hop_limit on the interface if it is higher than
>>> -  * the current hop_limit.
>>> -  */
>>> - if (in6_dev->cnf.hop_limit < ra_msg->icmph.icmp6_hop_limit) {
>>> + if (in6_dev->cnf.accept_ra_min_hop_limit <= 
>>> ra_msg->icmph.icmp6_hop_limit &&
>>> + ra_msg->icmph.icmp6_hop_limit != 0) {
>>>   in6_dev->cnf.hop_limit = 
>>> ra_msg->icmph.icmp6_hop_limit;
>>>   } else {
>>> - ND_PRINTK(2, warn, "RA: Got route advertisement with 
>>> lower hop_limit than current\n");
>>> + ND_PRINTK(2, warn, "RA: Got route advertisement with 
>>> lower hop_limit than minimum\n");
>>>   }
>>>   if (rt)
>>>   dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
>>>
>>
>> Please see my comments against your previous patch.
> 
> I pasted you comments here so we don't need to discuss in two mails :)
> 
>>
>> ra_msg->icmph.icmp6_hop_limit != 0 is checkd by outer "if".
> 
> Yes, thanks for your reminding :)
> 
>>
>> You do not need to update cnf.hop_limit if it is already equal to
>> hop limit received.
> 
> We need to update cnf.hop_limit if min_hop_limit <= icmp6_hop_limit. e.g.
> current hop limit is 64, min hop limit is 1 and ra hop limit is 32, then we 
> need
> update current hop limit to 32.

OK

> 
>>
>> How about ignoring hop limit without message is configured value is
>> larger than 255, BTW?
> 
> Although set accept_ra_min_hop_limit great than 255 is meaningless,  there
> is also no need to check it since icmp6_hop_limit will not larger than 255. so
> 
> +   if (in6_dev->cnf.accept_ra_min_hop_limit <= 255 &&
> +   in6_dev->cnf.accept_ra_min_hop_limit <=
> ra_msg->icmph.icmp6_hop_limit )
> in6_dev->cnf.hop_limit = 
> ra_msg->icmph.icmp6_hop_limit;
> 
> is  duplicated check. How do you think?

How about checking in6_dev->cnf.accept_ra_min_hop_limit by outer if, then?


if (in6_dev->cnf.accept_ra_min_hop_limit < 256 &&
ra_msg->icmph.icmp6_hop_limit) {
...
}

> 
>>
>>>   if (rt)
>>>   dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
>>>
>>
>> This can be inside the inner "if".
> 
> OK, will move it.
> 
> 
> Best Regards
> Hangbin
> 

Regards,

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3] net/ipv6: add sysctl option accept_ra_min_hop_limit

2015-07-29 Thread YOSHIFUJI Hideaki

Hi,

Thank you for you updated patch.

Hangbin Liu wrote:
> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
> disabled accept hop limit from RA if it is higher than the current hop
> limit for security stuff. But this behavior kind of break the RFC definition.
> 
> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>and Retrans Timer) may contain a value denoting that it is
>unspecified.  In such cases, the parameter should be ignored and the
>host should continue using whatever value it is already using.
> 
>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>its CurHopLimit variable to the received value.
> 
> So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
> hop limit value they can accept from RA. And set default to 1 to meet RFC
> standards.
> 
> Signed-off-by: Hangbin Liu 
> ---
>  Documentation/networking/ip-sysctl.txt |  8 
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   |  8 +++-
>  5 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 5fae770..ced0a38 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1346,6 +1346,14 @@ accept_ra_pinfo - BOOLEAN
>   Functional default: enabled if accept_ra is enabled.
>   disabled if accept_ra is disabled.
>  
> +accept_ra_min_hop_limit - INTEGER
> + Minimum hop limit Information in Router Advertisement.
> +
> + Hop limit Information in Router Advertisement less than this
> + variable shall be ignored.
> +
> + Default: 1
> +
>  accept_ra_rt_info_max_plen - INTEGER
>   Maximum prefix length of Route Information in RA.
>  
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 82806c6..ac01ab4 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -30,6 +30,7 @@ struct ipv6_devconf {
>   __s32   max_addresses;
>   __s32   accept_ra_defrtr;
>   __s32   accept_ra_pinfo;
> + __s32   accept_ra_min_hop_limit;

accept_ra_defrtr
accept_ra_min_hop_limit
accept_ra_pinfo
(matter of taste)

>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   __s32   accept_ra_rtr_pref;
>   __s32   rtr_probe_interval;
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 5efa54a..68094e33 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -171,6 +171,7 @@ enum {
>   DEVCONF_USE_OPTIMISTIC,
>   DEVCONF_ACCEPT_RA_MTU,
>   DEVCONF_STABLE_SECRET,
> + DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
>   DEVCONF_MAX
>  };
>  

This patch cannot apply to current net-next.  Please rebase your patch.

> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 21c2c81..77df8f2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -196,6 +196,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
>   .accept_ra_pinfo= 1,
> + .accept_ra_min_hop_limit= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
>   .rtr_probe_interval = 60 * HZ,
> @@ -237,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf_dflt 
> __read_mostly = {
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
>   .accept_ra_pinfo= 1,
> + .accept_ra_min_hop_limit= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
>   .rtr_probe_interval = 60 * HZ,
> @@ -4561,6 +4563,7 @@ static inline void ipv6_store_devconf(struct 
> ipv6_devconf *cnf,
>   array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
>   array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
>   array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
> + array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;

DEFRTR, MIN_HOP_LIMIT then PINFO
(matter of taste)

>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
>   array[DEVCONF_RTR_PROBE_INTERVAL] =
> @@ -5462,6 +5465,13 @@ static struct addrconf_sysctl_table
>   .mode   = 0644,
>   .proc_handler   = proc_dointvec,
>   },
> + {
> + .procname   = "accept_ra_min_hop_limit",
> + .data   = &ipv6_devconf.accept_ra_min_hop_limit,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec,
> + },
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>

Re: [PATCH] net/ipv6: add sysctl option min_hop_limit

2015-07-29 Thread YOSHIFUJI Hideaki

Hi,

Hangbin Liu wrote:
> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
> disabled accept hop limit from RA if it is higher than the current hop
> limit for security stuff. But this behavior kind of break the RFC definition.
> 
> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>and Retrans Timer) may contain a value denoting that it is
>unspecified.  In such cases, the parameter should be ignored and the
>host should continue using whatever value it is already using.
> 
>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>its CurHopLimit variable to the received value.
> 
> So add sysctl option min_hop_limit to let user choose the minimum hop limit
> value they can accept from RA. And set default to 1 to meet RFC standards.
> 
> Signed-off-by: Hangbin Liu 
> ---
>  Documentation/networking/ip-sysctl.txt |  4 
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   |  8 +++-
>  5 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 5fae770..844904b 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1431,6 +1431,10 @@ hop_limit - INTEGER
>   Default Hop Limit to set.
>   Default: 64
>  
> +min_hop_limit - INTEGER
> + Minimum hop limit value can be accepted from Router Advertisement.
> + Default: 1
> +

I prefer accept_ra_min_hop_limit, as we have accept_ra_rt_info_max_plen.


>  mtu - INTEGER
>   Default Maximum Transfer Unit
>   Default: 1280 (IPv6 required minimum)
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 82806c6..cd9e6af 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -11,6 +11,7 @@
>  struct ipv6_devconf {
>   __s32   forwarding;
>   __s32   hop_limit;
> + __s32   min_hop_limit;
>   __s32   mtu6;
>   __s32   accept_ra;
>   __s32   accept_redirects;
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 5efa54a..c1c09bb 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -171,6 +171,7 @@ enum {
>   DEVCONF_USE_OPTIMISTIC,
>   DEVCONF_ACCEPT_RA_MTU,
>   DEVCONF_STABLE_SECRET,
> + DEVCONF_MIN_HOP_LIMIT,
>   DEVCONF_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 21c2c81..b6ed39e 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -176,6 +176,7 @@ static bool ipv6_chk_same_addr(struct net *net, const 
> struct in6_addr *addr,
>  static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   .forwarding = 0,
>   .hop_limit  = IPV6_DEFAULT_HOPLIMIT,
> + .min_hop_limit  = 1,
>   .mtu6   = IPV6_MIN_MTU,
>   .accept_ra  = 1,
>   .accept_redirects   = 1,
> @@ -217,6 +218,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>   .forwarding = 0,
>   .hop_limit  = IPV6_DEFAULT_HOPLIMIT,
> + .min_hop_limit  = 1,
>   .mtu6   = IPV6_MIN_MTU,
>   .accept_ra  = 1,
>   .accept_redirects   = 1,
> @@ -4538,6 +4540,7 @@ static inline void ipv6_store_devconf(struct 
> ipv6_devconf *cnf,
>   memset(array, 0, bytes);
>   array[DEVCONF_FORWARDING] = cnf->forwarding;
>   array[DEVCONF_HOPLIMIT] = cnf->hop_limit;
> + array[DEVCONF_MIN_HOP_LIMIT] = cnf->min_hop_limit;
>   array[DEVCONF_MTU6] = cnf->mtu6;
>   array[DEVCONF_ACCEPT_RA] = cnf->accept_ra;
>   array[DEVCONF_ACCEPT_REDIRECTS] = cnf->accept_redirects;
> @@ -5328,6 +5331,13 @@ static struct addrconf_sysctl_table
>   .proc_handler   = proc_dointvec,
>   },
>   {
> + .procname   = "min_hop_limit",
> + .data   = &ipv6_devconf.min_hop_limit,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec,
> + },
> + {
>   .procname   = "mtu",
>   .data   = &ipv6_devconf.mtu6,
>   .maxlen = sizeof(int),
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 0a05b35..c22f3ac 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1226,13 +1226,11 @@ static void ndisc_router_discovery(struct sk_buff 
> *skb)
>   if (rt)
>   rt6_set_expires(rt, jiffies + (HZ * lifetime));
>   if (ra_msg

Re: [PATCHv2] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-28 Thread YOSHIFUJI Hideaki

Hangbin Liu wrote:
> 2015-07-28 11:58 GMT+08:00 YOSHIFUJI Hideaki
> :
>> Hi,
>>
>> Hangbin Liu wrote:
>>> 2015-07-28 7:50 GMT+08:00 YOSHIFUJI Hideaki/吉藤英明
>>> :
>>>> Hi,
>>>>
>>>> Hangbin Liu wrote:
>>>>> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>>>>> disabled accept hop limit from RA if it is higher than the current hop
>>>>> limit for security stuff. But this behavior kind of break the RFC 
>>>>> definition.
>>>>>
>>>>> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>>>>>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>>>>>its CurHopLimit variable to the received value.
>>>>>
>>>>> So add sysctl option accept_ra_hop_limit to let user choose whether accept
>>>>> hop limit info in RA.
>>>>>
>>>>
>>>> How about introducing "minimum hop limit", instead?
>>>
>>> Hi Yoshifuji,
>>>
>>> This is a good idea. Maybe this can be another sysctl option?
>>>
>>> The minimum hop limit can be an enhancement of the security issue, then we 
>>> will
>>> not only increase the hop limit, but also could decrease it in the
>>> range of values we
>>> accept.
>>>
>>> On the other hand, with this patch, we can enable, disable or partly
>>> enable accept
>>> hop limit. If we only use "minimum hop limit", people could not use a 
>>> static hop
>>> limit value.
>>>
>>> May be we use a “hop limit range" instead? How do you think?
>>
>> I think name of sysctl is the same as you suggested and change the
>> semantics.  default value is 0 to accept all hotlimit value
>> as before and people can set it to 32 (for example) to reject
>> too-small hoplimit (0-31).
> 
> OK, then I will try submit a "minimum hop limit", thanks for your suggestion 
> :)

accept_ra_min_hop_limit would be better as we have
accept_ra_rt_info_max_plen.

> 
> Regards
> Hangbin
>>
>> --yoshfuji
>>
>>>
>>> Thanks
>>> Hangbin
>>>
>>>>
>>>> |commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a
>>>> |Author: D.S. Ljungmark 
>>>> |Date:   Wed Mar 25 09:28:15 2015 +0100
>>>> |
>>>> |ipv6: Don't reduce hop limit for an interface
>>>> :
>>>> |RFC 3756, Section 4.2.7, "Parameter Spoofing"
>>>> |
>>>> :
>>>> |   >   As an example, one possible approach to mitigate this threat is to
>>>> |>   ignore very small hop limits.  The nodes could implement a
>>>> |>   configurable minimum hop limit, and ignore attempts to set it 
>>>> below
>>>> |>   said limit.
>>
>> --
>> Hideaki Yoshifuji 
>> Technical Division, MIRACLE LINUX CORPORATION

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv2] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-27 Thread YOSHIFUJI Hideaki

Hi,

Hangbin Liu wrote:
> 2015-07-28 7:50 GMT+08:00 YOSHIFUJI Hideaki/吉藤英明
> :
>> Hi,
>>
>> Hangbin Liu wrote:
>>> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>>> disabled accept hop limit from RA if it is higher than the current hop
>>> limit for security stuff. But this behavior kind of break the RFC 
>>> definition.
>>>
>>> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>>>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>>>its CurHopLimit variable to the received value.
>>>
>>> So add sysctl option accept_ra_hop_limit to let user choose whether accept
>>> hop limit info in RA.
>>>
>>
>> How about introducing "minimum hop limit", instead?
> 
> Hi Yoshifuji,
> 
> This is a good idea. Maybe this can be another sysctl option?
> 
> The minimum hop limit can be an enhancement of the security issue, then we 
> will
> not only increase the hop limit, but also could decrease it in the
> range of values we
> accept.
> 
> On the other hand, with this patch, we can enable, disable or partly
> enable accept
> hop limit. If we only use "minimum hop limit", people could not use a static 
> hop
> limit value.
> 
> May be we use a “hop limit range" instead? How do you think?

I think name of sysctl is the same as you suggested and change the
semantics.  default value is 0 to accept all hotlimit value
as before and people can set it to 32 (for example) to reject
too-small hoplimit (0-31).

--yoshfuji

> 
> Thanks
> Hangbin
> 
>>
>> |commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a
>> |Author: D.S. Ljungmark 
>> |Date:   Wed Mar 25 09:28:15 2015 +0100
>> |
>> |ipv6: Don't reduce hop limit for an interface
>> :
>> |RFC 3756, Section 4.2.7, "Parameter Spoofing"
>> |
>> :
>> |   >   As an example, one possible approach to mitigate this threat is to
>> |>   ignore very small hop limits.  The nodes could implement a
>> |>   configurable minimum hop limit, and ignore attempts to set it below
>> |>   said limit.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2 2/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-27 Thread YOSHIFUJI Hideaki

Hi,

Martin KaFai Lau wrote:
> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.
> 
> 40 udpflood processes and a /64 gateway route are used.
> The gateway has NUD_PERMANENT.  Each of them is run for 30s.
> At the end, the total number of finished sendto():
> 
> Before: 55M
> After: 95M

I think it is better to describe why it is okay without any locks.

--yoshfuji

> 
> Signed-off-by: Martin KaFai Lau 
> Cc: Hannes Frederic Sowa 
> CC: Julian Anastasov 
> CC: YOSHIFUJI Hideaki 
> ---
>  net/ipv6/route.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 6d503db..76dcff8 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -560,6 +560,9 @@ static void rt6_probe(struct rt6_info *rt)
>   rcu_read_lock_bh();
>   neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
>   if (neigh) {
> + if (neigh->nud_state & NUD_VALID)
> + goto out;
> +
>   work = NULL;
>   write_lock(&neigh->lock);
>   if (!(neigh->nud_state & NUD_VALID) &&
> @@ -583,6 +586,7 @@ static void rt6_probe(struct rt6_info *rt)
>   schedule_work(&work->work);
>   }
>  
> +out:
>   rcu_read_unlock_bh();
>  }
>  #else
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2 1/2] ipv6: Re-arrange code in rt6_probe()

2015-07-27 Thread YOSHIFUJI Hideaki/吉藤英明

Martin KaFai Lau wrote:
> It is a prep work for the next patch to remove write_lock
> from rt6_probe().
> 
> 1. Reduce the number of if(neigh) check.  From 4 to 1.
> 2. Bring the write_(un)lock() closer to the operations that the
>lock is protecting.
> 
> Hopefully, the above make rt6_probe() more readable.
> 
> Signed-off-by: Martin KaFai Lau 
> Cc: Hannes Frederic Sowa 
> Cc: Julian Anastasov 
> Cc: YOSHIFUJI Hideaki 

Acked-by: YOSHIFUJI Hideaki 

--yoshfuji

> ---
>  net/ipv6/route.c | 44 
>  1 file changed, 20 insertions(+), 24 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 7f2214f..6d503db 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -545,6 +545,7 @@ static void rt6_probe_deferred(struct work_struct *w)
>  
>  static void rt6_probe(struct rt6_info *rt)
>  {
> + struct __rt6_probe_work *work;
>   struct neighbour *neigh;
>   /*
>* Okay, this does not seem to be appropriate
> @@ -559,34 +560,29 @@ static void rt6_probe(struct rt6_info *rt)
>   rcu_read_lock_bh();
>   neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
>   if (neigh) {
> + work = NULL;
>   write_lock(&neigh->lock);
> - if (neigh->nud_state & NUD_VALID)
> - goto out;
> - }
> -
> - if (!neigh ||
> - time_after(jiffies, neigh->updated + 
> rt->rt6i_idev->cnf.rtr_probe_interval)) {
> - struct __rt6_probe_work *work;
> -
> - work = kmalloc(sizeof(*work), GFP_ATOMIC);
> -
> - if (neigh && work)
> - __neigh_set_probe_once(neigh);
> -
> - if (neigh)
> - write_unlock(&neigh->lock);
> -
> - if (work) {
> - INIT_WORK(&work->work, rt6_probe_deferred);
> - work->target = rt->rt6i_gateway;
> - dev_hold(rt->dst.dev);
> - work->dev = rt->dst.dev;
> - schedule_work(&work->work);
> + if (!(neigh->nud_state & NUD_VALID) &&
> + time_after(jiffies,
> +neigh->updated +
> +rt->rt6i_idev->cnf.rtr_probe_interval)) {
> + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + if (work)
> + __neigh_set_probe_once(neigh);
>   }
> - } else {
> -out:
>   write_unlock(&neigh->lock);
> + } else {
> + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + }
> +
> + if (work) {
> + INIT_WORK(&work->work, rt6_probe_deferred);
> + work->target = rt->rt6i_gateway;
> + dev_hold(rt->dst.dev);
> + work->dev = rt->dst.dev;
> + schedule_work(&work->work);
>   }
> +
>   rcu_read_unlock_bh();
>  }
>  #else
> 

-- 
吉藤英明 
ミラクル・リナックス株式会社 技術本部 サポート部
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv2] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-27 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Hangbin Liu wrote:
> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
> disabled accept hop limit from RA if it is higher than the current hop
> limit for security stuff. But this behavior kind of break the RFC definition.
> 
> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>its CurHopLimit variable to the received value.
> 
> So add sysctl option accept_ra_hop_limit to let user choose whether accept
> hop limit info in RA.
> 

How about introducing "minimum hop limit", instead?

|commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a
|Author: D.S. Ljungmark 
|Date:   Wed Mar 25 09:28:15 2015 +0100
|
|ipv6: Don't reduce hop limit for an interface
:
|RFC 3756, Section 4.2.7, "Parameter Spoofing"
|
:
|   >   As an example, one possible approach to mitigate this threat is to
|>   ignore very small hop limits.  The nodes could implement a
|>   configurable minimum hop limit, and ignore attempts to set it below
|>   said limit.

--yoshfuji


> Signed-off-by: Hangbin Liu 
> ---
>  Documentation/networking/ip-sysctl.txt | 11 +++
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   | 17 +++--
>  5 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 5fae770..778c479 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1346,6 +1346,17 @@ accept_ra_pinfo - BOOLEAN
>   Functional default: enabled if accept_ra is enabled.
>   disabled if accept_ra is disabled.
>  
> +accept_ra_hop_limit - INTEGER
> + Learn hop limit in Router Advertisement.
> +
> + Possible values are:
> + 0 Do not accept hop limit in Router Advertisements.
> + 1 Accept hop limit in Router Advertisements if it is higher
> +   than the current hop limit.
> + 2 Accept hop limit in Router Advertisements anyway.
> +
> + Default: 1
> +
>  accept_ra_rt_info_max_plen - INTEGER
>   Maximum prefix length of Route Information in RA.
>  
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 82806c6..a21a9c6 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -30,6 +30,7 @@ struct ipv6_devconf {
>   __s32   max_addresses;
>   __s32   accept_ra_defrtr;
>   __s32   accept_ra_pinfo;
> + __s32   accept_ra_hop_limit;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   __s32   accept_ra_rtr_pref;
>   __s32   rtr_probe_interval;
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 5efa54a..a8c1083 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -171,6 +171,7 @@ enum {
>   DEVCONF_USE_OPTIMISTIC,
>   DEVCONF_ACCEPT_RA_MTU,
>   DEVCONF_STABLE_SECRET,
> + DEVCONF_ACCEPT_RA_HOP_LIMIT,
>   DEVCONF_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 21c2c81..486a7a5 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -196,6 +196,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
>   .accept_ra_pinfo= 1,
> + .accept_ra_hop_limit= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
>   .rtr_probe_interval = 60 * HZ,
> @@ -237,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf_dflt 
> __read_mostly = {
>   .accept_ra_defrtr   = 1,
>   .accept_ra_from_local   = 0,
>   .accept_ra_pinfo= 1,
> + .accept_ra_hop_limit= 1,
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   .accept_ra_rtr_pref = 1,
>   .rtr_probe_interval = 60 * HZ,
> @@ -4561,6 +4563,7 @@ static inline void ipv6_store_devconf(struct 
> ipv6_devconf *cnf,
>   array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
>   array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
>   array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
> + array[DEVCONF_ACCEPT_RA_HOP_LIMIT] = cnf->accept_ra_hop_limit;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
>   array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
>   array[DEVCONF_RTR_PROBE_INTERVAL] =
> @@ -5462,6 +5465,13 @@ static struct addrconf_sysctl_table
>   .mode   = 0644,
>   .proc_handler   = proc_dointvec,
>   },
> + {
> + .procname   = "accept_ra_hop_limit",
> + .data   = &ipv6_devconf.accept_ra_hop_limit,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> +

Re: [RFC PATCH v3 net-next 2/3] tcp: add in_flight to tcp_skb_cb

2015-07-23 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Lawrence Brakmo wrote:
> Based on comments by Neal Cardwell to tcp_nv patch:
> 
>   AFAICT this patch would not require an increase in the size of sk_buff
>   cb[] if it were to take advantage of the fact that the tcp_skb_cb
>   header.h4 and header.h6 fields are only used in the packet reception
>   code path, and this in_flight field is only used on the transmit
>   side. So the in_flight field could be placed in a struct that is
>   itself placed in a union with the "header" union.

Please make another patch only for this.

> 
>   That way the sender code can remember the in_flight value
>   without requiring any extra space. And in the future other
>   sender-side info could be stored in the "tx" struct, if needed.
> 
> Signed-off-by: Lawrence Brakmo 
> ---
>  include/net/tcp.h | 13 ++---
>  net/ipv4/tcp_input.c  |  5 -
>  net/ipv4/tcp_output.c |  4 +++-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 1e6c5b04..b98d79a 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -755,11 +755,17 @@ struct tcp_skb_cb {
>   /* 1 byte hole */
>   __u32   ack_seq;/* Sequence number ACK'd*/
>   union {
> - struct inet_skb_parmh4;
> + struct {
> + /* bytes in flight when this packet was sent */
> + __u32 in_flight;
> + } tx;   /* only used for outgoing skbs */
> + union {
> + struct inet_skb_parmh4;
>  #if IS_ENABLED(CONFIG_IPV6)
> - struct inet6_skb_parm   h6;
> + struct inet6_skb_parm   h6;
>  #endif
> - } header;   /* For incoming frames  */
> + } header;   /* For incoming skbs */
> + };
>  };
>  
>  #define TCP_SKB_CB(__skb)((struct tcp_skb_cb *)&((__skb)->cb[0]))
> @@ -837,6 +843,7 @@ union tcp_cc_info;
>  struct ack_sample {
>   u32 pkts_acked;
>   s32 rtt_us;
> + u32 in_flight;
>  };
>  
>  struct tcp_congestion_ops {
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 423d3af..3ab4178 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int 
> prior_fackets,
>   long ca_rtt_us = -1L;
>   struct sk_buff *skb;
>   u32 pkts_acked = 0;
> + u32 last_in_flight = 0;
>   bool rtt_update;
>   int flag = 0;
>  
> @@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int 
> prior_fackets,
>   if (!first_ackt.v64)
>   first_ackt = last_ackt;
>  
> + last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
>   reord = min(pkts_acked, reord);
>   if (!after(scb->end_seq, tp->high_seq))
>   flag |= FLAG_ORIG_SACK_ACKED;
> @@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int 
> prior_fackets,
>   }
>  
>   if (icsk->icsk_ca_ops->pkts_acked) {
> - struct ack_sample sample = {pkts_acked, ca_rtt_us};
> + struct ack_sample sample = {pkts_acked, ca_rtt_us,
> + last_in_flight};
>  
>   icsk->icsk_ca_ops->pkts_acked(sk, &sample);
>   }
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 7105784..e9deab5 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct 
> sk_buff *skb, int clone_it,
>   int err;
>  
>   BUG_ON(!skb || !tcp_skb_pcount(skb));
> + tp = tcp_sk(sk);
>  
>   if (clone_it) {
>   skb_mstamp_get(&skb->skb_mstamp);
> + TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
> + - tp->snd_una;
>  
>   if (unlikely(skb_cloned(skb)))
>   skb = pskb_copy(skb, gfp_mask);
> @@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct 
> sk_buff *skb, int clone_it,
>   }
>  
>   inet = inet_sk(sk);
> - tp = tcp_sk(sk);
>   tcb = TCP_SKB_CB(skb);
>   memset(&opts, 0, sizeof(opts));
>  
> 

-- 
吉藤英明 
ミラクル・リナックス株式会社 技術本部 サポート部
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-23 Thread YOSHIFUJI Hideaki

Hi,

Hangbin Liu wrote:
> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
> disabled accept hop limit from RA if it is higher than the current hop
> limit for security stuff. But this behavior kind of break the RFC definition.
> 
> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>If the received Cur Hop Limit value is non-zero, the host SHOULD set
>its CurHopLimit variable to the received value.
> 
> So add sysctl option accept_ra_hop_limit to let user choose whether accept
> hop limit info in RA.
> 
> Signed-off-by: Hangbin Liu 
> Acked-by: Hannes Frederic Sowa 
> ---
>  Documentation/networking/ip-sysctl.txt | 11 +++
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   | 17 +++--
>  5 files changed, 34 insertions(+), 6 deletions(-)
> 
:
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 5efa54a..9f40ac9 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -153,6 +153,7 @@ enum {
>   DEVCONF_FORCE_MLD_VERSION,
>   DEVCONF_ACCEPT_RA_DEFRTR,
>   DEVCONF_ACCEPT_RA_PINFO,
> + DEVCONF_ACCEPT_RA_HOP_LIMIT,
>   DEVCONF_ACCEPT_RA_RTR_PREF,
>   DEVCONF_RTR_PROBE_INTERVAL,
>   DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,

No, you cannot add new one in the middle of these since 
values are exported to userspace.

--yoshfuji
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread YOSHIFUJI Hideaki

Hi,

Martin KaFai Lau wrote:
> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

You have to take "some" lock when accessing neigh->nud_state
theoretically.

> 
> I also take this chance to re-arrange the code.

No, please do not mix multiple changes.

> 
> 40 udpflood processes and a /64 gateway route are used.
> The gateway has NUD_PERMANENT.  Each of them is run for 30s.
> At the end, the total number of finished sendto():
> 
> BeforeAfter
> 55M 95M
> 
> Signed-off-by: Martin KaFai Lau 
> Cc: Hannes Frederic Sowa 
> ---
>  net/ipv6/route.c | 41 -
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 6090969..a6c6b5a 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
>  
>  static void rt6_probe(struct rt6_info *rt)
>  {
> + struct __rt6_probe_work *work;
>   struct neighbour *neigh;
>   /*
>* Okay, this does not seem to be appropriate
> @@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
>   rcu_read_lock_bh();
>   neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
>   if (neigh) {
> - write_lock(&neigh->lock);
>   if (neigh->nud_state & NUD_VALID)
>   goto out;
> - }
> -
> - if (!neigh ||
> - time_after(jiffies, neigh->updated + 
> rt->rt6i_idev->cnf.rtr_probe_interval)) {
> - struct __rt6_probe_work *work;
>  
> + work = NULL;
> + write_lock(&neigh->lock);
> + if (!(neigh->nud_state & NUD_VALID) &&
> + time_after(jiffies, neigh->updated + 
> rt->rt6i_idev->cnf.rtr_probe_interval)) {
> + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + if (work) {
> + __neigh_set_probe_once(neigh);
> + }
> + }
> + write_unlock(&neigh->lock);
> + } else {
>   work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + }
>  
> - if (neigh && work)
> - __neigh_set_probe_once(neigh);
> -
> - if (neigh)
> - write_unlock(&neigh->lock);
> + if (work) {
> + INIT_WORK(&work->work, rt6_probe_deferred);
> + work->target = rt->rt6i_gateway;
> + dev_hold(rt->dst.dev);
> + work->dev = rt->dst.dev;
> + schedule_work(&work->work);
> + }
>  
> - if (work) {
> - INIT_WORK(&work->work, rt6_probe_deferred);
> - work->target = rt->rt6i_gateway;
> - dev_hold(rt->dst.dev);
> - work->dev = rt->dst.dev;
> - schedule_work(&work->work);
> - }
> - } else {
>  out:
> - write_unlock(&neigh->lock);
> - }
>   rcu_read_unlock_bh();
>  }
>  #else
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next] ipv6: Remove unused arguments for __ipv6_dev_get_saddr().

2015-07-16 Thread YOSHIFUJI Hideaki

Signed-off-by: YOSHIFUJI Hideaki 
---
 net/ipv6/addrconf.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4c9a024..32153c2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1360,8 +1360,6 @@ out:
 
 static int __ipv6_dev_get_saddr(struct net *net,
struct ipv6_saddr_dst *dst,
-   unsigned int prefs,
-   const struct in6_addr *saddr,
struct inet6_dev *idev,
struct ipv6_saddr_score *scores,
int hiscore_idx)
@@ -1485,13 +1483,13 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
 
if (use_oif_addr) {
if (idev)
-   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, prefs, 
saddr, idev, scores, hiscore_idx);
+   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, idev, 
scores, hiscore_idx);
} else {
for_each_netdev_rcu(net, dev) {
idev = __in6_dev_get(dev);
if (!idev)
continue;
-   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, prefs, 
saddr, idev, scores, hiscore_idx);
+   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, idev, 
scores, hiscore_idx);
}
}
rcu_read_unlock();
-- 
1.9.1


-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().

2015-07-14 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Tom Herbert wrote:
> I am testing this patch which may be a little simpler. Also idev needs
> to be checked after __in6_dev_get

We have to select source address on *given* interface for link-local/
multicast destinations.

> 
> Tom
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4ab74d5..d631ac3 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1363,9 +1363,10 @@ static void __ipv6_dev_get_saddr(struct net *net,
>  unsigned int prefs,
>  const struct in6_addr *saddr,
>  struct inet6_dev *idev,
> -struct ipv6_saddr_score *scores)
> +struct ipv6_saddr_score **in_score,
> +struct ipv6_saddr_score **in_hiscore)
>  {
> -   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
> +   struct ipv6_saddr_score *score = *in_score, *hiscore = *in_hiscore;
> 
> read_lock_bh(&idev->lock);
> list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
> @@ -1434,13 +1435,16 @@ static void __ipv6_dev_get_saddr(struct net *net,
> }
>  out:
> read_unlock_bh(&idev->lock);
> +   *in_hiscore = hiscore;
> +   *in_score = score;
>  }
> 
>  int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
>const struct in6_addr *daddr, unsigned int prefs,
>struct in6_addr *saddr)
>  {
> -   struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
> +   struct ipv6_saddr_score scores[2];
> +   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
> struct ipv6_saddr_dst dst;
> struct inet6_dev *idev;
> struct net_device *dev;
> @@ -1475,18 +1479,19 @@ int ipv6_dev_get_saddr(struct net *net, const
> struct net_device *dst_dev,
> if ((dst_type & IPV6_ADDR_MULTICAST) ||
> dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) {
> idev = __in6_dev_get(dst_dev);
> -   use_oif_addr = true;
> +   if (idev)
> +   use_oif_addr = true;
> }
> }
> if (use_oif_addr) {
> -   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
> +   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
> &score, &hiscore);
> } else {
> for_each_netdev_rcu(net, dev) {
> idev = __in6_dev_get(dev);
> if (!idev)
> continue;
> -   __ipv6_dev_get_saddr(net, &dst, prefs, saddr,
> idev, scores);
> +   __ipv6_dev_get_saddr(net, &dst, prefs, saddr,
> idev, &score, &hiscore);
> }
> }
> rcu_read_unlock();
> 
> On Mon, Jul 13, 2015 at 7:28 AM, YOSHIFUJI Hideaki/吉藤英明
>  wrote:
>> Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
>> finding source address on specific interface.") did not properly
>> update best source address available.  Plus, it introduced
>> possible NULL pointer dereference.
>>
>> Bug was reported by Erik Kline .
>> Based on patch proposed by Hajime Tazaki .
>>
>> Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
>> iterate over all interfaces when finding source address
>> on specific interface.")
>> Signed-off-by: YOSHIFUJI Hideaki 
>> ---
>>  net/ipv6/addrconf.c | 30 ++
>>  1 file changed, 18 insertions(+), 12 deletions(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 4ab74d5..4c9a024 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -1358,14 +1358,15 @@ out:
>> return ret;
>>  }
>>
>> -static void __ipv6_dev_get_saddr(struct net *net,
>> -struct ipv6_saddr_dst *dst,
>> -unsigned int prefs,
>> -const struct in6_addr *saddr,
>> -struct inet6_dev *idev,
>> -struct ipv6_saddr_score *scores)
>> +static int __ipv6_dev_get_saddr(struct net *net,
>> +   struct ipv6_saddr_dst *dst,
>> +   unsigned int prefs,
>> +   const struct in6_addr *saddr,
>&g

[PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().

2015-07-13 Thread YOSHIFUJI Hideaki/吉藤英明

Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") did not properly
update best source address available.  Plus, it introduced
possible NULL pointer dereference.

Bug was reported by Erik Kline .
Based on patch proposed by Hajime Tazaki .

Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
iterate over all interfaces when finding source address
on specific interface.")
Signed-off-by: YOSHIFUJI Hideaki 
---
 net/ipv6/addrconf.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..4c9a024 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1358,14 +1358,15 @@ out:
return ret;
 }
 
-static void __ipv6_dev_get_saddr(struct net *net,
-struct ipv6_saddr_dst *dst,
-unsigned int prefs,
-const struct in6_addr *saddr,
-struct inet6_dev *idev,
-struct ipv6_saddr_score *scores)
+static int __ipv6_dev_get_saddr(struct net *net,
+   struct ipv6_saddr_dst *dst,
+   unsigned int prefs,
+   const struct in6_addr *saddr,
+   struct inet6_dev *idev,
+   struct ipv6_saddr_score *scores,
+   int hiscore_idx)
 {
-   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
+   struct ipv6_saddr_score *score = &scores[1 - hiscore_idx], *hiscore = 
&scores[hiscore_idx];
 
read_lock_bh(&idev->lock);
list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
@@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
in6_ifa_hold(score->ifa);
 
swap(hiscore, score);
+   hiscore_idx = 1 - hiscore_idx;
 
/* restore our iterator */
score->ifa = hiscore->ifa;
@@ -1434,18 +1436,20 @@ static void __ipv6_dev_get_saddr(struct net *net,
}
 out:
read_unlock_bh(&idev->lock);
+   return hiscore_idx;
 }
 
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
   const struct in6_addr *daddr, unsigned int prefs,
   struct in6_addr *saddr)
 {
-   struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
+   struct ipv6_saddr_score scores[2], *hiscore;
struct ipv6_saddr_dst dst;
struct inet6_dev *idev;
struct net_device *dev;
int dst_type;
bool use_oif_addr = false;
+   int hiscore_idx = 0;
 
dst_type = __ipv6_addr_type(daddr);
dst.addr = daddr;
@@ -1454,8 +1458,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
dst.label = ipv6_addr_label(net, daddr, dst_type, dst.ifindex);
dst.prefs = prefs;
 
-   hiscore->rule = -1;
-   hiscore->ifa = NULL;
+   scores[hiscore_idx].rule = -1;
+   scores[hiscore_idx].ifa = NULL;
 
rcu_read_lock();
 
@@ -1480,17 +1484,19 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
}
 
if (use_oif_addr) {
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+   if (idev)
+   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, prefs, 
saddr, idev, scores, hiscore_idx);
} else {
for_each_netdev_rcu(net, dev) {
idev = __in6_dev_get(dev);
if (!idev)
continue;
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, 
scores);
+   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, prefs, 
saddr, idev, scores, hiscore_idx);
}
}
rcu_read_unlock();
 
+   hiscore = &scores[hiscore_idx];
if (!hiscore->ifa)
return -EADDRNOTAVAIL;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-13 Thread YOSHIFUJI Hideaki

Hi,

Hajime Tazaki wrote:
> 
> Yoshifuji-san,
> 
> At Mon, 13 Jul 2015 17:38:48 +0900,
> Erik Kline wrote:
>>
>> On 13 July 2015 at 15:32, YOSHIFUJI Hideaki
>>  wrote:
>>> Hi,
>>>
>>> Erik Kline wrote:
>>>> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
>>>> say, I failed to do before) I get these kinds of errors:
>>>>
>>>> unregister_netdevice: waiting for  to become free.
>>>> Usage count = 1
>>>> unregister_netdevice: waiting for  to become free.
>>>> Usage count = 1
>>>>
>>>> Perhaps they're unrelated... I'm still investigating.
>>>
>>> Would you test attached patch please?
>>
>> That does look logically correct, so +1 to it regardless, but it does
>> not seem to have fixed the issue I'm seeing.
>>
>> I still haven't produced the smallest possible demo test program.
> 
> sorry to jump-in, but there is a side-effect with this
> patch, which my tcp and dccp tests (ipv6) are failed.
> 
> because newly added function (__ipv6_dev_get_saddr) won't
> update a variable 'hiscore' (it swaps with 'score' in some
> case), the caller (ipv6_dev_get_saddr) can't fill an
> appropriate saddr in the end.
> 
> I don't know if this is a good patch but the following diff
> makes my test happy.

We should update score as well...

> 
> -- Hajime
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4ab74d5..c4e9416 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1363,7 +1363,8 @@ static void __ipv6_dev_get_saddr(struct net *net,
>unsigned int prefs,
>const struct in6_addr *saddr,
>struct inet6_dev *idev,
> -  struct ipv6_saddr_score *scores)
> +  struct ipv6_saddr_score *scores,
> +  struct ipv6_saddr_score **in_hiscore)
>  {
>   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
>  
> @@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
>   in6_ifa_hold(score->ifa);
>  
>   swap(hiscore, score);
> + *in_hiscore = hiscore;
>  
>   /* restore our iterator */
>   score->ifa = hiscore->ifa;
> @@ -1480,13 +1482,15 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
> net_device *dst_dev,
>   }
>  
>   if (use_oif_addr) {
> - __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
> + __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
> +  scores, &hiscore);
>   } else {
>   for_each_netdev_rcu(net, dev) {
>   idev = __in6_dev_get(dev);
>   if (!idev)
>   continue;
> - __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, 
> scores);
> + __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
> +  scores, &hiscore);
>   }
>   }
>   rcu_read_unlock();
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC net-next] net: Build IPv6 into kernel by default

2015-07-12 Thread YOSHIFUJI Hideaki

Hi,

David Miller wrote:
> From: Tom Herbert 
> Date: Thu, 9 Jul 2015 13:42:29 -0700
> 
>> This patch makes the default to build IPv6 into the kernel. IPv6
>> now has significant traction and any remaining vestiges of IPv6
>> not being provided parity with IPv4 should be swept away. IPv6 is now
>> core to the Internet and kernel.
> 
> I guess I'm fine with this, just fix up the doc error Dave Jones
> pointed out.

I am deeply moved.

Acked-by: YOSHIFUJI Hideaki 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-12 Thread YOSHIFUJI Hideaki

Hi,

Erik Kline wrote:
> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
> say, I failed to do before) I get these kinds of errors:
> 
> unregister_netdevice: waiting for  to become free.
> Usage count = 1
> unregister_netdevice: waiting for  to become free.
> Usage count = 1
> 
> Perhaps they're unrelated... I'm still investigating.

Would you test attached patch please?

--yoshfuji

> 
> On 11 July 2015 at 15:19, David Miller  wrote:
>> From: YOSHIFUJI Hideaki/吉藤英明 
>> Date: Fri, 10 Jul 2015 16:58:31 +0900
>>
>>> If outgoing interface is specified and the candidate address is
>>> restricted to the outgoing interface, it is enough to iterate
>>> over that given interface only.
>>>
>>> Signed-off-by: YOSHIFUJI Hideaki 
>>> Acked-by: Erik Kline 
>>
>> Applied, thanks!

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
>From 38c5a10a5876ea47766ffc05b5a131a210d6e1aa Mon Sep 17 00:00:00 2001
From: YOSHIFUJI Hideaki 
Date: Mon, 13 Jul 2015 15:23:02 +0900
Subject: [PATCH] ipv6: Avoid NULL pointer dereference in
 __ipv6_dev_get_saddr().

Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") introduced
possible NULL pointer dereference if outgoing device is
specified.

Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
	iterate over all interfaces when finding source address
	on specific interface.")
Signed-off-by: YOSHIFUJI Hideaki 
---
 net/ipv6/addrconf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..50ad476 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1480,7 +1480,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 	}
 
 	if (use_oif_addr) {
-		__ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+		if (idev)
+			__ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
 	} else {
 		for_each_netdev_rcu(net, dev) {
 			idev = __in6_dev_get(dev);
-- 
1.9.1

[PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-10 Thread YOSHIFUJI Hideaki/吉藤英明

If outgoing interface is specified and the candidate address is
restricted to the outgoing interface, it is enough to iterate
over that given interface only.

Signed-off-by: YOSHIFUJI Hideaki 
Acked-by: Erik Kline 
---
 net/ipv6/addrconf.c | 197 
 1 file changed, 107 insertions(+), 90 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..4ab74d5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1358,15 +1358,94 @@ out:
return ret;
 }
 
+static void __ipv6_dev_get_saddr(struct net *net,
+struct ipv6_saddr_dst *dst,
+unsigned int prefs,
+const struct in6_addr *saddr,
+struct inet6_dev *idev,
+struct ipv6_saddr_score *scores)
+{
+   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
+
+   read_lock_bh(&idev->lock);
+   list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
+   int i;
+
+   /*
+* - Tentative Address (RFC2462 section 5.4)
+*  - A tentative address is not considered
+*"assigned to an interface" in the traditional
+*sense, unless it is also flagged as optimistic.
+* - Candidate Source Address (section 4)
+*  - In any case, anycast addresses, multicast
+*addresses, and the unspecified address MUST
+*NOT be included in a candidate set.
+*/
+   if ((score->ifa->flags & IFA_F_TENTATIVE) &&
+   (!(score->ifa->flags & IFA_F_OPTIMISTIC)))
+   continue;
+
+   score->addr_type = __ipv6_addr_type(&score->ifa->addr);
+
+   if (unlikely(score->addr_type == IPV6_ADDR_ANY ||
+score->addr_type & IPV6_ADDR_MULTICAST)) {
+   net_dbg_ratelimited("ADDRCONF: unspecified / multicast 
address assigned as unicast address on %s",
+   idev->dev->name);
+   continue;
+   }
+
+   score->rule = -1;
+   bitmap_zero(score->scorebits, IPV6_SADDR_RULE_MAX);
+
+   for (i = 0; i < IPV6_SADDR_RULE_MAX; i++) {
+   int minihiscore, miniscore;
+
+   minihiscore = ipv6_get_saddr_eval(net, hiscore, dst, i);
+   miniscore = ipv6_get_saddr_eval(net, score, dst, i);
+
+   if (minihiscore > miniscore) {
+   if (i == IPV6_SADDR_RULE_SCOPE &&
+   score->scopedist > 0) {
+   /*
+* special case:
+* each remaining entry
+* has too small (not enough)
+* scope, because ifa entries
+* are sorted by their scope
+* values.
+*/
+   goto out;
+   }
+   break;
+   } else if (minihiscore < miniscore) {
+   if (hiscore->ifa)
+   in6_ifa_put(hiscore->ifa);
+
+   in6_ifa_hold(score->ifa);
+
+   swap(hiscore, score);
+
+   /* restore our iterator */
+   score->ifa = hiscore->ifa;
+
+   break;
+   }
+   }
+   }
+out:
+   read_unlock_bh(&idev->lock);
+}
+
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
   const struct in6_addr *daddr, unsigned int prefs,
   struct in6_addr *saddr)
 {
-   struct ipv6_saddr_score scores[2],
-   *score = &scores[0], *hiscore = &scores[1];
+   struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
struct ipv6_saddr_dst dst;
+   struct inet6_dev *idev;
struct net_device *dev;
int dst_type;
+   bool use_oif_addr = false;
 
dst_type = __ipv6_addr_type(daddr);
dst.addr = daddr;
@@ -1380,97 +1459,35 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
 
rcu_read_lock();
 
-   for_each_netdev_rcu(net, dev) {
-   struct inet6_dev *idev;
-
-   /* Candida

Re: [PATCH net-next] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-10 Thread YOSHIFUJI Hideaki

Hi,

Erik Kline wrote:

>> +   /* Candidate Source Address (section 4)
>> +*  - multicast and link-local destination address,
>> +*the set of candidate source address MUST only
>> +*include addresses assigned to interfaces
>> +*belonging to the same link as the outgoing
>> +*interface.
>> +* (- For site-local destination addresses, the
>> +*set of candidate source addresses MUST only
>> +*include addresses assigned to interfaces
>> +*belonging to the same site as the outgoing
>> +*interface.)
>> +*  - "It is RECOMMENDED that the candidate source addresses
>> +*be the set of unicast addresses assigned to the
>> +*interface that will be used to send to the destination
>> +*(the 'outgoing' interface)." (RFC 6724)
>> +*/

Sorry, I forgot to remove 3rd one, which should be removed for now; I'll
resubmit.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-09 Thread YOSHIFUJI Hideaki/吉藤英明

If outgoing interface is specified and the candidate addresses
are restricted to the outgoing interface, it is enough to iterate
over that given interface only.

Signed-off-by: YOSHIFUJI Hideaki 
---
 net/ipv6/addrconf.c | 201 +---
 1 file changed, 111 insertions(+), 90 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..b4c82d8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1358,15 +1358,94 @@ out:
return ret;
 }
 
+static void __ipv6_dev_get_saddr(struct net *net,
+struct ipv6_saddr_dst *dst,
+unsigned int prefs,
+const struct in6_addr *saddr,
+struct inet6_dev *idev,
+struct ipv6_saddr_score *scores)
+{
+   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
+
+   read_lock_bh(&idev->lock);
+   list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
+   int i;
+
+   /*
+* - Tentative Address (RFC2462 section 5.4)
+*  - A tentative address is not considered
+*"assigned to an interface" in the traditional
+*sense, unless it is also flagged as optimistic.
+* - Candidate Source Address (section 4)
+*  - In any case, anycast addresses, multicast
+*addresses, and the unspecified address MUST
+*NOT be included in a candidate set.
+*/
+   if ((score->ifa->flags & IFA_F_TENTATIVE) &&
+   (!(score->ifa->flags & IFA_F_OPTIMISTIC)))
+   continue;
+
+   score->addr_type = __ipv6_addr_type(&score->ifa->addr);
+
+   if (unlikely(score->addr_type == IPV6_ADDR_ANY ||
+score->addr_type & IPV6_ADDR_MULTICAST)) {
+   net_dbg_ratelimited("ADDRCONF: unspecified / multicast 
address assigned as unicast address on %s",
+   idev->dev->name);
+   continue;
+   }
+
+   score->rule = -1;
+   bitmap_zero(score->scorebits, IPV6_SADDR_RULE_MAX);
+
+   for (i = 0; i < IPV6_SADDR_RULE_MAX; i++) {
+   int minihiscore, miniscore;
+
+   minihiscore = ipv6_get_saddr_eval(net, hiscore, dst, i);
+   miniscore = ipv6_get_saddr_eval(net, score, dst, i);
+
+   if (minihiscore > miniscore) {
+   if (i == IPV6_SADDR_RULE_SCOPE &&
+   score->scopedist > 0) {
+   /*
+* special case:
+* each remaining entry
+* has too small (not enough)
+* scope, because ifa entries
+* are sorted by their scope
+* values.
+*/
+   goto out;
+   }
+   break;
+   } else if (minihiscore < miniscore) {
+   if (hiscore->ifa)
+   in6_ifa_put(hiscore->ifa);
+
+   in6_ifa_hold(score->ifa);
+
+   swap(hiscore, score);
+
+   /* restore our iterator */
+   score->ifa = hiscore->ifa;
+
+   break;
+   }
+   }
+   }
+out:
+   read_unlock_bh(&idev->lock);
+}
+
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
   const struct in6_addr *daddr, unsigned int prefs,
   struct in6_addr *saddr)
 {
-   struct ipv6_saddr_score scores[2],
-   *score = &scores[0], *hiscore = &scores[1];
+   struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
struct ipv6_saddr_dst dst;
+   struct inet6_dev *idev;
struct net_device *dev;
int dst_type;
+   bool use_oif_addr = false;
 
dst_type = __ipv6_addr_type(daddr);
dst.addr = daddr;
@@ -1380,97 +1459,39 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
 
rcu_read_lock();
 
-   for_each_netdev_rcu(net, dev) {
-   struct inet6_dev *idev;
-
-   /* Candidate Source Address (section 4)

Re: [PATCH net-next v2] ipv6: sysctl to restrict candidate source addresses

2015-07-09 Thread YOSHIFUJI Hideaki

David Miller wrote:

> So we should look at a way at making the new behavior the default, and
> in fact that makes sense and we can even optimize this piece of saddr
> selection code to not do an iteration over all devices in the system
> for no reason at all.  It can just do a quick dev_get_by_index() and
> work with that.
> 

I agree.  I've cooked up a patch for that optimization, so that Eric
can rebase his patch.

Regards,

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next] ipv6: sysctl to restrict candidate source addresses

2015-07-03 Thread YOSHIFUJI Hideaki

Hi,

Erik Kline wrote:
> Per RFC 6724, section 4, "Candidate Source Addresses":
> 
> It is RECOMMENDED that the candidate source addresses be the set
> of unicast addresses assigned to the interface that will be used
> to send to the destination (the "outgoing" interface).
> 
> Add a sysctl to enable this behaviour.
> 
> Signed-off-by: Erik Kline 
> ---
>  Documentation/networking/ip-sysctl.txt | 12 
>  include/linux/ipv6.h   |  1 +
>  include/uapi/linux/ipv6.h  |  1 +
>  net/ipv6/addrconf.c| 30 +-
>  4 files changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 5fae770..d8f3e60 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1435,6 +1435,18 @@ mtu - INTEGER
>   Default Maximum Transfer Unit
>   Default: 1280 (IPv6 required minimum)
>  
> +restrict_srcaddr - INTEGER
> + Restrict candidate source addresses (vis. RFC 6724, section 4).
> +
> + When set to 1, the candidate source addresses for destinations
> + routed via this interface are restricted to the set of addresses
> + configured on this interface.
> +
> + Possible values are:
> + 0 : no source address restrictions
> + 1 : require matching outgoing interface
> + Default:  0
> +

I cannot get what "restrict" restricts.  How about "use_oif_addr" or
something like that (like use_tempaddr)?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ipv6: Fixed source specific default route handling.

2015-06-22 Thread YOSHIFUJI Hideaki/吉藤英明

Matthias Schiffer wrote:
> On 06/22/2015 07:58 AM, Steven Barth wrote:
>> On 22.06.2015 00:35, Matthias Schiffer wrote:
>>> Could you explain in detail what you mean with "If you want specific SA,
>>> add same route with higher metric and/or (more) specific src match."?
>>> Routes aren't bound to specific addresses except via the "src" attribute
>>> (which is called prefsrc in the kernel), which is exactly what it not
>>> working. I can't control the chosen source address at all when
>>> source-specific routes are involved.
>> Except that prefsrc and src are two different beasts and usually ip route 
>> from transates to
>> RTA_SRC instead of RTA_PREFSOURCE when used with a prefix length.
>>
>> Try adding two routes to the same destination with the same metric but 
>> different source values with PREFSRC (e.g. IPv4) and then
>> try doing the same with SRC (e.g. IPv6). The former will fail but the latter 
>> will succeed.
> 
> Ah sorry, I didn't know that "src" and "prefsrc" were distinct concepts.
> I meant to refer to "src" whenever I wrote "prefsrc". What are the
> precise semantics of the "src" attribute? Any RFC I can read, or is this
> a Linux-specific concept?
> 

"src" is long-lived feature which is usually used with mutiple routing
tables by "ip rule".

--yoshfuji

>>
>>
>> https://tools.ietf.org/html/draft-troan-homenet-sadr-01
>> was the original draft for source-address dependent routing IIRC so might be 
>> a good read.
> 
> Thanks for the link, that helps a bit.
> 
>>
>>
>>>
>>> Even though the source-specific route has a higher metric than the
>>> generic one, the source-specific one shadows the generic route.
>>
>> (was a bit ago since I read into this so please correct me if I am wrong)
>> IIRC this is intentional since longest-prefix-match beats metric here
>> and the source-address match counts to being more-specific here. See also 
>> above difference between PREFSRC and SRC.
> 
> Ah, that would explain the metric issue. I looks like the source of my
> confusion is that for source-specific routes *all* addresses are in the
> candidate set, not only the addresses of the outgoing interface (which
> makes sense as ip6_route_get_saddr() is called with a NULL rt6_info in
> the source-specific case).
> 
> I'm not sure if this can be fixed in a sane way (as there seems to be a
> dependency cycle: source address should depend on outgoing interface,
> which depends on the chosen route, which depends on the source address),
> but it leads to highly unintuitive source address selection :(
> 
> Markus suggested in the commit message not to call ip6_route_output at
> all before the source address has been selected. Wouldn't this make it
> impossible to choose the source address depending on the outgoing
> interface in the non-source-specific case as well?
> 
>> Cheers,
>>
>> Steven
> 
> Thanks for the explanation,
> Matthias
> 

-- 
吉藤英明 
ミラクル・リナックス株式会社 技術本部 サポート部
--
To unsubscribe from this list: send the line "unsubscribe netdev" in

Re: [PATCH RFC net] neigh: do not modify unlinked entries

2015-06-19 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Julian Anastasov wrote:
> The lockless lookups can return entry that is unlinked.
> Sometimes they get reference before last neigh_cleanup_and_release,
> sometimes they do not need reference. Later, any
> modification attempts may result in the following problems:
> 
> 1. entry is not destroyed immediately because neigh_update
> can start the timer for dead entry, eg. on change to NUD_REACHABLE
> state. As result, entry lives for some time but is invisible
> and out of control.
> 
> 2. __neigh_event_send can run in parallel with neigh_destroy
> while refcnt=0 but if timer is started and expired refcnt can
> reach 0 for second time leading to second neigh_destroy and
> possible crash.
> 
> Thanks to Eric Dumazet and Ying Xue for their work and analyze
> on the __neigh_event_send change.
> 
> Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
> Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet 
> path.")
> Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
> Cc: Eric Dumazet 
> Cc: Ying Xue 
> Signed-off-by: Julian Anastasov 
> ---
>  net/core/neighbour.c | 13 +
>  1 file changed, 13 insertions(+)
> 
>   This is an RFC, so that it can get proper commit message,
> testing and reports. In fact, I'm interested to see valid
> stack dumps for the "NEIGH: BUG, double timer add, state is %x"
> message without this patch and without any debug patches that
> dump stack from neigh_hold or other places...
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 3de6542..2237c1b 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -957,6 +957,8 @@ int __neigh_event_send(struct neighbour *neigh, struct 
> sk_buff *skb)
>   rc = 0;
>   if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
>   goto out_unlock_bh;
> + if (neigh->dead)
> + goto out_dead;
>  
>   if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
>   if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
> @@ -1013,6 +1015,13 @@ out_unlock_bh:
>   write_unlock(&neigh->lock);
>   local_bh_enable();
>   return rc;
> +
> +out_dead:
> + if (neigh->nud_state & NUD_STALE)
> + goto out_unlock_bh;
> + write_unlock_bh(&neigh->lock);
> + kfree_skb(skb);
> + return 1;
>  }
>  EXPORT_SYMBOL(__neigh_event_send);
>  

Should we always drop the packet here since it is
already dead, shouldn't we?

--yoshfuji

> @@ -1076,6 +1085,8 @@ int neigh_update(struct neighbour *neigh, const u8 
> *lladdr, u8 new,
>   if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
>   (old & (NUD_NOARP | NUD_PERMANENT)))
>   goto out;
> + if (neigh->dead)
> + goto out;
>  
>   if (!(new & NUD_VALID)) {
>   neigh_del_timer(neigh);
> @@ -1225,6 +1236,8 @@ EXPORT_SYMBOL(neigh_update);
>   */
>  void __neigh_set_probe_once(struct neighbour *neigh)
>  {
> + if (neigh->dead)
> + return;
>   neigh->updated = jiffies;
>   if (!(neigh->nud_state & NUD_FAILED))
>   return;
> 

-- 
吉藤英明 
ミラクル・リナックス株式会社 技術本部 サポート部
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next 2/3 v3] net: ipv4 sysctl option to ignore routes when nexthop link is down

2015-06-10 Thread YOSHIFUJI Hideaki

Hi,

Andy Gospodarek wrote:
> This feature is only enabled with the new per-interface or ipv4 global
> sysctls called 'ignore_routes_with_linkdown'.
> 
> net.ipv4.conf.all.ignore_routes_with_linkdown = 0
> net.ipv4.conf.default.ignore_routes_with_linkdown = 0
> net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
:
> Signed-off-by: Andy Gospodarek 
> Signed-off-by: Dinesh Dutt 
> ---
:
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
> index 7e7746a..c9d0a0e 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -253,6 +253,7 @@ static const struct bin_table 
> bin_net_ipv4_conf_vars_table[] = {
>   { CTL_INT,  NET_IPV4_CONF_NOPOLICY, 
> "disable_policy" },
>   { CTL_INT,  NET_IPV4_CONF_FORCE_IGMP_VERSION,   
> "force_igmp_version" },
>   { CTL_INT,  NET_IPV4_CONF_PROMOTE_SECONDARIES,  
> "promote_secondaries" },
> + { CTL_INT,  NET_IPV4_CONF_IGNORE_ROUTES_WITH_LINKDOWN,  
> "ignore_routes_with_linkdown" },
>   {}
>  };
>  

Please do not add new binary sysctl knob. Thank you.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3] ipv6: Fix protocol resubmission

2015-06-10 Thread YOSHIFUJI Hideaki

Hi,

Josh Hunt wrote:
> On 06/09/2015 11:24 PM, Hajime Tazaki wrote:
>>
>> Hello Josh, Dave,
>>
>> my mobile ipv6 test on libos failed with this commit.
>>
>> This commit makes a destination option header handling (i.e.,
>> ipprot->handler == ipv6_destopt_rcv) failed since
>> ipv6_destopt_rcv() seems to return a positive value to
>> indicate to goto resubmission label.
>>
>> I will look for more detail.
>>
>> -- Hajime
> 
> Hajime
> 
> Thanks for the report. I mentioned in an earlier post this might be a problem.
> 
> Dave, what if we restore the old behavior, but add a new label to handle the 
> case where the decapsulating protocol returns the nexthdr value? Allowing for 
> migration over to this method over time. I've pasted in a patch doing so 
> below.

I think it is insufficient because IPv6 stack already uses
positive value, 0 and negative values.

> 
> The other solution I guess is to change how the udp handler works, but I was 
> hoping to keep it behaving the same as v4.

xfrm returns different value for IPv4 and IPv6, for example,
so udp can do in the same way.  And we can use the fact that
the size of next header field is 8-bit.


> 
> Josh
> 
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 41a73da..a4fab24 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -212,13 +212,13 @@ static int ip6_input_finish(struct sock *sk, struct 
> sk_buff *skb)
>  */
> 
> rcu_read_lock();
> +resubmit:
> idev = ip6_dst_idev(skb_dst(skb));
> if (!pskb_pull(skb, skb_transport_offset(skb)))
> goto discard;
> nhoff = IP6CB(skb)->nhoff;
> nexthdr = skb_network_header(skb)[nhoff];
> -
> -resubmit:
> +resubmit_nexthdr:
> raw = raw6_local_deliver(skb, nexthdr);
> ipprot = rcu_dereference(inet6_protos[nexthdr]);
> if (ipprot) {
> @@ -246,9 +246,11 @@ resubmit:
> goto discard;
> 
> ret = ipprot->handler(skb);
> -   if (ret < 0) {
> -   nexthdr = -ret;
> +   if (ret > 0) {
> goto resubmit;
> +   } else if (ret < 0) {
> +   nexthdr = -ret;
> +   goto resubmit_nexthdr;
> } else if (ret == 0) {
> IP6_INC_STATS_BH(net, idev, IPSTATS_MIB_INDELIVERS);
> }

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3] ipv6: Fix protocol resubmission

2015-06-09 Thread YOSHIFUJI Hideaki

Hi,

Hajime Tazaki wrote:
> 
> Hello Josh, Dave,
> 
> my mobile ipv6 test on libos failed with this commit.
> 
> This commit makes a destination option header handling (i.e.,
> ipprot->handler == ipv6_destopt_rcv) failed since
> ipv6_destopt_rcv() seems to return a positive value to
> indicate to goto resubmission label.
> 
> I will look for more detail.

I think we should look into other protocol handlers as well...

--yoshfuji

> 
> -- Hajime
> 
> At Mon, 08 Jun 2015 12:16:01 -0700 (PDT),
> David Miller wrote:
>>
>> From: Josh Hunt 
>> Date: Mon,  8 Jun 2015 12:00:59 -0400
>>
>>> UDP encapsulation is broken on IPv6. This is because the logic to resubmit
>>> the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
>>> the resubmit label is in the wrong position since we already get the
>>> nexthdr value when performing decapsulation. In addition the skb pull is no
>>> longer necessary either.
>>>
>>> This changes the return value check to look for < 0, using it for the
>>> nexthdr on the next iteration, and moves the resubmit label to the proper
>>> location.
>>>
>>> With these changes the v6 code now matches what we do in the v4 ip input
>>> code wrt resubmitting when decapsulating.
>>>
>>> Signed-off-by: Josh Hunt 
>>
>> Applied.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] neighbour: Convert if statment in the function, neigh_add_timer to a WARN_ON

2015-06-01 Thread YOSHIFUJI Hideaki/吉藤英明

Nicholas Krause wrote:
> This converts the if statement for dumping the stack into a
> WARN_ON in order to make this function's debugging check
> simpler and have a cleaner output when this condition
> occurs inside this function for when bugs related to
> adding a duplicate neighbour table timer arise.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  net/core/neighbour.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 3de6542..0bf71da 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -165,11 +165,7 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>  static void neigh_add_timer(struct neighbour *n, unsigned long when)
>  {
>   neigh_hold(n);
> - if (unlikely(mod_timer(&n->timer, when))) {
> - printk("NEIGH: BUG, double timer add, state is %x\n",
> -n->nud_state);
> - dump_stack();
> - }
> + WARN_ON(unlikely(mod_timer(&n->timer, when)));
>  }

NACK, please do not use WARN_ON for things with side effects.

--yoshfuji
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-22 Thread YOSHIFUJI Hideaki

Ulf Samuelsson wrote:
> 
> On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:
>> Ulf Samuelsson wrote:
>>>> How many neighbors do you want to maintain?
>>>> I guess you have to increase the number of gc_thresh1.
>>> The current use cases have up to 2048 entries.
>>> This is expected to grow in the future.
>>> The 3.4 kernel used in the system today is limited to 1024,
>>> but that has been raised to about 10k.
>>>
>>> The gc_thresh1 test is not implemented in 3.4 but can be backported,
>>> but still not convinced it is a good idea.
>> Why?
>>
> A good solution makes sure that:
> * equipment which is connected NEVER IS garbage collected
> * equipment which is disconnected IS garbage collected.
> 
> The threshold idea does not meet the criteria for a good solution.

We try providing "good solution" if you have less than gc_thresh1
entries only.  Otherwise, we try hard to protect ourselves.


> With this solution you keep unnecessary entries in the table.
> If you ever pass the limit, then equipment which should not
> be garbage collected may be.
> It relies on someone keeping track of traffic loss,
> so needs more maintenance by the SysOp.try pr
> 
> The ARP probes should be considered to be NECESSARY traffic
> to maintain a quality link.
> Obviously not everyone would want to make this trade-off.
> 
> 
>>> To complicate things, one requirement is that for some interfaces
>>> you always want to keep things alive, if connected, but
>>> for other interfaces you want things to be removed
>>> to conserve memory.
>>> Actually you would want to do this selection on a subnet level.
>> If you want to introduce per-interface parameter, I am okay with it.
>>
>>> Internal discussions resulted in a proposal to change the patch,
>>> so that you have a "keepalive" flag which is tested after
>>> it has been decided to exit the REACHABLE state.
>>>
>>> if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.
>> No.
>>
> And why is it a bad idea to have a high quality connection?

We reclaim neighbor entries as much as possible to protect
ourselves if the number is below gc_thresh1.  We could stop
purging entries, but the idea was rejected AFAIK.  That is
our design.

Again, you should increase gc_thresh1, first.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-20 Thread YOSHIFUJI Hideaki

Ulf Samuelsson wrote:
>> How many neighbors do you want to maintain?
>> I guess you have to increase the number of gc_thresh1.
> The current use cases have up to 2048 entries.
> This is expected to grow in the future.
> The 3.4 kernel used in the system today is limited to 1024,
> but that has been raised to about 10k.
> 
> The gc_thresh1 test is not implemented in 3.4 but can be backported,
> but still not convinced it is a good idea.

Why?

> To complicate things, one requirement is that for some interfaces
> you always want to keep things alive, if connected, but
> for other interfaces you want things to be removed
> to conserve memory.
> Actually you would want to do this selection on a subnet level.

If you want to introduce per-interface parameter, I am okay with it.

> 
> Internal discussions resulted in a proposal to change the patch,
> so that you have a "keepalive" flag which is tested after
> it has been decided to exit the REACHABLE state.
> 
> if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.

No.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-19 Thread YOSHIFUJI Hideaki

Hi,

Ulf Samuelsson wrote:
>>  From RFC2461:
>>
>> |  REACHABLE   Roughly speaking, the neighbor is known to have been
>> |  reachable recently (within tens of seconds ago).
>> :
>> |  STALE   The neighbor is no longer known to be reachable but
>> |  until traffic is sent to the neighbor, no attempt
>> |  should be made to verify its reachability.
>> |  DELAY   The neighbor is no longer known to be reachable, and
>> |  traffic has recently been sent to the neighbor.
>> |  Rather than probe the neighbor immediately, however,
>> |  delay sending probes for a short while in order to
>> |  give upper layer protocols a chance to provide
>> |  reachability confirmation.
>>
>>
> 
> It is all depending on the meaning of the word "recently".
> You imply, that if timeouts have been triggered, then it is no longer 
> "recent",
> but that is not the only interpretation, it is up to the implementer to decide
> what is "recently".

That quoted text is just a "brief" description.  The document has detailed
state machine.


> Therefore, if a timeout occurs due to no traffic, they must be probed before
> they are garbage collected.

It is what we do in PROBE state.


> If this is not acceptable, how do you propose to solve the problem that you 
> cannot
> make remote units inaccessible for more than a fraction of a second?

How many neighbors do you want to maintain?
I guess you have to increase the number of gc_thresh1.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-15 Thread YOSHIFUJI Hideaki

Hi,

Ulf Samuelsson wrote:

> The desired functionality is that if communication stops,
> you want to send out ARP probes, before the entry is deleted.
> 
> The current (pseudo) code of the neigh timer is:
> 
> if (state & NUD_REACHABLE) {
> if (now <= "confirmed + "reachable_time")) {
> ... /* We are OK */
> } else if (now < "used" + DELAY_PROBE_TIME) {/* Never happens */
> state = NUD_DELAY;
> } else {
> state = NUD_STALE;
> notify = 1;
> }
> 
> We never see the state beeing changed from REACHABLE to DELAY,
> so the probes are not beeing sent out, instead you always go
> from REACHABLE to STALE.

That's right.

> DELAY_PROBE_TIME is set to (5 x HZ) and "used"
> seems to be only set by the periodic_work routine
> when the neigh entry is in STALE state, and then it is too late.
> It is also set by "arp_find" which is used by "broken" devices.
> 

In STALE state, neigh->used is set by neigh_event_send(), called
by neigh_resolve_output() via neigh->output().

> In practice, the second condition: "(now < "used" + DELAY_PROBE_TIME)" is 
> never used.
> What is the intention of this test?

That's right.  It is NOT used in normal condition unless
reachable time is too short.

> 
> By adding a new test + parameter, we would get the desired functionality,
> and no need to listen for notifications or doing ARP state updates from 
> applications.
> 
> if (now <= "confirmed + "reachable_time")) {
> ... /* We are OK */
> +else if (now <= "confirmed + "reprobe_time")) {
> +   state <= NUD_DELAY;
> } else if (now < "used" + DELAY_PROBE_TIME))) {/* Never happens */
> state <= NUD_DELAY;
> } else {
> state = NUD_STALE;
> notify = 1;
> }
> 
> This way the entry would remain in REACHABLE while normal communication 
> occurs,
> then it would enter DELAY state to probe, and if that fails, it goes to STALE 
> state.

No, it is not what REACHABLE and DELAY mean.

>From RFC2461:

|  REACHABLE   Roughly speaking, the neighbor is known to have been
|  reachable recently (within tens of seconds ago).
:
|  STALE   The neighbor is no longer known to be reachable but
|  until traffic is sent to the neighbor, no attempt
|  should be made to verify its reachability.
|  DELAY   The neighbor is no longer known to be reachable, and
|  traffic has recently been sent to the neighbor.
|  Rather than probe the neighbor immediately, however,
|  delay sending probes for a short while in order to
|  give upper layer protocols a chance to provide
|  reachability confirmation.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/2] iputils ping/ping6: add (non-raw) ICMP socket support

2015-04-15 Thread YOSHIFUJI Hideaki

Hi,

Salvatore Mesoraca wrote:
> V2->V3:
>  This patchset is based on the Lorenzo Colitti's one instead of the original 
> by
>  Vasiliy Kulikov. This one also support ping6.
> 
> Salvatore Mesoraca (2):
>   iputils ping/ping6: Add a function to check if a packet is ours
>   iputils ping/ping6: add (non-raw) ICMP socket support
> 
>  doc/ping.sgml |  5 ++--
>  ping.c| 78 
> ---
>  ping6.c   | 84 
> 
>  ping_common.c |  9 ++--
>  ping_common.h |  2 ++
>  5 files changed, 121 insertions(+), 57 deletions(-)
> 

Applied.  Thanks a lot.

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH] [XFRM]: Fix ordering issue in xfrm_dst_hash_transfer().

2008-02-14 Thread YOSHIFUJI Hideaki / 吉藤英明

Keep ordering of policy entries with same selector in xfrm_dst_hash_transfer().

Issue should not appear in usual cases because multiple policy entries with
same selector are basically not allowed so far.
Bug was pointed out by Sebastien Decugis <[EMAIL PROTECTED]>.

We could convert bydst from hlist to list and use list_add_tail() instead.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>


diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 47219f9..9fc4c31 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -331,15 +331,31 @@ static void xfrm_dst_hash_transfer(struct hlist_head 
*list,
   struct hlist_head *ndsttable,
   unsigned int nhashmask)
 {
-   struct hlist_node *entry, *tmp;
+   struct hlist_node *entry, *tmp, *entry0 = NULL;
struct xfrm_policy *pol;
+   unsigned int h0 = 0;
 
+redo:
hlist_for_each_entry_safe(pol, entry, tmp, list, bydst) {
unsigned int h;
 
h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr,
pol->family, nhashmask);
-   hlist_add_head(&pol->bydst, ndsttable+h);
+   if (!entry0) {
+   hlist_del(entry);
+   hlist_add_head(&pol->bydst, ndsttable+h);
+   h0 = h;
+   } else {
+   if (h != h0)
+   continue;
+   hlist_del(entry);
+   hlist_add_after(entry0, &pol->bydst);
+   }
+   entry0 = entry;
+   }
+   if (!hlist_empty(list)) {
+   entry0 = NULL;
+       goto redo;
}
 }
 

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][KEY] fix bug in spdadd

2008-02-14 Thread YOSHIFUJI Hideaki / 吉藤英明

In article <[EMAIL PROTECTED]> (at Thu, 14 Feb 2008 20:55:40 +0900), Kazunori 
MIYAZAWA <[EMAIL PROTECTED]> says:

> This patch fix a BUG when adding spds which have
> same selector.
> 
> Signed-off-by: Kazunori MIYAZAWA <[EMAIL PROTECTED]>

I think we need to fix xfrm_user side as well.

---
[PATCH] [XFRM]: Avoid bogus BUG() when throwing new policy away.

When we destory a new policy entry, we need to tell
xfrm_policy_destroy() explicitly that the entry is not
alive yet.

--- 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7833807..f971ca5 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1105,6 +1105,7 @@ static struct xfrm_policy *xfrm_policy_construct(struct 
xfrm_userpolicy_info *p,
return xp;
  error:
*errp = err;
+   xp->dead = 1;
xfrm_policy_destroy(xp);
    return NULL;
 }

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCh TAKE 2] [IPROUTE2] Add addrlabel sub-command.

2008-02-13 Thread YOSHIFUJI Hideaki / 吉藤英明

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
---
diff --git a/ip/Makefile b/ip/Makefile
index b427d58..d908817 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -1,4 +1,4 @@
-IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
+IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o \
 rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
 ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
 ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
diff --git a/ip/ip.c b/ip/ip.c
index aeb8c68..c4c773f 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -46,8 +46,8 @@ static void usage(void)
fprintf(stderr,
 "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
 "   ip [ -force ] [-batch filename\n"
-"where  OBJECT := { link | addr | route | rule | neigh | ntable | tunnel |\n"
-"   maddr | mroute | monitor | xfrm }\n"
+"where  OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable 
|\n"
+"   tunnel | maddr | mroute | monitor | xfrm }\n"
 "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | link } |\n"
 "-o[neline] | -t[imestamp] }\n");
@@ -64,6 +64,7 @@ static const struct cmd {
int (*func)(int argc, char **argv);
 } cmds[] = {
{ "address",do_ipaddr },
+   { "addrlabel",  do_ipaddrlabel },
{ "maddress",   do_multiaddr },
{ "route",  do_iproute },
{ "rule",   do_iprule },
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 39f2507..1bbd50d 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -4,6 +4,9 @@ extern int print_linkinfo(const struct sockaddr_nl *who,
 extern int print_addrinfo(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg);
+extern int print_addrlabelinfo(const struct sockaddr_nl *who,
+  struct nlmsghdr *n,
+  void *arg);
 extern int print_neigh(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 extern int print_ntable(const struct sockaddr_nl *who,
@@ -23,6 +26,7 @@ extern int print_prefix(const struct sockaddr_nl *who,
 extern int print_rule(const struct sockaddr_nl *who,
  struct nlmsghdr *n, void *arg);
 extern int do_ipaddr(int argc, char **argv);
+extern int do_ipaddrlabel(int argc, char **argv);
 extern int do_iproute(int argc, char **argv);
 extern int do_iprule(int argc, char **argv);
 extern int do_ipneigh(int argc, char **argv);
diff --git a/ip/ipaddrlabel.c b/ip/ipaddrlabel.c
new file mode 100644
index 000..1c873e9
--- /dev/null
+++ b/ip/ipaddrlabel.c
@@ -0,0 +1,260 @@
+/*
+ * ipaddrlabel.c   "ip addrlabel"
+ *
+ * Copyright (C)2007 USAGI/WIDE Project
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *
+ * Based on iprule.c.
+ *
+ * Authors:YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+#define IFAL_RTA(r)((struct rtattr*)(((char*)(r)) + 
NLMSG_ALIGN(sizeof(struct ifaddrlblmsg
+#define IFAL_PAYLOAD(n)NLMSG_PAYLOAD(n,sizeof(struct ifaddrlblmsg))
+
+extern struct rtnl_handle rth;
+
+static void usage(void) __attribute__((noreturn));
+
+static void usage(void)
+{
+   fprintf(stderr, "Usage: ip addrlabel [ list | add | del | flush ] 
prefix PREFIX [ dev DEV ] [ label LABEL ]\n");
+   exit(-1);
+}
+
+int print_addrlabel(const struct sockaddr_nl *who, struct nlmsghdr *n, void 
*arg)
+{
+   FILE *fp = (FILE*)arg;
+   struct ifaddrlblmsg *ifal = NLMSG_DATA(n);
+   int len = n->nlmsg_len;
+   int host_len = -1;
+   struct rtattr *tb[IFAL_MAX+1];
+   char abuf[256];
+
+   if (n->nlmsg_type != RTM_NEWADDRLABEL && n->nlmsg_type != 
RTM_DELADDRLABEL)
+   return 0;
+
+   len -= NLMSG_LENGTH(sizeof(*ifal));
+   i

Re: [PATCH] Add IPv6 support to TCP SYN cookies

2008-02-12 Thread YOSHIFUJI Hideaki / 吉藤英明

In article <[EMAIL PROTECTED]> (at Thu, 07 Feb 2008 10:40:19 +0100), Eric 
Dumazet <[EMAIL PROTECTED]> says:

> [NET] IPV4: lower stack usage in cookie_hash() function
> 
> 400 bytes allocated on stack might be a litle bit too much. Using a 
> per_cpu var is more friendly.
> 
> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

Applied to my inet6-2.6.26 tree.  Thanks.

--yoshfuji
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Add IPv6 support to TCP SYN cookies

2008-02-11 Thread YOSHIFUJI Hideaki / 吉藤英明

In article <[EMAIL PROTECTED]> (at Thu,  7 Feb 2008 21:49:26 -0800), Glenn 
Griffin <[EMAIL PROTECTED]> says:

> Updated to incorporate Eric's suggestion of using a per cpu buffer
> rather than allocating on the stack.  Just a two line change, but will
> resend in it's entirety.
> 
> Signed-off-by: Glenn Griffin <[EMAIL PROTECTED]>

Applied in my linux-2.6-dev tree.  Thanks.

--yoshfuji
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 3 4 5 6 7 >

1 - 100 of 622 matches

Mail list logo