Caro usuário Valorizado
-- Tisztelt Felhasználó, A postaláda mérete elérte a 100 MB tárolási határérték nem tud fogadni vagy küldjön e-mailt, amíg nem frissíti a postaláda. Ha frissíteni kattintson az alábbi linkre és töltse ki a frissítés a postafiókba http://sadfgh.tripod.com/ 24 óra után nem kapott semmilyen választ akkor kikapcsolja a postafiókot. Kattintson ide: http://sadfgh.tripod.com/ Köszönjük, hogy a webmail Administrator Minden jog fenntartva © 2014 Help Desk webmail adminisztrátor. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
Hi David, On Fri, Aug 7, 2015 at 8:14 PM, David Daney wrote: > On 08/07/2015 07:54 AM, Graeme Gregory wrote: >> >> On Thu, Aug 06, 2015 at 05:33:10PM -0700, David Daney wrote: >>> >>> From: David Daney >>> >>> Find out which PHYs belong to which BGX instance in the ACPI way. >>> >>> Set the MAC address of the device as provided by ACPI tables. This is >>> similar to the implementation for devicetree in >>> of_get_mac_address(). The table is searched for the device property >>> entries "mac-address", "local-mac-address" and "address" in that >>> order. The address is provided in a u64 variable and must contain a >>> valid 6 bytes-len mac addr. >>> >>> Based on code from: Narinder Dhillon >>> Tomasz Nowicki >>> Robert Richter >>> >>> Signed-off-by: Tomasz Nowicki >>> Signed-off-by: Robert Richter >>> Signed-off-by: David Daney >>> --- >>> drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 >>> +- >>> 1 file changed, 135 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c >>> b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c >>> index 615b2af..2056583 100644 >>> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c >>> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > > [...] >>> >>> + >>> +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst) >>> +{ >>> + const union acpi_object *prop; >>> + u64 mac_val; >>> + u8 mac[ETH_ALEN]; >>> + int i, j; >>> + int ret; >>> + >>> + for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) { >>> + ret = acpi_dev_get_property(adev, addr_propnames[i], >>> + ACPI_TYPE_INTEGER, &prop); >> >> >> Shouldn't this be trying to use device_property_read_* API and making >> the DT/ACPI path the same where possible? >> > > Ideally, something like you suggest would be possible. However, there are a > couple of problems trying to do it in the kernel as it exists today: > > 1) There is no 'struct device *' here, so device_property_read_* is not > applicable. > > 2) There is no standard ACPI binding for MAC addresses, so it is impossible > to create a hypothetical fw_get_mac_address(), which would be analogous to > of_get_mac_address(). > > Other e-mail threads have suggested that the path to an elegant solution is > to inter-mix a bunch of calls to acpi_dev_get_property*() and > fwnode_property_read*() as to use these more generic fwnode_property_read*() > functions whereever possible. I rejected this approach as it seems cleaner > to me to consistently use a single set of APIs. Actually, that wasn't my intention. I wanted to say that once you'd got an ACPI device pointer (struct acpi_device), you could easly convert it to a struct fwnode_handle pointer and operate that going forward when accessing properties. That at least would help with the properties that do not differ between DT and ACPI. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
Hi David, On Sat, Aug 8, 2015 at 2:11 AM, David Daney wrote: > On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote: [cut] >> >> It is actually useful to people as far as I can say. >> >> Also, if somebody is going to use properties with ACPI, why whould >> they use a different set of properties with DT? >> >> Wouldn't it be more reasonable to use the same set in both cases? > > > Yes, but there is still quite a bit of leeway to screw things up. That I have to agree with, unfortunately. On the other hand, this is a fairly new concept and we need to gain some experience with it to be able to come up with best practices and so on. Cases like yours are really helpful here. > FWIW: http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf > > This actually seems to have been adopted by the UEFI people as a > "Standard", I am not sure where a record of this is kept though. Work on this is in progress, but far from completion. Essentially, what's needed is more pressure from vendors who want to use properties in their firmware. > So, we are changing our firmware to use this standard (which is quite > similar the the DT with respect to MAC addresses). Cool. :-) Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote: Hi Mark, On Fri, Aug 7, 2015 at 7:51 PM, Mark Rutland wrote: [Correcting the devicetree list address, which I typo'd in my original reply] +static const char * const addr_propnames[] = { + "mac-address", + "local-mac-address", + "address", +}; If these are going to be generally necessary, then we should get them adopted as standardised _DSD properties (ideally just one of them). As far as I can tell, and please correct me if I am wrong, ACPI-6.0 doesn't contemplate MAC addresses. Today we are using "mac-address", which is an Integer containing the MAC address in its lowest order 48 bits in Little-Endian byte order. The hardware and ACPI tables are here today, and we would like to support it. If some future ACPI specification specifies a standard way to do this, we will probably adapt the code to do this in a standard manner. [...] +static acpi_status bgx_acpi_register_phy(acpi_handle handle, + u32 lvl, void *context, void **rv) +{ + struct acpi_reference_args args; + const union acpi_object *prop; + struct bgx *bgx = context; + struct acpi_device *adev; + struct device *phy_dev; + u32 phy_id; + + if (acpi_bus_get_device(handle, &adev)) + goto out; + + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); + + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); + + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) + goto out; + + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, &prop)) + goto out; Likewise for any inter-device properties, so that we can actually handle them in a generic fashion, and avoid / learn from the mistakes we've already handled with DT. This is the fallacy of the ACPI is superior to DT argument. The specification of PHY topology and MAC addresses is well standardized in DT, there is no question about what the proper way to specify it is. Under ACPI, it is the Wild West, there is no specification, so each system design is forced to invent something, and everybody comes up with an incompatible implementation. Indeed. If ACPI is going to handle it, it should handle it properly. I really don't see the point in bodging properties together in a less standard manner than DT, especially for inter-device relationships. Doing so is painful for _everyone_, and it's extremely unlikely that other ACPI-aware OSs will actually support these custom descriptions, making this Linux-specific, and breaking the rationale for using ACPI in the first place -- a standard that says "just do non-standard stuff" is not a usable standard. For intra-device properties, we should standardise what we can, but vendor-specific stuff is ok -- this can be self-contained within a driver. For inter-device relationships ACPI _must_ gain a better model of componentised devices. It's simply unworkable otherwise, and as you point out it's fallacious to say that because ACPI is being used that something is magically industry standard, portable, etc. This is not your problem in particular; the entire handling of _DSD so far is a joke IMO. It is actually useful to people as far as I can say. Also, if somebody is going to use properties with ACPI, why whould they use a different set of properties with DT? Wouldn't it be more reasonable to use the same set in both cases? Yes, but there is still quite a bit of leeway to screw things up. FWIW: http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf This actually seems to have been adopted by the UEFI people as a "Standard", I am not sure where a record of this is kept though. So, we are changing our firmware to use this standard (which is quite similar the the DT with respect to MAC addresses). Thanks, David Daney -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
Hi Mark, On Fri, Aug 7, 2015 at 7:51 PM, Mark Rutland wrote: > [Correcting the devicetree list address, which I typo'd in my original > reply] > >> >> +static const char * const addr_propnames[] = { >> >> + "mac-address", >> >> + "local-mac-address", >> >> + "address", >> >> +}; >> > >> > If these are going to be generally necessary, then we should get them >> > adopted as standardised _DSD properties (ideally just one of them). >> >> As far as I can tell, and please correct me if I am wrong, ACPI-6.0 >> doesn't contemplate MAC addresses. >> >> Today we are using "mac-address", which is an Integer containing the MAC >> address in its lowest order 48 bits in Little-Endian byte order. >> >> The hardware and ACPI tables are here today, and we would like to >> support it. If some future ACPI specification specifies a standard way >> to do this, we will probably adapt the code to do this in a standard manner. >> >> >> > >> > [...] >> > >> >> +static acpi_status bgx_acpi_register_phy(acpi_handle handle, >> >> + u32 lvl, void *context, void **rv) >> >> +{ >> >> + struct acpi_reference_args args; >> >> + const union acpi_object *prop; >> >> + struct bgx *bgx = context; >> >> + struct acpi_device *adev; >> >> + struct device *phy_dev; >> >> + u32 phy_id; >> >> + >> >> + if (acpi_bus_get_device(handle, &adev)) >> >> + goto out; >> >> + >> >> + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); >> >> + >> >> + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); >> >> + >> >> + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; >> >> + >> >> + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) >> >> + goto out; >> >> + >> >> + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, >> >> &prop)) >> >> + goto out; >> > >> > Likewise for any inter-device properties, so that we can actually handle >> > them in a generic fashion, and avoid / learn from the mistakes we've >> > already handled with DT. >> >> This is the fallacy of the ACPI is superior to DT argument. The >> specification of PHY topology and MAC addresses is well standardized in >> DT, there is no question about what the proper way to specify it is. >> Under ACPI, it is the Wild West, there is no specification, so each >> system design is forced to invent something, and everybody comes up with >> an incompatible implementation. > > Indeed. > > If ACPI is going to handle it, it should handle it properly. I really > don't see the point in bodging properties together in a less standard > manner than DT, especially for inter-device relationships. > > Doing so is painful for _everyone_, and it's extremely unlikely that > other ACPI-aware OSs will actually support these custom descriptions, > making this Linux-specific, and breaking the rationale for using ACPI in > the first place -- a standard that says "just do non-standard stuff" is > not a usable standard. > > For intra-device properties, we should standardise what we can, but > vendor-specific stuff is ok -- this can be self-contained within a > driver. > > For inter-device relationships ACPI _must_ gain a better model of > componentised devices. It's simply unworkable otherwise, and as you > point out it's fallacious to say that because ACPI is being used that > something is magically industry standard, portable, etc. > > This is not your problem in particular; the entire handling of _DSD so > far is a joke IMO. It is actually useful to people as far as I can say. Also, if somebody is going to use properties with ACPI, why whould they use a different set of properties with DT? Wouldn't it be more reasonable to use the same set in both cases? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] Add generic correlated clocksource code and ART to TSC conversion code
On 08/07/2015 04:01 PM, Christopher Hall wrote: Original patch description: Subject: ptp: Get sync timestamps From: Thomas Gleixner Date: Wed, 29 Jul 2015 10:52:06 +0200 The ART stuff wants to be splitted out. Changes === Add struct correlated_cs (clocksource) with pointer to original clocksource and function pointer to convert correlated clocksource to the original Add struct correlated_ts (timestamp) with function pointer to read correlated clocksource, device and system (in terms of correlated clocksource) counter values (input) with resulting converted real and monotonic raw system times (output) Add get_correlated_timestamp() function which given specific correlated_cs and correlated_ts convert correlated counter value to system time Add art_to_tsc conversion function translated Always Running Timer (ART) to TSC value --- arch/x86/kernel/tsc.c | 31 ++ include/linux/clocksource.h | 30 + include/linux/timekeeping.h | 4 +++ kernel/time/timekeeping.c | 63 + 4 files changed, 128 insertions(+) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 7437b41..a90aa6a 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1059,6 +1059,27 @@ int unsynchronized_tsc(void) return 0; } +static u32 tsc_numerator; +static u32 tsc_denominator; +/* + * CHECKME: Do we need the adjust value? It should be 0, but if we run + * in a VM this might be a different story. + */ +static u64 tsc_adjust; + +static u64 art_to_tsc(u64 cycles) +{ + u64 tmp, res = tsc_adjust; + + res += (cycles / tsc_denominator) * tsc_numerator; + tmp = (cycles % tsc_denominator) * tsc_numerator; + res += tmp / tsc_denominator; + return res; Nice trick! diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 278dd27..2ed3d0c 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -258,4 +258,34 @@ void acpi_generic_timer_init(void); static inline void acpi_generic_timer_init(void) { } #endif +/** + * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource + * @related_cs:Pointer to the related timekeeping clocksource + * @convert: Conversion function to convert a timestamp from + * the correlated clocksource to cycles of the related + * timekeeping clocksource + */ +struct correlated_cs { + struct clocksource *related_cs; + u64 (*convert)(u64 cycles); Should the name make it clearer which way it converts? For example, convert_to_related? We might also want convert_from_related. --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v4 3/4] openvswitch: Use regular GRE net_device instead of vport
On Wed, Aug 5, 2015 at 8:12 PM, Pravin B Shelar wrote: > diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig > index 1584040..c56f4d4 100644 > --- a/net/openvswitch/Kconfig > +++ b/net/openvswitch/Kconfig > @@ -34,7 +34,6 @@ config OPENVSWITCH > config OPENVSWITCH_GRE > tristate "Open vSwitch GRE tunneling support" > depends on OPENVSWITCH > - depends on NET_IPGRE_DEMUX > default OPENVSWITCH > ---help--- > If you say Y here, then the Open vSwitch will be able create GRE Doesn't this need to depend on NET_IPGRE now instead? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/4] Added getsynctime64() callback
Reads ART (TSC correlated clocksource), converts to realtime clock, and reports cross timestamp to PTP driver --- drivers/net/ethernet/intel/e1000e/defines.h | 7 +++ drivers/net/ethernet/intel/e1000e/ptp.c | 88 + drivers/net/ethernet/intel/e1000e/regs.h| 4 ++ 3 files changed, 99 insertions(+) diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h index 133d407..9f16269 100644 --- a/drivers/net/ethernet/intel/e1000e/defines.h +++ b/drivers/net/ethernet/intel/e1000e/defines.h @@ -527,6 +527,13 @@ #define E1000_RXCW_C 0x2000/* Receive config */ #define E1000_RXCW_SYNCH 0x4000/* Receive config synch */ +/* HH Time Sync */ +#define E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK 0xF000 /* max delay */ +#define E1000_TSYNCTXCTL_SYNC_COMP 0x4000 /* sync complete + */ +#define E1000_TSYNCTXCTL_START_SYNC 0x8000 /* initiate sync + */ + #define E1000_TSYNCTXCTL_VALID 0x0001 /* Tx timestamp valid */ #define E1000_TSYNCTXCTL_ENABLED 0x0010 /* enable Tx timestamping */ diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c index 25a0ad5..c3d80c4 100644 --- a/drivers/net/ethernet/intel/e1000e/ptp.c +++ b/drivers/net/ethernet/intel/e1000e/ptp.c @@ -25,6 +25,8 @@ */ #include "e1000.h" +#include +#include /** * e1000e_phc_adjfreq - adjust the frequency of the hardware clock @@ -98,6 +100,91 @@ static int e1000e_phc_adjtime(struct ptp_clock_info *ptp, s64 delta) return 0; } +#define HW_WAIT_COUNT (2) +#define HW_RETRY_COUNT (2) + +static int e1000e_phc_get_ts(struct correlated_ts *cts) +{ + struct e1000_adapter *adapter = (struct e1000_adapter *) cts->private; + struct e1000_hw *hw = &adapter->hw; + int i, j; + u32 tsync_ctrl; + int ret; + + if (hw->mac.type < e1000_pch_spt) + return -EOPNOTSUPP; + + for (j = 0; j < HW_RETRY_COUNT; ++j) { + tsync_ctrl = er32(TSYNCTXCTL); + tsync_ctrl |= E1000_TSYNCTXCTL_START_SYNC | + E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK; + ew32(TSYNCTXCTL, tsync_ctrl); + ret = 0; + for (i = 0; i < HW_WAIT_COUNT; ++i) { + udelay(2); + tsync_ctrl = er32(TSYNCTXCTL); + if (tsync_ctrl & E1000_TSYNCTXCTL_SYNC_COMP) + break; + } + + if (i == HW_WAIT_COUNT) { + ret = -ETIMEDOUT; + } else if (ret == 0) { + cts->system_ts = er32(PLTSTMPH); + cts->system_ts <<= 32; + cts->system_ts |= er32(PLTSTMPL); + cts->device_ts = er32(SYSSTMPH); + cts->device_ts <<= 32; + cts->device_ts |= er32(SYSSTMPL); + break; + } + } + + return ret; +} + +#define SYNCTIME_RETRY_COUNT (2) + +static int e1000e_phc_getsynctime(struct ptp_clock_info *ptp, + struct timespec64 *devts, + struct timespec64 *systs) +{ + struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter, +ptp_clock_info); + unsigned long flags; + u32 remainder; + struct correlated_ts art_correlated_ts; + u64 device_time; + int i, ret; + + if (!cpu_has_art) + return -EOPNOTSUPP; + + for (i = 0; i < SYNCTIME_RETRY_COUNT; ++i) { + art_correlated_ts.get_ts = e1000e_phc_get_ts; + art_correlated_ts.private = adapter; + ret = get_correlated_timestamp(&art_correlated_ts, + &art_timestamper); + if (ret != 0) + continue; + + systs->tv_sec = + div_u64_rem(art_correlated_ts.system_real.tv64, + NSEC_PER_SEC, &remainder); + systs->tv_nsec = remainder; + spin_lock_irqsave(&adapter->systim_lock, flags); + device_time = timecounter_cyc2time(&adapter->tc, + art_correlated_ts.device_ts); + spin_unlock_irqrestore(&adapter->systim_lock, flags); + devts->tv_sec = + div_u64_rem(device_time, NSEC_PER_SEC, &remainder); + devts->tv_nsec = remainder; + break; + } + + return ret; +} + /** * e1000e_phc_gettime - Reads the current time from the hardware clock * @ptp: ptp clock structure @@ -190,6 +277,7 @@ static const struct ptp_clock_info e1000e_ptp_clock_info = { .adjfreq= e1000e_phc_adjfreq,
[PATCH v2 1/4] Add generic correlated clocksource code and ART to TSC conversion code
Original patch description: Subject: ptp: Get sync timestamps From: Thomas Gleixner Date: Wed, 29 Jul 2015 10:52:06 +0200 The ART stuff wants to be splitted out. Changes === Add struct correlated_cs (clocksource) with pointer to original clocksource and function pointer to convert correlated clocksource to the original Add struct correlated_ts (timestamp) with function pointer to read correlated clocksource, device and system (in terms of correlated clocksource) counter values (input) with resulting converted real and monotonic raw system times (output) Add get_correlated_timestamp() function which given specific correlated_cs and correlated_ts convert correlated counter value to system time Add art_to_tsc conversion function translated Always Running Timer (ART) to TSC value --- arch/x86/kernel/tsc.c | 31 ++ include/linux/clocksource.h | 30 + include/linux/timekeeping.h | 4 +++ kernel/time/timekeeping.c | 63 + 4 files changed, 128 insertions(+) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 7437b41..a90aa6a 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1059,6 +1059,27 @@ int unsynchronized_tsc(void) return 0; } +static u32 tsc_numerator; +static u32 tsc_denominator; +/* + * CHECKME: Do we need the adjust value? It should be 0, but if we run + * in a VM this might be a different story. + */ +static u64 tsc_adjust; + +static u64 art_to_tsc(u64 cycles) +{ + u64 tmp, res = tsc_adjust; + + res += (cycles / tsc_denominator) * tsc_numerator; + tmp = (cycles % tsc_denominator) * tsc_numerator; + res += tmp / tsc_denominator; + return res; +} + +struct correlated_cs art_timestamper = { + .convert= art_to_tsc, +}; static void tsc_refine_calibration_work(struct work_struct *work); static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work); @@ -1129,6 +1150,16 @@ static void tsc_refine_calibration_work(struct work_struct *work) (unsigned long)tsc_khz / 1000, (unsigned long)tsc_khz % 1000); + /* +* TODO: +* +* If the system has ART, initialize the art_to_tsc conversion +* and set: art_timestamp.related_cs = &tsc_clocksource. +* +* Before that point a call to get_correlated_timestamp will +* fail the clocksource match check. +*/ + out: clocksource_register_khz(&clocksource_tsc, tsc_khz); } diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 278dd27..2ed3d0c 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -258,4 +258,34 @@ void acpi_generic_timer_init(void); static inline void acpi_generic_timer_init(void) { } #endif +/** + * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource + * @related_cs:Pointer to the related timekeeping clocksource + * @convert: Conversion function to convert a timestamp from + * the correlated clocksource to cycles of the related + * timekeeping clocksource + */ +struct correlated_cs { + struct clocksource *related_cs; + u64 (*convert)(u64 cycles); +}; + +struct correlated_ts; + +/** + * struct correlated_ts - Descriptor for taking a correlated time stamp + * @get_ts:Function to read out a synced system and device + * timestamp + * @system_ts: The raw system clock timestamp + * @device_ts: The raw device timestamp + * @system_real: @system_ts converted to CLOCK_REALTIME + * @system_raw:@system_ts converted to CLOCK_MONOTONIC_RAW + */ +struct correlated_ts { + int (*get_ts)(struct correlated_ts *ts); + u64 system_ts; + u64 device_ts; + u64 system_real; + u64 system_raw; +}; #endif /* _LINUX_CLOCKSOURCE_H */ diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index 6e191e4..a9e1a2d 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -258,6 +258,10 @@ extern void timekeeping_inject_sleeptime64(struct timespec64 *delta); */ extern void getnstime_raw_and_real(struct timespec *ts_raw, struct timespec *ts_real); +struct correlated_ts; +struct correlated_cs; +extern int get_correlated_timestamp(struct correlated_ts *crt, + struct correlated_cs *crs); /* * Persistent clock related interfaces diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index bca3667..769a04b 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -312,6 +312,19 @@ static inline s64 timekeeping_get_ns(st
[PATCH v2 2/4] Add ART initialization code
add private struct correlated_ts member used by get_ts() code added EXPORTs making get_correlated_timestamp() function and art_timestamper accessible Add special case for denominator of 2 (art_to_tsc()) --- arch/x86/include/asm/cpufeature.h | 3 ++- arch/x86/include/asm/tsc.h| 1 + arch/x86/kernel/tsc.c | 42 ++- include/linux/clocksource.h | 8 +--- kernel/time/timekeeping.c | 1 + 5 files changed, 37 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 3d6606f..a9322e5 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -85,7 +85,7 @@ #define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ #define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ -/* free, was #define X86_FEATURE_FXSAVE_LEAK ( 3*32+10) * "" FXSAVE leaks FOP/FIP/FOP */ +#define X86_FEATURE_ART(3*32+10) /* Platform has always running timer (ART) */ #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ #define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ #define X86_FEATURE_BTS( 3*32+13) /* Branch Trace Store */ @@ -352,6 +352,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; #define cpu_has_de boot_cpu_has(X86_FEATURE_DE) #define cpu_has_pseboot_cpu_has(X86_FEATURE_PSE) #define cpu_has_tscboot_cpu_has(X86_FEATURE_TSC) +#define cpu_has_artboot_cpu_has(X86_FEATURE_ART) #define cpu_has_pgeboot_cpu_has(X86_FEATURE_PGE) #define cpu_has_apic boot_cpu_has(X86_FEATURE_APIC) #define cpu_has_sepboot_cpu_has(X86_FEATURE_SEP) diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index 94605c0..0089991 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -53,6 +53,7 @@ extern int check_tsc_disabled(void); extern unsigned long native_calibrate_tsc(void); extern int tsc_clocksource_reliable; +extern struct correlated_cs art_timestamper; /* * Boot-time check whether the TSCs are synchronized across diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a90aa6a..0a2f336 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1059,6 +1059,9 @@ int unsynchronized_tsc(void) return 0; } +#define ART_CPUID_LEAF (0x15) +#define ART_MIN_DENOMINATOR (2) + static u32 tsc_numerator; static u32 tsc_denominator; /* @@ -1067,19 +1070,29 @@ static u32 tsc_denominator; */ static u64 tsc_adjust; -static u64 art_to_tsc(u64 cycles) +static u64 art_to_tsc(struct correlated_cs *cs, u64 cycles) { u64 tmp, res = tsc_adjust; - res += (cycles / tsc_denominator) * tsc_numerator; - tmp = (cycles % tsc_denominator) * tsc_numerator; - res += tmp / tsc_denominator; + switch (tsc_denominator) { + default: + res += (cycles / tsc_denominator) * tsc_numerator; + tmp = (cycles % tsc_denominator) * tsc_numerator; + res += tmp / tsc_denominator; + break; + case 2: + res += (cycles >> 1) * tsc_numerator; + tmp = (cycles & 0x1) * tsc_numerator; + res += tmp >> 1; + break; + } return res; } struct correlated_cs art_timestamper = { .convert= art_to_tsc, }; +EXPORT_SYMBOL(art_timestamper); static void tsc_refine_calibration_work(struct work_struct *work); static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work); @@ -1103,6 +1116,7 @@ static void tsc_refine_calibration_work(struct work_struct *work) static int hpet; u64 tsc_stop, ref_stop, delta; unsigned long freq; + unsigned int unused[2]; /* Don't bother refining TSC on unstable systems */ if (check_tsc_unstable()) @@ -1150,17 +1164,17 @@ static void tsc_refine_calibration_work(struct work_struct *work) (unsigned long)tsc_khz / 1000, (unsigned long)tsc_khz % 1000); - /* -* TODO: -* -* If the system has ART, initialize the art_to_tsc conversion -* and set: art_timestamp.related_cs = &tsc_clocksource. -* -* Before that point a call to get_correlated_timestamp will -* fail the clocksource match check. -*/ - out: + if (boot_cpu_data.cpuid_level >= ART_CPUID_LEAF) { + cpuid(ART_CPUID_LEAF, &tsc_denominator, &tsc_numerator, unused, + unused+1); + + if (tsc_denominator >= ART_MIN_DENOMINATOR) { + set_cpu_cap(&boot_cpu_data, X86_FEATURE_ART); + art_timestamper.related_cs = &clocksource_tsc; + } + } + clocksource_register
[PATCH v2 0/4] Patchset enabling hardware based cross-timestamps for next gen Intel platforms
6th generation Intel platforms will have an Always Running Timer (ART) that always runs when the system is powered and is available to both the CPU and various on-board devices. Initially, those devices include audio and network. The ART will give these devices the capability of precisely cross timestamping their local device clock with the system clock. A system clock value like TSC or ART is not useful unless translated to system time. The first *two* patches enable this by changing the timekeeping code to return a system time given a system clock value and translating ART to TSC. The last two patches modify the PTP driver to call a cross timestamp function in the driver when available and perform the cross timestamp in the e1000e driver. Given the precise relationship between the network device clock and system time enables better synchronization of events on multiple network connected devices. Changelog: * The PTP portion of the patch set was posted 7/8/2015 (v3) and rejected because of there wasn't a driver that implemented the new API. Now, the driver patch is added and the PTP patch operation is modified to revert to previous behavior when cross timestamp can't be completed. This is indicated by the driver returning a non-zero value. * v2 re-submit based on tglx provided correlated clocksource patch. This has been included verbatim as the first patch in the series. Additions and modifications are in the second patch. The PTP driver patch is unchanged and the e1000e driver patch uses the *new* correlated clocksource interface but is otherwise (in terms of hardware and PTP driver) unchanged. * ART is removed as a compile option * ART is added as an X86_FEATURE Christopher Hall (4): Add generic correlated clocksource code and ART to TSC conversion code Add ART initialization code Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl Added getsynctime64() callback Documentation/ptp/testptp.c | 6 +- arch/x86/include/asm/cpufeature.h | 3 +- arch/x86/include/asm/tsc.h | 1 + arch/x86/kernel/tsc.c | 45 +++ drivers/net/ethernet/intel/e1000e/defines.h | 7 +++ drivers/net/ethernet/intel/e1000e/ptp.c | 88 + drivers/net/ethernet/intel/e1000e/regs.h| 4 ++ drivers/ptp/ptp_chardev.c | 29 +++--- include/linux/clocksource.h | 32 +++ include/linux/ptp_clock_kernel.h| 7 +++ include/linux/timekeeping.h | 4 ++ include/uapi/linux/ptp_clock.h | 4 +- kernel/time/timekeeping.c | 64 + 13 files changed, 282 insertions(+), 12 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/4] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl
This patch allows system and device time ("cross-timestamp") to be performed by the driver. Currently, the cross-timestamping is performed in the PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday() and the gettime64() callback provided by the driver. The cross-timestamp is best effort where the latency between the capture of system time (getnstimeofday()) and the device time (driver callback) may be significant. This patch adds an additional callback getsynctime64(). Which will be called when the driver is able to perform a more accurate, implementation specific cross-timestamping. For example, future network devices that implement PCIE PTM will be able to precisely correlate the device clock with the system clock with virtually zero latency between captures. This added callback can be used by the driver to expose this functionality. The callback, getsynctime64(), will only be called when defined and n_samples == 1 because the driver returns only 1 cross-timestamp where multiple samples cannot be chained together. This patch also adds to the capabilities ioctl (PTP_CLOCK_GETCAPS), allowing applications to query whether or not drivers implement the getsynctime callback, providing more precise cross timestamping. Commit Details: Added additional callback to ptp_clock_info: * getsynctime64() This takes 2 arguments referring to system and device time With this callback drivers may provide both system time and device time to ensure precise correlation Modified PTP_SYS_OFFSET ioctl in PTP clock driver to use the above callback if it's available Added capability (PTP_CLOCK_GETCAPS) for checking whether driver supports cross timestamping Added check for cross timestamping flag to testptp.c --- Documentation/ptp/testptp.c | 6 -- drivers/ptp/ptp_chardev.c| 29 + include/linux/ptp_clock_kernel.h | 7 +++ include/uapi/linux/ptp_clock.h | 4 +++- 4 files changed, 35 insertions(+), 11 deletions(-) diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c index 2bc8abc..8004efd 100644 --- a/Documentation/ptp/testptp.c +++ b/Documentation/ptp/testptp.c @@ -276,13 +276,15 @@ int main(int argc, char *argv[]) " %d external time stamp channels\n" " %d programmable periodic signals\n" " %d pulse per second\n" - " %d programmable pins\n", + " %d programmable pins\n" + " %d cross timestamping\n", caps.max_adj, caps.n_alarm, caps.n_ext_ts, caps.n_per_out, caps.pps, - caps.n_pins); + caps.n_pins, + caps.cross_timestamping); } } diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index da7bae9..392ccfa 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -124,7 +124,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg) struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); struct ptp_clock_info *ops = ptp->info; struct ptp_clock_time *pct; - struct timespec64 ts; + struct timespec64 ts, systs; int enable, err = 0; unsigned int i, pin_index; @@ -138,6 +138,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg) caps.n_per_out = ptp->info->n_per_out; caps.pps = ptp->info->pps; caps.n_pins = ptp->info->n_pins; + caps.cross_timestamping = ptp->info->getsynctime64 != NULL; if (copy_to_user((void __user *)arg, &caps, sizeof(caps))) err = -EFAULT; break; @@ -196,19 +197,31 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg) break; } pct = &sysoff->ts[0]; - for (i = 0; i < sysoff->n_samples; i++) { - getnstimeofday64(&ts); + if (ptp->info->getsynctime64 && sysoff->n_samples == 1 && + ptp->info->getsynctime64(ptp->info, &ts, &systs) == 0) { + pct->sec = systs.tv_sec; + pct->nsec = systs.tv_nsec; + pct++; pct->sec = ts.tv_sec; pct->nsec = ts.tv_nsec; pct++; - ptp->info->gettime64(ptp->info, &ts); + pct->sec = systs.tv_sec; + pct->nsec = systs.tv_nsec; + } else { + for (i = 0; i < sysoff->n_samples; i++) { + getnstimeofda
Re: [PATCH net-next] net: Fix race condition in store_rps_map
From: Tom Herbert Date: Wed, 5 Aug 2015 09:39:27 -0700 > There is a race condition in store_rps_map that allows jump label > count in rps_needed to go below zero. This can happen when > concurrently attempting to set and a clear map. > > Scenario: > > 1. rps_needed count is zero > 2. New map is assigned by setting thread, but rps_needed count _not_ yet >incremented (rps_needed count still zero) > 2. Map is cleared by second thread, old_map set to that just assigned > 3. Second thread performs static_key_slow_dec, rps_needed count now goes >negative > > Fix is to increment or decrement rps_needed under the spinlock. > > Signed-off-by: Tom Herbert Applied, thanks Tom. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net/mlx5_core: Set log_uar_page_sz for non 4K page size architecture
From: Amir Vadai Date: Thu, 6 Aug 2015 14:01:11 +0300 > On 8/5/2015 7:05 PM, cls...@linux.vnet.ibm.com wrote: >> From: Carol L Soto >> >> failed to configure the page size for architectures with page size >> different than 4K. >> >> Signed-off-by: Carol L Soto >> --- > > Please pull this patch into kernel 4.2 > > Fixes: 938fe83 ("net/mlx5_core: New device capabilities handling") > Acked-by: Amir Vadai Applied to 'net', thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pull request [net]: batman-adv fixes 20150805
From: Antonio Quartulli Date: Wed, 5 Aug 2015 14:51:43 +0200 > git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem Pulled and queued up for -stable, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v2 8/9] dpaa_eth: add debugfs entries
From: Madalin Bucur Date: Wed, 5 Aug 2015 18:41:28 +0300 > Export per CPU counters through debugfs. > > Signed-off-by: Madalin Bucur This is absolutely inappropriate. You can export these just fine via ethtool statistics. There is zero reason to add ugly debugfs crap for something like this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] r8169:Issues on alloc memory
From: Corcodel Marian Date: Wed, 5 Aug 2015 18:41:17 +0300 > +#define R8169_TX_RING_BYTES (NUM_ARRAYS_MAX * sizeof(struct TxDesc)) /* > here sizeof not reporting correct */ > +#define R8169_RX_RING_BYTES (NUM_ARRAYS_MAX * sizeof(struct RxDesc)) /* > here sizeof not reporting correct */ This comment makes the code more confusing rather than easier to understand. In fact I fail to see the reason for this patch, as a whole, at all. I hate to do this, but I am explitly letting you know that I am not going to invest any more time reviewing any submissions you make. They are all poorly formed, lack proper complete explanations, are buggy, or add no value at all to the driver. And the situation is not improving. This has been going on for several weeks now, and I have to draw the line somewhere. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v4, 0/9] Freescale DPAA FMan
From: Date: Wed, 5 Aug 2015 12:25:16 +0300 > The Freescale Data Path Acceleration Architecture (DPAA) > is a set of hardware components on specific QorIQ multicore processors. > This architecture provides the infrastructure to support simplified > sharing of networking interfaces and accelerators by multiple CPU cores > and the accelerators. I think the directory and code structure of this new driver is quite excessive. Because you've split things up _so_ much, you have to have all of these directories, and even worse and much more important to me you have to export so many functions from one source file to another. I think this is way too much. For example, in one file you have a bunch of initialization routines. init_a(), init_b(), init_c(), and you export them all. Then they are always called in sequence: init_a(); init_b(); init_c(); This is completely pointless. You just needed to export one function which calls all three functions. The namespace pollution of this driver is out of control. You really need to completely rework the architecture and layout of this driver before I will even begin to review it again. And the lack of review interest by other developers should be an indication to you how undesirable this code submission is to read. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/9] openvswitch: Scrub packet in ovs_vport_receive()
On Tue, Aug 4, 2015 at 9:40 PM, Joe Stringer wrote: > On 1 August 2015 at 12:17, Thomas Graf wrote: >> On 07/31/15 at 10:51am, Joe Stringer wrote: >>> On 31 July 2015 at 07:34, Hannes Frederic Sowa wrote: >>> > In general, this shouldn't be necessary as the packet should already be >>> > scrubbed before they arrive here. >>> > >>> > Could you maybe add a WARN_ON and check how those skbs with conntrack >>> > data traverse the stack? I also didn't understand why make it dependent >>> > on the socket. >>> >>> OK, sure. One case I could think of is with an OVS internal port in >>> another namespace, directly attached to the bridge. I'll have a play >>> around with WARN_ON and see if I can come up with something more >>> trimmed down. >> >> The OVS internal port will definitely pass through an unscrubbed >> packet across namespaces. I think the proper thing to do would be >> to scrub but conditionally keep metadata. > > It's only "unscrubbed" when receiving from local stack at the moment. > Some pieces are cleared when handing towards the local stack, and > there's no configuration for that behaviour. Presumably internal port > transmit and receive should mirror each other? > > I don't have a specific use case either way. The remaining code for > this series handles this case correctly, it's just a matter of what > behaviour we're looking for. We could implement the flag as you say, I > presume that userspace would need to specify this during vport > creation and the default should work similar to the existing behaviour > (ie, keep metadata). One thing that's not entirely clear to me is > exactly which metadata should be represented by this flag and whether > the single flag is expressive enough. I would prefer not to have a flag as it seems unnecessarily complicated (doubly so if we try to have multiple flags to express different combinations). The use case for moving internal ports to different namespaces is pretty narrow, so it seems like we can just pick a set of metadata to keep and go with that. Mark seems the primary one to me. I also think that it would be better to use skb->dev to determine the original namespace rather than the socket. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] device property: helper macros for property entry creation
On Thursday, August 06, 2015 10:48:48 AM Heikki Krogerus wrote: > On Wed, Aug 05, 2015 at 05:02:18PM +0300, Andy Shevchenko wrote: > > On Wed, 2015-08-05 at 16:39 +0300, Heikki Krogerus wrote: > > > Marcos for easier creation of build-in property entries. > > > > > > Signed-off-by: Heikki Krogerus > > > --- > > > include/linux/property.h | 35 +++ > > > 1 file changed, 35 insertions(+) > > > > > > diff --git a/include/linux/property.h b/include/linux/property.h > > > index 76ebde9..204d899 100644 > > > --- a/include/linux/property.h > > > +++ b/include/linux/property.h > > > @@ -152,6 +152,41 @@ struct property_entry { > > > } value; > > > }; > > > > > > +#define PROP_ENTRY_U8(_name_, _val_) { \ > > > > PROP_ prefix is too generic. > > Maybe DEVPROP_ ? At least for the latter no records in the current > > sources. > > I disagree with that. IMO this kind of macros should ideally resemble > the structure name they are used to fill (struct property_entry in > this case). And there are already definitions for DEV_PROP_* to > describe the types, so using something like DEVPROP_* here is just > confusing. > > If PROP_ENTRY_* is really not good enough, we can change them > PROPERTY_ENTRY_*. But is PROP_ENTRY_* really so bad? > > Rafael, what is your opinion? I would prefer PROPERTY_ENTRY_ to be honest. It's not like we need to save characters here. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
tcp_update_metrics() fail fast before declaring variables
Hi folks, one short question regarding net.ipv4.tcp_no_metrics_save sysctl parameter. Code snippet is actually the following: void tcp_update_metrics(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct dst_entry *dst = __sk_dst_get(sk); if (sysctl_tcp_nometrics_save) return; Why this 'if' statement can't be moved before variables? I think fail fast could save some instructions in this place. Any thoughts? Here is the performance benchmarking results: Test #1 (sysctl -w net.ipv4.tcp_no_metrics_save=0) sum: 79726, avg: 1295ns min: 453ns, max: 16487ns Test #2 (sysctl -w net.ipv4.tcp_no_metrics_save=1) sum: 77515, avg: 1181ns min: 431ns, max: 15989ns Benchmarked with Systemtap: global c; probe kernel.function("tcp_update_metrics").return { c <<< (gettimeofday_ns() - @entry(gettimeofday_ns())); } probe timer.s(60) { printf("sum: %d, avg: %dns\n", @count(c), @avg(c)); printf("min: %dns, max: %dns\n", @min(c), @max(c)); delete c; exit(); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v3] openvswitch: Make 100 percents packets sampled when sampling rate is 1.
From: Wenyu Zhang Date: Wed, 5 Aug 2015 00:30:47 -0700 > When sampling rate is 1, the sampling probability is UINT32_MAX. The packet > should be sampled even the prandom32() generate the number of UINT32_MAX. > And none packet need be sampled when the probability is 0. > > Signed-off-by: Wenyu Zhang Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH net-next] tcp: reduce cpu usage under tcp memory pressure when SO_SNDBUF is set
From: Jason Baron When SO_SNDBUF is set and we are under tcp memory pressure, the effective write buffer space can be much lower than what was set using SO_SNDBUF. For example, we may have set the buffer to 100kb, but we may only be able to write 10kb. In this scenario poll()/select()/epoll(), are going to continuously return POLLOUT, followed by -EAGAIN from write() in a very tight loop. Introduce sk->sk_effective_sndbuf, such that we can track the 'effective' size of the sndbuf, when we have a short write due to memory pressure. By using the sk->sk_effective_sndbuf instead of the sk->sk_sndbuf when we are under memory pressure, we can delay the POLLOUT until 1/3 of the buffer clears as we normally do. There is no issue here when SO_SNDBUF is not set, since the tcp layer will auto tune the sk->sndbuf. In my testing, this brought a single threaad's cpu usage down from 100% to 1% while maintaining the same level of throughput when under memory pressure. Signed-off-by: Jason Baron --- include/net/sock.h | 12 net/core/sock.c| 1 + net/core/stream.c | 1 + net/ipv4/tcp.c | 10 +++--- 4 files changed, 21 insertions(+), 3 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 43c6abc..ca49415 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -380,6 +380,7 @@ struct sock { atomic_tsk_wmem_alloc; atomic_tsk_omem_alloc; int sk_sndbuf; + int sk_effective_sndbuf; struct sk_buff_head sk_write_queue; kmemcheck_bitfield_begin(flags); unsigned intsk_shutdown : 2, @@ -779,6 +780,14 @@ static inline bool sk_acceptq_is_full(const struct sock *sk) return sk->sk_ack_backlog > sk->sk_max_ack_backlog; } +static inline void sk_set_effective_sndbuf(struct sock *sk) +{ + if (sk->sk_wmem_queued > sk->sk_sndbuf) + sk->sk_effective_sndbuf = sk->sk_sndbuf; + else + sk->sk_effective_sndbuf = sk->sk_wmem_queued; +} + /* * Compute minimal free write space needed to queue new packets. */ @@ -789,6 +798,9 @@ static inline int sk_stream_min_wspace(const struct sock *sk) static inline int sk_stream_wspace(const struct sock *sk) { + if (sk->sk_effective_sndbuf) + return sk->sk_effective_sndbuf - sk->sk_wmem_queued; + return sk->sk_sndbuf - sk->sk_wmem_queued; } diff --git a/net/core/sock.c b/net/core/sock.c index 193901d..4fce879 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2309,6 +2309,7 @@ void sock_init_data(struct socket *sock, struct sock *sk) sk->sk_allocation = GFP_KERNEL; sk->sk_rcvbuf = sysctl_rmem_default; sk->sk_sndbuf = sysctl_wmem_default; + sk->sk_effective_sndbuf = 0; sk->sk_state= TCP_CLOSE; sk_set_socket(sk, sock); diff --git a/net/core/stream.c b/net/core/stream.c index d70f77a..7c175e7 100644 --- a/net/core/stream.c +++ b/net/core/stream.c @@ -32,6 +32,7 @@ void sk_stream_write_space(struct sock *sk) if (sk_stream_is_writeable(sk) && sock) { clear_bit(SOCK_NOSPACE, &sock->flags); + sk->sk_effective_sndbuf = 0; rcu_read_lock(); wq = rcu_dereference(sk->sk_wq); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 45534a5..9e7f0a5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -845,6 +845,7 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp, sk->sk_prot->enter_memory_pressure(sk); sk_stream_moderate_sndbuf(sk); } + sk_set_effective_sndbuf(sk); return NULL; } @@ -939,9 +940,10 @@ new_segment: tcp_mark_push(tp, skb); goto new_segment; } - if (!sk_wmem_schedule(sk, copy)) + if (!sk_wmem_schedule(sk, copy)) { + sk_set_effective_sndbuf(sk); goto wait_for_memory; - + } if (can_coalesce) { skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { @@ -1214,8 +1216,10 @@ new_segment: copy = min_t(int, copy, pfrag->size - pfrag->offset); - if (!sk_wmem_schedule(sk, copy)) + if (!sk_wmem_schedule(sk, copy)) { + sk_set_effective_sndbuf(sk); goto wait_for_memory; + } err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb, pfrag->page, -- 1.8.2.rc2 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majo
Re: [PATCH 0/3] be2net: patch set
From: Sathya Perla Date: Wed, 5 Aug 2015 03:27:47 -0400 > This patch set contains 2 driver fixes to a Lancer HW issue and a fix > to a double free bug. Pls apply to the "net" tree. Thanks! > > Patch 1 now enables filters only after creating RXQs. This is done as > HW issues were observed on Lancer adapters if filters > (flags, mac addrs etc) are enabled *before* creating RXQs. This patch > changes the driver design by enabling filters in be_open() -- > instead of be_setup() -- after RXQs are created and buffers posted. > > Patch 2 fixes an RX stall issue that was seen on Lancer adapters when > RXQs are destroyed while they are in an "out of buffer" state. > This patch fixes this issue by posting 64 buffers to each RXQ before > destroying them in the close path. This is done after ensuring that no > more new packets are selected for transfer to the RXQs by disabling > interface filters. > > Patch 3 protects eqo->affinity_mask variable from being freed twice and > resulting in a crash. It's now freed only when EQs haven't yet been > destroyed. Series applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA
From: Alexei Starovoitov Date: Tue, 4 Aug 2015 22:51:07 -0700 > IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA, > so combine them into single IFLA_VXLAN_COLLECT_METADATA flag. > 'flowbased' doesn't convey real meaning of the vxlan tunnel mode. > This mode can be used by routing, tc+bpf and ovs. > Only ovs is strictly flow based, so 'collect metadata' is a better > name for this tunnel mode. > > Signed-off-by: Alexei Starovoitov > --- > it's not too late to kill uapi IFLA_VXLAN_FLOWBASED, since it > was introduced two weeks ago. > Alternatively we could rename both into vxlan_accept_metadata > or something else, but imo collect_metadata is ok. Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH net] bna: fix interrupts storm caused by erroneous packets
> From: Ivan Vecera [mailto:ivec...@redhat.com] > Sent: Thursday, August 06, 2015 1:48 PM > > The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter > increment from the beginning of the NAPI processing loop after the check > for erroneous packets so they are never accounted. This counter is used to > inform firmware about number of processed completions (packets). > As these packets are never acked the firmware fires IRQs for them again and > again. > > Fixes: e29aa33 bna: Enable Multi Buffer RX > Signed-off-by: Ivan Vecera Acked-by: Rasesh Mody Thanks! Rasesh -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 net-next 0/2] RDS-TCP: Network namespace support
From: Sowmini Varadhan Date: Wed, 5 Aug 2015 01:43:24 -0400 > This patch series contains the set of changes to correctly set up > the infra for PF_RDS sockets that use TCP as the transport in multiple > network namespaces. > > Patch 1 in the series is the minimal set of changes to allow > a single instance of RDS-TCP to run in any (i.e init_net or other) net > namespace. The changes in this patch set ensure that the execution of > 'modprobe [-r] rds_tcp' sets up the kernel TCP sockets > relative to the current netns, so that RDS applications can send/recv > packets from that netns, and the netns can later be deleted cleanly. > > Patch 2 of the series further allows multiple RDS-TCP instances, > one per network namespace. The changes in this patch allows dynamic > creation/tear-down of RDS-TCP client and server sockets across all > current and future namespaces. > > v2 changes from RFC sent out earlier: > David Ahern comments in patch 1, net_device notifier in patch 2, > patch 3 broken off and submitted separately. > v3: Cong Wang review comments. Series applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 08/07/2015 07:54 AM, Graeme Gregory wrote: On Thu, Aug 06, 2015 at 05:33:10PM -0700, David Daney wrote: From: David Daney Find out which PHYs belong to which BGX instance in the ACPI way. Set the MAC address of the device as provided by ACPI tables. This is similar to the implementation for devicetree in of_get_mac_address(). The table is searched for the device property entries "mac-address", "local-mac-address" and "address" in that order. The address is provided in a u64 variable and must contain a valid 6 bytes-len mac addr. Based on code from: Narinder Dhillon Tomasz Nowicki Robert Richter Signed-off-by: Tomasz Nowicki Signed-off-by: Robert Richter Signed-off-by: David Daney --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 +- 1 file changed, 135 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 615b2af..2056583 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c [...] + +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst) +{ + const union acpi_object *prop; + u64 mac_val; + u8 mac[ETH_ALEN]; + int i, j; + int ret; + + for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) { + ret = acpi_dev_get_property(adev, addr_propnames[i], + ACPI_TYPE_INTEGER, &prop); Shouldn't this be trying to use device_property_read_* API and making the DT/ACPI path the same where possible? Ideally, something like you suggest would be possible. However, there are a couple of problems trying to do it in the kernel as it exists today: 1) There is no 'struct device *' here, so device_property_read_* is not applicable. 2) There is no standard ACPI binding for MAC addresses, so it is impossible to create a hypothetical fw_get_mac_address(), which would be analogous to of_get_mac_address(). Other e-mail threads have suggested that the path to an elegant solution is to inter-mix a bunch of calls to acpi_dev_get_property*() and fwnode_property_read*() as to use these more generic fwnode_property_read*() functions whereever possible. I rejected this approach as it seems cleaner to me to consistently use a single set of APIs. Thanks, David Daney Graeme + if (ret) + continue; + + mac_val = prop->integer.value; + + if (mac_val & (~0ULL << 48)) + continue; /* more than 6 bytes */ + + for (j = 0; j < ARRAY_SIZE(mac); j++) + mac[j] = (u8)(mac_val >> (8 * j)); + if (!is_valid_ether_addr(mac)) + continue; + + memcpy(dst, mac, ETH_ALEN); + + return 0; + } + + return ret ? ret : -EINVAL; +} + +static acpi_status bgx_acpi_register_phy(acpi_handle handle, +u32 lvl, void *context, void **rv) +{ + struct acpi_reference_args args; + const union acpi_object *prop; + struct bgx *bgx = context; + struct acpi_device *adev; + struct device *phy_dev; + u32 phy_id; + + if (acpi_bus_get_device(handle, &adev)) + goto out; + + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); + + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); + + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) + goto out; + + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, &prop)) + goto out; + + phy_id = prop->integer.value; + + phy_dev = bus_find_device(&mdio_bus_type, NULL, (void *)&phy_id, + bgx_match_phy_id); + if (!phy_dev) + goto out; + + bgx->lmac[bgx->lmac_count].phydev = to_phy_device(phy_dev); +out: + bgx->lmac_count++; + return AE_OK; +} + +static acpi_status bgx_acpi_match_id(acpi_handle handle, u32 lvl, +void *context, void **ret_val) +{ + struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL }; + struct bgx *bgx = context; + char bgx_sel[5]; + + snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id); + if (ACPI_FAILURE(acpi_get_name(handle, ACPI_SINGLE_NAME, &string))) { + pr_warn("Invalid link device\n"); + return AE_OK; + } + + if (strncmp(string.pointer, bgx_sel, 4)) + return AE_OK; + + acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1, + bgx_acpi_register_phy, NULL, bgx, NULL); + + kfree(string.pointer); + return AE_CTRL_TERMINATE; +} + +static int bgx_init_acpi_phy(struct b
[PATCH net-next] net: add explicit logging and stat for neighbour table overflow
From: Rick Jones Add an explicit neighbour table overflow message (ratelimited) and statistic to make diagnosing neighbour table overflows tractable in the wild. Diagnosing a neighbour table overflow can be quite difficult in the wild because there is no explicit dmesg logged. Callers to neighbour code seem to use net_dbg_ratelimit when the neighbour call fails which means the "base message" is not emitted and the callback suppressed messages from the ratelimiting can end-up juxtaposed with unrelated messages. Further, a forced garbage collection will increment a stat on each call whether it was successful in freeing-up a table entry or not, so that statistic is only a hint. So, add a net_info_ratelimited message and explicit statistic to the neighbour code. Signed-off-by: Rick Jones --- Tested by cutting back the gc_threshN sysctls and attempting to ping a number of target IPs greater than the maximum size of the ARP table. diff --git a/include/net/neighbour.h b/include/net/neighbour.h index bd33e66..8b68384 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -125,6 +125,7 @@ struct neigh_statistics { unsigned long forced_gc_runs; /* number of forced GC runs */ unsigned long unres_discards; /* number of unresolved drops */ + unsigned long table_fulls; /* times even gc couldn't help */ }; #define NEIGH_CACHE_STAT_INC(tbl, field) this_cpu_inc((tbl)->stats->field) diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h index 2e35c61..788655b 100644 --- a/include/uapi/linux/neighbour.h +++ b/include/uapi/linux/neighbour.h @@ -106,6 +106,7 @@ struct ndt_stats { __u64 ndts_rcv_probes_ucast; __u64 ndts_periodic_gc_runs; __u64 ndts_forced_gc_runs; + __u64 ndts_table_fulls; }; enum { diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 84195da..2b515ba 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -274,8 +274,12 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device (entries >= tbl->gc_thresh2 && time_after(now, tbl->last_flush + 5 * HZ))) { if (!neigh_forced_gc(tbl) && - entries >= tbl->gc_thresh3) + entries >= tbl->gc_thresh3) { + net_info_ratelimited("%s: neighbor table overflow!\n", +tbl->id); + NEIGH_CACHE_STAT_INC(tbl, table_fulls); goto out_entries; + } } n = kzalloc(tbl->entry_size + dev->neigh_priv_len, GFP_ATOMIC); @@ -1849,6 +1853,7 @@ static int neightbl_fill_info(struct sk_buff *skb, struct neigh_table *tbl, ndst.ndts_rcv_probes_ucast += st->rcv_probes_ucast; ndst.ndts_periodic_gc_runs += st->periodic_gc_runs; ndst.ndts_forced_gc_runs+= st->forced_gc_runs; + ndst.ndts_table_fulls += st->table_fulls; } if (nla_put(skb, NDTA_STATS, sizeof(ndst), &ndst)) @@ -2717,12 +2722,12 @@ static int neigh_stat_seq_show(struct seq_file *seq, void *v) struct neigh_statistics *st = v; if (v == SEQ_START_TOKEN) { - seq_printf(seq, "entries allocs destroys hash_grows lookups hits res_failed rcv_probes_mcast rcv_probes_ucast periodic_gc_runs forced_gc_runs unresolved_discards\n"); + seq_printf(seq, "entries allocs destroys hash_grows lookups hits res_failed rcv_probes_mcast rcv_probes_ucast periodic_gc_runs forced_gc_runs unresolved_discards table_fulls\n"); return 0; } seq_printf(seq, "%08x %08lx %08lx %08lx %08lx %08lx %08lx " - "%08lx %08lx %08lx %08lx %08lx\n", + "%08lx %08lx %08lx %08lx %08lx %08lx\n", atomic_read(&tbl->entries), st->allocs, @@ -2739,7 +2744,8 @@ static int neigh_stat_seq_show(struct seq_file *seq, void *v) st->periodic_gc_runs, st->forced_gc_runs, - st->unres_discards + st->unres_discards, + st->table_fulls ); return 0; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
[Correcting the devicetree list address, which I typo'd in my original reply] [resending to _really_ correct the address, apologies for the spam] > >> +static const char * const addr_propnames[] = { > >> + "mac-address", > >> + "local-mac-address", > >> + "address", > >> +}; > > > > If these are going to be generally necessary, then we should get them > > adopted as standardised _DSD properties (ideally just one of them). > > As far as I can tell, and please correct me if I am wrong, ACPI-6.0 > doesn't contemplate MAC addresses. > > Today we are using "mac-address", which is an Integer containing the MAC > address in its lowest order 48 bits in Little-Endian byte order. > > The hardware and ACPI tables are here today, and we would like to > support it. If some future ACPI specification specifies a standard way > to do this, we will probably adapt the code to do this in a standard manner. > > > > > > [...] > > > >> +static acpi_status bgx_acpi_register_phy(acpi_handle handle, > >> + u32 lvl, void *context, void **rv) > >> +{ > >> + struct acpi_reference_args args; > >> + const union acpi_object *prop; > >> + struct bgx *bgx = context; > >> + struct acpi_device *adev; > >> + struct device *phy_dev; > >> + u32 phy_id; > >> + > >> + if (acpi_bus_get_device(handle, &adev)) > >> + goto out; > >> + > >> + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); > >> + > >> + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); > >> + > >> + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; > >> + > >> + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) > >> + goto out; > >> + > >> + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, > >> &prop)) > >> + goto out; > > > > Likewise for any inter-device properties, so that we can actually handle > > them in a generic fashion, and avoid / learn from the mistakes we've > > already handled with DT. > > This is the fallacy of the ACPI is superior to DT argument. The > specification of PHY topology and MAC addresses is well standardized in > DT, there is no question about what the proper way to specify it is. > Under ACPI, it is the Wild West, there is no specification, so each > system design is forced to invent something, and everybody comes up with > an incompatible implementation. Indeed. If ACPI is going to handle it, it should handle it properly. I really don't see the point in bodging properties together in a less standard manner than DT, especially for inter-device relationships. Doing so is painful for _everyone_, and it's extremely unlikely that other ACPI-aware OSs will actually support these custom descriptions, making this Linux-specific, and breaking the rationale for using ACPI in the first place -- a standard that says "just do non-standard stuff" is not a usable standard. For intra-device properties, we should standardise what we can, but vendor-specific stuff is ok -- this can be self-contained within a driver. For inter-device relationships ACPI _must_ gain a better model of componentised devices. It's simply unworkable otherwise, and as you point out it's fallacious to say that because ACPI is being used that something is magically industry standard, portable, etc. This is not your problem in particular; the entire handling of _DSD so far is a joke IMO. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
[Correcting the devicetree list address, which I typo'd in my original reply] > >> +static const char * const addr_propnames[] = { > >> + "mac-address", > >> + "local-mac-address", > >> + "address", > >> +}; > > > > If these are going to be generally necessary, then we should get them > > adopted as standardised _DSD properties (ideally just one of them). > > As far as I can tell, and please correct me if I am wrong, ACPI-6.0 > doesn't contemplate MAC addresses. > > Today we are using "mac-address", which is an Integer containing the MAC > address in its lowest order 48 bits in Little-Endian byte order. > > The hardware and ACPI tables are here today, and we would like to > support it. If some future ACPI specification specifies a standard way > to do this, we will probably adapt the code to do this in a standard manner. > > > > > > [...] > > > >> +static acpi_status bgx_acpi_register_phy(acpi_handle handle, > >> + u32 lvl, void *context, void **rv) > >> +{ > >> + struct acpi_reference_args args; > >> + const union acpi_object *prop; > >> + struct bgx *bgx = context; > >> + struct acpi_device *adev; > >> + struct device *phy_dev; > >> + u32 phy_id; > >> + > >> + if (acpi_bus_get_device(handle, &adev)) > >> + goto out; > >> + > >> + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); > >> + > >> + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); > >> + > >> + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; > >> + > >> + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) > >> + goto out; > >> + > >> + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, > >> &prop)) > >> + goto out; > > > > Likewise for any inter-device properties, so that we can actually handle > > them in a generic fashion, and avoid / learn from the mistakes we've > > already handled with DT. > > This is the fallacy of the ACPI is superior to DT argument. The > specification of PHY topology and MAC addresses is well standardized in > DT, there is no question about what the proper way to specify it is. > Under ACPI, it is the Wild West, there is no specification, so each > system design is forced to invent something, and everybody comes up with > an incompatible implementation. Indeed. If ACPI is going to handle it, it should handle it properly. I really don't see the point in bodging properties together in a less standard manner than DT, especially for inter-device relationships. Doing so is painful for _everyone_, and it's extremely unlikely that other ACPI-aware OSs will actually support these custom descriptions, making this Linux-specific, and breaking the rationale for using ACPI in the first place -- a standard that says "just do non-standard stuff" is not a usable standard. For intra-device properties, we should standardise what we can, but vendor-specific stuff is ok -- this can be self-contained within a driver. For inter-device relationships ACPI _must_ gain a better model of componentised devices. It's simply unworkable otherwise, and as you point out it's fallacious to say that because ACPI is being used that something is magically industry standard, portable, etc. This is not your problem in particular; the entire handling of _DSD so far is a joke IMO. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 08/07/2015 07:01 AM, Mark Rutland wrote: On Fri, Aug 07, 2015 at 01:33:10AM +0100, David Daney wrote: From: David Daney Find out which PHYs belong to which BGX instance in the ACPI way. Set the MAC address of the device as provided by ACPI tables. This is similar to the implementation for devicetree in of_get_mac_address(). The table is searched for the device property entries "mac-address", "local-mac-address" and "address" in that order. The address is provided in a u64 variable and must contain a valid 6 bytes-len mac addr. Based on code from: Narinder Dhillon Tomasz Nowicki Robert Richter Signed-off-by: Tomasz Nowicki Signed-off-by: Robert Richter Signed-off-by: David Daney --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 +- 1 file changed, 135 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 615b2af..2056583 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -6,6 +6,7 @@ * as published by the Free Software Foundation. */ +#include #include #include #include @@ -26,7 +27,7 @@ struct lmac { struct bgx *bgx; int dmac; - unsigned char mac[ETH_ALEN]; + u8 mac[ETH_ALEN]; boollink_up; int lmacid; /* ID within BGX */ int lmacid_bd; /* ID on board */ @@ -835,6 +836,133 @@ static void bgx_get_qlm_mode(struct bgx *bgx) } } +#ifdef CONFIG_ACPI + +static int bgx_match_phy_id(struct device *dev, void *data) +{ + struct phy_device *phydev = to_phy_device(dev); + u32 *phy_id = data; + + if (phydev->addr == *phy_id) + return 1; + + return 0; +} + +static const char * const addr_propnames[] = { + "mac-address", + "local-mac-address", + "address", +}; If these are going to be generally necessary, then we should get them adopted as standardised _DSD properties (ideally just one of them). As far as I can tell, and please correct me if I am wrong, ACPI-6.0 doesn't contemplate MAC addresses. Today we are using "mac-address", which is an Integer containing the MAC address in its lowest order 48 bits in Little-Endian byte order. The hardware and ACPI tables are here today, and we would like to support it. If some future ACPI specification specifies a standard way to do this, we will probably adapt the code to do this in a standard manner. [...] +static acpi_status bgx_acpi_register_phy(acpi_handle handle, +u32 lvl, void *context, void **rv) +{ + struct acpi_reference_args args; + const union acpi_object *prop; + struct bgx *bgx = context; + struct acpi_device *adev; + struct device *phy_dev; + u32 phy_id; + + if (acpi_bus_get_device(handle, &adev)) + goto out; + + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); + + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); + + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) + goto out; + + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, &prop)) + goto out; Likewise for any inter-device properties, so that we can actually handle them in a generic fashion, and avoid / learn from the mistakes we've already handled with DT. This is the fallacy of the ACPI is superior to DT argument. The specification of PHY topology and MAC addresses is well standardized in DT, there is no question about what the proper way to specify it is. Under ACPI, it is the Wild West, there is no specification, so each system design is forced to invent something, and everybody comes up with an incompatible implementation. That said, this is all specific to our BGX device, so anything we do doesn't break other devices. David Daney -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 17/20] net/xen-netfront: Make it running on 64KB page granularity
The PV network protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity using network device on a non-modified Xen. It's only necessary to adapt the ring size and break skb data in small chunk of 4KB. The rest of the code is relying on the grant table code. Note that we allocate a Linux page for each rx skb but only the first 4KB is used. We may improve the memory usage by extending the size of the rx skb. Signed-off-by: Julien Grall --- Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky Cc: David Vrabel Cc: netdev@vger.kernel.org Improvement such as support of 64KB grant is not taken into consideration in this patch because we have the requirement to run a Linux using 64KB pages on a non-modified Xen. Tested with workload such as ping, ssh, wget, git... I would happy if someone give details how to test all the path. Changes in v3: - Fix errors reported by checkpatch.pl - s/mfn/gfn/ base on the new naming - xennet_tx_setup_grant was calling itself resulting an guest stall when using iperf. - The grant callback doesn't allow anymore to change the len (wasn't used here) - gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range - gnttab_page_grant_foreign_ref has been renamed to gnttab_foreach_grant_foreign_ref_one Changes in v2: - Use gnttab_foreach_grant to split a Linux page in grant - Fix count slots --- drivers/net/xen-netfront.c | 122 - 1 file changed, 86 insertions(+), 36 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 47f791e..204ffb8 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -74,8 +74,8 @@ struct netfront_cb { #define GRANT_INVALID_REF 0 -#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) -#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) +#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE) +#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE) /* Minimum number of Rx slots (includes slot for GSO metadata). */ #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1) @@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue) struct sk_buff *skb; unsigned short id; grant_ref_t ref; - unsigned long gfn; + struct page *page; struct xen_netif_rx_request *req; skb = xennet_alloc_one_rx_buffer(queue); @@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue) BUG_ON((signed short)ref < 0); queue->grant_rx_ref[id] = ref; - gfn = xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0])); + page = skb_frag_page(&skb_shinfo(skb)->frags[0]); req = RING_GET_REQUEST(&queue->rx, req_prod); - gnttab_grant_foreign_access_ref(ref, - queue->info->xbdev->otherend_id, - gfn, - 0); - + gnttab_page_grant_foreign_access_ref_one(ref, + queue->info->xbdev->otherend_id, +page, +0); req->id = id; req->gref = ref; } @@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue) xennet_maybe_wake_tx(queue); } -static struct xen_netif_tx_request *xennet_make_one_txreq( - struct netfront_queue *queue, struct sk_buff *skb, - struct page *page, unsigned int offset, unsigned int len) +struct xennet_gnttab_make_txreq { + struct netfront_queue *queue; + struct sk_buff *skb; + struct page *page; + struct xen_netif_tx_request *tx; /* Last request */ + unsigned int size; +}; + +static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset, + unsigned int len, void *data) { + struct xennet_gnttab_make_txreq *info = data; unsigned int id; struct xen_netif_tx_request *tx; grant_ref_t ref; - - len = min_t(unsigned int, PAGE_SIZE - offset, len); + /* convenient aliases */ + struct page *page = info->page; + struct netfront_queue *queue = info->queue; + struct sk_buff *skb = info->skb; id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs); tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++); ref = gnttab_claim_grant_reference(&queue->gref_tx_head); BUG_ON((signed short)ref < 0); - gnttab_grant_foreign_access_ref(ref, -
[PATCH v3 18/20] net/xen-netback: Make it running on 64KB page granularity
The PV network protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity working as a network backend on a non-modified Xen. It's only necessary to adapt the ring size and break skb data in small chunk of 4KB. The rest of the code is relying on the grant table code. Signed-off-by: Julien Grall --- Cc: Ian Campbell Cc: Wei Liu Cc: netdev@vger.kernel.org Improvement such as support of 64KB grant is not taken into consideration in this patch because we have the requirement to run a Linux using 64KB pages on a non-modified Xen. Changes in v3: - Fix errors reported by checkpatch.pl - s/mfn/gfn/ based on the new naming - gnttab_foreach_grant has been renamed to gnttab_forach_grant_in_range - The grant callback doesn't allow anymore to use less data. An helpers has been added in netback to handle this. Changes in v2: - Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX - Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce fragment into a new skb - Use gnntab_foreach_grant to split a Linux page into grant --- drivers/net/xen-netback/common.h | 15 ++-- drivers/net/xen-netback/netback.c | 153 -- 2 files changed, 107 insertions(+), 61 deletions(-) diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 8a495b3..bb68211 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -44,6 +44,7 @@ #include #include #include +#include #include typedef unsigned int pending_ring_idx_t; @@ -64,8 +65,8 @@ struct pending_tx_info { struct ubuf_info callback_struct; }; -#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) -#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE) +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE) struct xenvif_rx_meta { int id; @@ -80,16 +81,18 @@ struct xenvif_rx_meta { /* Discriminate from any valid pending_idx value. */ #define INVALID_PENDING_IDX 0x -#define MAX_BUFFER_OFFSET PAGE_SIZE +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE +#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1) + /* It's possible for an skb to have a maximal number of frags * but still be less than MAX_BUFFER_OFFSET in size. Thus the - * worst-case number of copy operations is MAX_SKB_FRAGS per + * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per * ring slot. */ -#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE) +#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE) #define NETBACK_INVALID_HANDLE -1 @@ -203,7 +206,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */ /* Maximum number of Rx slots a to-guest packet may use, including the * slot needed for GSO meta-data. */ -#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1) +#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1)) enum state_bit_shift { /* This bit marks that the vif is connected */ diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 66f1780..c32a9f2 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue, return meta; } +struct gop_frag_copy { + struct xenvif_queue *queue; + struct netrx_pending_operations *npo; + struct xenvif_rx_meta *meta; + int head; + int gso_type; + + struct page *page; +}; + +static void xenvif_setup_copy_gop(unsigned long gfn, + unsigned int offset, + unsigned int *len, + struct gop_frag_copy *info) +{ + struct gnttab_copy *copy_gop; + struct xen_page_foreign *foreign; + /* Convenient aliases */ + struct xenvif_queue *queue = info->queue; + struct netrx_pending_operations *npo = info->npo; + struct page *page = info->page; + + BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET); + + if (npo->copy_off == MAX_BUFFER_OFFSET) + info->meta = get_next_rx_buffer(queue, npo); + + if (npo->copy_off + *len > MAX_BUFFER_OFFSET) + *len = MAX_BUFFER_OFFSET - npo->copy_off; + + copy_gop = npo->copy + npo->copy_prod++; + copy_gop->flags = GNTCOPY_dest_gref; + copy_gop->len = *len; + + foreign = xen_page_foreign(page); + if (foreign) { + copy_gop->source.domid = foreign->domid; + copy_gop->source.u.ref = foreign->gref; + copy_gop->flags |= GNTCOPY_source_gref; + } else { + copy_gop->source.domid = DOMID_SELF; +
Re: [BUG] net/ipv4: inconsistent routing table
Hello, Alexander Duyck writes: > On 08/07/2015 01:23 AM, Zang MingJie wrote: >> IMO, the routing decision is determined, given a specific routing >> table and local network the result MUST be determined, independence of >> how/what order the routing entry is added. >> >> Now there are two ways to configure the system resulting EXACTLY the >> same routing table and local addresses, but the routing decision is >> totally different. >> >> SAME routing table, DIFFERENT routing decision, there MUST be bugs in kernel > > I wasn't arguing that the behavior is undesirable, but the likelihood of > having a default route assigned to a local address should be pretty > low. If the system is the default route of others then it should have a > different default gateway than itself. For example an office router > would end up pointing to the ISP as the gateway, and the ISP would > either point to some other provider or run a BGP configuration. So in > the case of the default route transitioning to us we should end up > having to delete and update the default route anyway. This is likely > one of the reasons why there hasn't been any issues reported with this > behavior until now. > > I'm just wondering if the work involved to fix it is going to be worth > it. We have to keep in mind that this will result in a change of > behavior for existing users and we don't know if anyone might be > expecting this type of behavior. > > We basically are looking at one of three options. The first one is to > just delete the route if you add the gateway as a local address or > remove it. That would be consistent with what you might see if the > address was the sole address on an interface of its own. The second > option is to update the nh_scope which I believe should be transitioned > between RT_SCOPE_HOST to RT_SCOPE_LINK if I am understanding things > correctly. The third option is we don't change the behavior and just > document it. This would then require manually deleting and restoring > any routes that use a recently modified address as their gateway. > > Based on your feedback I'm assuming you would probably prefer the second > option. I'm just waiting to see if there are any other opinions on the > matter before I act. The semantics behind this are not easy and the result might well break other people's system. I would leave the current resolution logic as-is and merely change the way iproute presents those information. Currently we resolve the nexthop during route setup time and install the resulting information into the FIB. This is very common on other OS, too. In case we would reevaluate the nexthop part of a route during local address changes on one of the interfaces, we could get the system very well in a situation where it would have to remove its default route because the network would not be reachable via ip subnetting any more, but neighboring information would still keep the machine connected. And this could happen with setups where someone did not configure their routes to their own addresses, which are much more widespread. The change wouldn't be in contradiction with weak end system behavior, but I very much don't want to make other people's machines unreachable because of such a change. If we could rewind time, we could make local nexthops -EINVAL. Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 08/07/2015 05:42 AM, Tomasz Nowicki wrote: On 07.08.2015 13:56, Robert Richter wrote: On 07.08.15 12:52:41, Tomasz Nowicki wrote: [...] I would not pollute bgx_probe() with acpi and dt specifics, and instead keep bgx_init_phy(). The typical design pattern for this is: static int bgx_init_phy(struct bgx *bgx) { #ifdef CONFIG_ACPI if (!acpi_disabled) return bgx_init_acpi_phy(bgx); #endif return bgx_init_of_phy(bgx); } This adds acpi runtime detection (acpi=no), does not call dt code in case of acpi, and saves the #else for bgx_init_acpi_phy(). I am fine with keeping it in bgx_init_phy(), however we can drop there #ifdefs since both of bgx_init_{acpi,of}_phy calls have empty stub for !ACPI and !OF case. Like that: static int bgx_init_phy(struct bgx *bgx) { if (!acpi_disabled) return bgx_init_acpi_phy(bgx); else return bgx_init_of_phy(bgx); } As said, keeping it in #ifdefs makes the empty stub function for !acpi obsolete, which makes the code smaller and better readable. This style is common practice in the kernel. Apart from that, the 'else' should be dropped as it is useless. I would't say the code is better readable (bgx_init_of_phy has still two stubs) but this is just cosmetic, my point was to use run time detection using acpi_disabled. Thanks Tomasz and Robert for the input. Because of this, it seems that another version of the patch will be necessary. In the interests of code clarity and asthetics, we will go with the code without the #ifdefs, and rely on the compiler to optimize away any dead code. David Daney Tomasz -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
The skb doesn't change within the function. Therefore it's only necessary to check if we need GSO once at the beginning. Signed-off-by: Julien Grall --- Cc: Ian Campbell Cc: Wei Liu Cc: netdev@vger.kernel.org Changes in v2: - Patch added --- drivers/net/xen-netback/netback.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 8293374..66f1780 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb unsigned long bytes; int gso_type = XEN_NETIF_GSO_TYPE_NONE; + if (skb_is_gso(skb)) { + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) + gso_type = XEN_NETIF_GSO_TYPE_TCPV4; + else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) + gso_type = XEN_NETIF_GSO_TYPE_TCPV6; + } + /* Data must not cross a page boundary. */ BUG_ON(size + offset > PAGE_SIZEgso_type & SKB_GSO_TCPV6) - gso_type = XEN_NETIF_GSO_TYPE_TCPV6; - } - if (*head && ((1 << gso_type) & queue->vif->gso_mask)) queue->rx.req_cons++; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v2] bridge: netlink: add support for vlan_filtering attribute
From: Nikolay Aleksandrov This patch adds the ability to toggle the vlan filtering support via netlink. Since we're already running with rtnl in .changelink() we don't need to take any additional locks. Signed-off-by: Nikolay Aleksandrov --- v2: return EOPNOTSUPP when vlan filtering isn't configured and can't be toggled I'll post the iproute2 patch if this one gets accepted. include/uapi/linux/if_link.h | 1 + net/bridge/br_netlink.c | 14 +- net/bridge/br_private.h | 7 +++ net/bridge/br_vlan.c | 18 -- 4 files changed, 33 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index ea047480a1f0..7531815bf88a 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -230,6 +230,7 @@ enum { IFLA_BR_AGEING_TIME, IFLA_BR_STP_STATE, IFLA_BR_PRIORITY, + IFLA_BR_VLAN_FILTERING, __IFLA_BR_MAX, }; diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 91a2e08c2bb8..6eb683d8e0c5 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -724,6 +724,7 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = { [IFLA_BR_AGEING_TIME] = { .type = NLA_U32 }, [IFLA_BR_STP_STATE] = { .type = NLA_U32 }, [IFLA_BR_PRIORITY] = { .type = NLA_U16 }, + [IFLA_BR_VLAN_FILTERING] = { .type = NLA_U8 }, }; static int br_changelink(struct net_device *brdev, struct nlattr *tb[], @@ -771,6 +772,14 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], br_stp_set_bridge_priority(br, priority); } + if (data[IFLA_BR_VLAN_FILTERING]) { + u8 vlan_filter = nla_get_u8(data[IFLA_BR_VLAN_FILTERING]); + + err = __br_vlan_filter_toggle(br, vlan_filter); + if (err) + return err; + } + return 0; } @@ -782,6 +791,7 @@ static size_t br_get_size(const struct net_device *brdev) nla_total_size(sizeof(u32)) +/* IFLA_BR_AGEING_TIME */ nla_total_size(sizeof(u32)) +/* IFLA_BR_STP_STATE */ nla_total_size(sizeof(u16)) +/* IFLA_BR_PRIORITY */ + nla_total_size(sizeof(u8)) + /* IFLA_BR_VLAN_FILTERING */ 0; } @@ -794,13 +804,15 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) u32 ageing_time = jiffies_to_clock_t(br->ageing_time); u32 stp_enabled = br->stp_enabled; u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1]; + u8 vlan_enabled = br_vlan_enabled(br); if (nla_put_u32(skb, IFLA_BR_FORWARD_DELAY, forward_delay) || nla_put_u32(skb, IFLA_BR_HELLO_TIME, hello_time) || nla_put_u32(skb, IFLA_BR_MAX_AGE, age_time) || nla_put_u32(skb, IFLA_BR_AGEING_TIME, ageing_time) || nla_put_u32(skb, IFLA_BR_STP_STATE, stp_enabled) || - nla_put_u16(skb, IFLA_BR_PRIORITY, priority)) + nla_put_u16(skb, IFLA_BR_PRIORITY, priority) || + nla_put_u8(skb, IFLA_BR_VLAN_FILTERING, vlan_enabled)) return -EMSGSIZE; return 0; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index e2cb359f9dd3..3d95647039d0 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -614,6 +614,7 @@ int br_vlan_delete(struct net_bridge *br, u16 vid); void br_vlan_flush(struct net_bridge *br); bool br_vlan_find(struct net_bridge *br, u16 vid); void br_recalculate_fwd_mask(struct net_bridge *br); +int __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); int br_vlan_set_proto(struct net_bridge *br, unsigned long val); int br_vlan_init(struct net_bridge *br); @@ -771,6 +772,12 @@ static inline int br_vlan_enabled(struct net_bridge *br) { return 0; } + +static inline int __br_vlan_filter_toggle(struct net_bridge *br, + unsigned long val) +{ + return -EOPNOTSUPP; +} #endif struct nf_br_ops { diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index 0d41f81838ff..3cef6892c0bb 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -468,21 +468,27 @@ void br_recalculate_fwd_mask(struct net_bridge *br) ~(1u << br->group_addr[5]); } -int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val) +int __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val) { - if (!rtnl_trylock()) - return restart_syscall(); - if (br->vlan_enabled == val) - goto unlock; + return 0; br->vlan_enabled = val; br_manage_promisc(br); recalculate_group_addr(br); br_recalculate_fwd_mask(br); -unlock: + return 0; +} + +int br_vlan_filter_toggle(struct net_br
[PATCH v3 4/9] xen: Use correctly the Xen memory terminologies
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN is meant, I suspect this is because the first support for Xen was for PV. This resulted in some misimplementation of helpers on ARM and confused developers about the expected behavior. For instance, with pfn_to_mfn, we expect to get an MFN based on the name. Although, if we look at the implementation on x86, it's returning a GFN. For clarity and avoid new confusion, replace any reference to mfn with gfn in any helpers used by PV drivers. The x86 code will still keep some reference of pfn_to_mfn but exclusively for PV (a BUG_ON has been added to ensure this). No changes as been made in the hypercall field, even though they may be invalid, in order to keep the same as the defintion in xen repo. Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a name to close to the KVM function gfn_to_page. Take also the opportunity to simplify simple construction such as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up will come in follow-up patches. [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb Signed-off-by: Julien Grall Reviewed-by: Stefano Stabellini Acked-by: Dmitry Torokhov Acked-by: Wei Liu --- Cc: Russell King Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky Cc: David Vrabel Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: x...@kernel.org Cc: "Roger Pau Monné" Cc: Ian Campbell Cc: Juergen Gross Cc: "James E.J. Bottomley" Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: linux-in...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-s...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-fb...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Note that I've re-introduced in v2 mfn_to_pfn & co only for x86 PV code. The helpers contain a BUG_ON to ensure that it's never called for auto-translated guests. I did as best as my can to determine whether mfn or gfn helpers should be used. Although, I haven't tried to boot it. It may be possible to do further cleanup in the mmu.c where I found some check to auto-translated. I'm not sure why given that the pvmmu callback are only used for non-auto translated guest. Changes in v3: - Add Stefano's reviewed-by (except for the x86 bits) - Add Wei (netback) and Dmitry's (input) acked-by - Keep the VIRT <-> MACHINE macro in the same order as before in arch/x86/include/asm/xen/page.h - Rename page_to_gfn to xen_page_to_gfn to avoid confusion with the KVM function gfn_to_page. - Typoes in the commit title Changes in v2: - Give directly the URL to the commit rather than the commit ID - xenstored_local_init: keep the cast to void * - Typoes - Keep pfn_to_mfn for x86 and PV-only. The *mfn* helpers are used in arch/x86/xen for enlighten.c, mmu.c, p2m.c, setup.c, smp.c and mm.c --- arch/arm/include/asm/xen/page.h | 13 +++-- arch/x86/include/asm/xen/page.h | 31 +-- arch/x86/xen/smp.c | 2 +- drivers/block/xen-blkfront.c| 6 +++--- drivers/input/misc/xen-kbdfront.c | 4 ++-- drivers/net/xen-netback/netback.c | 4 ++-- drivers/net/xen-netfront.c | 12 +++- drivers/scsi/xen-scsifront.c| 10 +- drivers/tty/hvc/hvc_xen.c | 5 +++-- drivers/video/fbdev/xen-fbfront.c | 4 ++-- drivers/xen/balloon.c | 2 +- drivers/xen/events/events_base.c| 2 +- drivers/xen/events/events_fifo.c| 4 ++-- drivers/xen/gntalloc.c | 3 ++- drivers/xen/manage.c| 2 +- drivers/xen/tmem.c | 4 ++-- drivers/xen/xenbus/xenbus_client.c | 2 +- drivers/xen/xenbus/xenbus_dev_backend.c | 2 +- drivers/xen/xenbus/xenbus_probe.c | 8 +++- include/xen/page.h | 4 ++-- 20 files changed, 73 insertions(+), 51 deletions(-) diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 911d62b..1279563 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -34,14 +34,15 @@ typedef struct xpaddr { unsigned long __pfn_to_mfn(unsigned long pfn); extern struct rb_root phys_to_mach; -static inline unsigned long pfn_to_mfn(unsigned long pfn) +/* Pseudo-physical <-> Guest conversion */ +static inline unsigned long pfn_to_gfn(unsigned long pfn) { return pfn; } -static inline unsigned long mfn_to_pfn(unsigned long mfn) +static inline unsigned long gfn_to_pfn(unsigned long gfn) { - return mfn; + return gfn; } /* Pseudo-physical <-> BUS conversion */ @@ -65,9 +66,9 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn) #define bfn_to_local_pfn
[PATCH v3 0/9] Use correctly the Xen memory terminologies
Hi all, This patch series aims to use the memory terminologies described in include/xen/mm.h [1] for Linux xen code. The differences from v2 is minor but I resent it because my 64K series depends on this series. Linux is using mistakenly MFN when GFN is meant, I suspect this is because the first support of Xen was for PV. This has brought some misimplementation of memory helpers on ARM and make the developper confused about the expected behavior. For instance, with pfn_to_mfn, we expect to get a MFN based on the name. Although, if we look at the implementation on x86, it's returning a GFN. Most of the callers are also using it this way. The first 2 patches of this series is ARM related in order to remove PV specific helpers which should not be used and fixing the implementation of pfn_to_mfn. The rest of the series is here rename most of the usage in the common code of MFN to GFN. I also took the opportunity to replace most of the call to pfn_to_gfn in the common code by page_to_gfn avoid construction such as pfn_to_gfn(page_to_pfn(...). Note the one xen-blkfront will be dropped by 64K series [2], I can include here if necessary. Major changes in v3: - More typoes - Rename page_to_gfn to xen_page_to_gfn to avoid confusing with the KVM function gfn_to_page Major changes in v2: - Use bfn rather than dfn for the DMA address - Re-introduced pfn_to_mfn for PV guests only - Typoes For all the changes see in each patch. This series is based on xentip for-linus-4.3 branch. A branch with all the patches can be found here: git://xenbits.xen.org/people/julieng/linux-arm.git branch page-renaming-v3 It has been boot tested on ARM64 and ARM32 and only built for x86. I would be happy if someone can give a try on x86 as I don't have a x86 box setup with Xen. Sincerely yours, [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb [2] https://lkml.org/lkml/2015/7/9/628 Cc: Boris Ostrovsky Cc: David Vrabel Cc: Dmitry Torokhov Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ian Campbell Cc: Ingo Molnar Cc: "James E.J. Bottomley" Cc: Jean-Christophe Plagniol-Villard Cc: Jiri Slaby Cc: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: linux-...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-fb...@vger.kernel.org Cc: linux-in...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: "Roger Pau Monné" Cc: Russell King Cc: Stefano Stabellini Cc: Thomas Gleixner Cc: Tomi Valkeinen Cc: Wei Liu Cc: x...@kernel.org Julien Grall (9): arm/xen: Remove helpers which are PV specific xen: Make clear that swiotlb and biomerge are dealing with DMA address arm/xen: implement correctly pfn_to_mfn xen: Use correctly the Xen memory terminologies xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn video/xen-fbfront: Further s/MFN/GFN clean-up hvc/xen: Further s/MFN/GFN clean-up xen/privcmd: Further s/MFN/GFN/ clean-up xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn arch/arm/include/asm/xen/page.h | 44 - arch/arm/xen/enlighten.c| 18 +++--- arch/arm/xen/mm.c | 4 +-- arch/x86/include/asm/xen/page.h | 35 +- arch/x86/xen/mmu.c | 32 arch/x86/xen/smp.c | 2 +- drivers/block/xen-blkfront.c| 6 ++--- drivers/input/misc/xen-kbdfront.c | 4 +-- drivers/net/xen-netback/netback.c | 4 +-- drivers/net/xen-netfront.c | 12 + drivers/scsi/xen-scsifront.c| 10 drivers/tty/hvc/hvc_xen.c | 18 ++ drivers/video/fbdev/xen-fbfront.c | 20 +++ drivers/xen/balloon.c | 2 +- drivers/xen/biomerge.c | 6 ++--- drivers/xen/events/events_base.c| 2 +- drivers/xen/events/events_fifo.c| 4 +-- drivers/xen/gntalloc.c | 3 ++- drivers/xen/manage.c| 2 +- drivers/xen/privcmd.c | 44 - drivers/xen/swiotlb-xen.c | 16 ++-- drivers/xen/tmem.c | 21 ++-- drivers/xen/xenbus/xenbus_client.c | 2 +- drivers/xen/xenbus/xenbus_dev_backend.c | 2 +- drivers/xen/xenbus/xenbus_probe.c | 16 ++-- drivers/xen/xlate_mmu.c | 18 +++--- include/uapi/xen/privcmd.h | 4 +++ include/xen/page.h | 4 +-- include/xen/xen-ops.h | 10 29 files changed, 191 insertions(+), 174 deletions(-) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net-timestamp: Update skb_complete_tx_timestamp comment
After "62bccb8 net-timestamp: Make the clone operation stand-alone from phy timestamping" the hwtstamps parameter of skb_complete_tx_timestamp() may no longer be NULL. Signed-off-by: Benjamin Poirier Cc: Alexander Duyck --- include/linux/skbuff.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d6cdd6e..22b6d9c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2884,11 +2884,11 @@ static inline bool skb_defer_rx_timestamp(struct sk_buff *skb) * * PHY drivers may accept clones of transmitted packets for * timestamping via their phy_driver.txtstamp method. These drivers - * must call this function to return the skb back to the stack, with - * or without a timestamp. + * must call this function to return the skb back to the stack with a + * timestamp. * * @skb: clone of the the original outgoing packet - * @hwtstamps: hardware time stamps, may be NULL if not available + * @hwtstamps: hardware time stamps * */ void skb_complete_tx_timestamp(struct sk_buff *skb, -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] bridge: netlink: add support for vlan_filtering attribute
> On Aug 7, 2015, at 7:24 PM, Nikolay Aleksandrov wrote: > > From: Nikolay Aleksandrov > > This patch adds the ability to toggle the vlan filtering support via > netlink. Since we're already running with rtnl in .changelink() we don't > need to take any additional locks. > > Signed-off-by: Nikolay Aleksandrov > --- > I'll post the iproute2 patch if this one gets accepted. > Uh, I wanted to post a version that returns an error when vlan filtering isn’t supported instead of always succeeding. So please drop this patch, I’ll post a v2 with that change so the user will know if vlan filtering was actually enabled. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] bridge: netlink: add support for vlan_filtering attribute
From: Nikolay Aleksandrov This patch adds the ability to toggle the vlan filtering support via netlink. Since we're already running with rtnl in .changelink() we don't need to take any additional locks. Signed-off-by: Nikolay Aleksandrov --- I'll post the iproute2 patch if this one gets accepted. include/uapi/linux/if_link.h | 1 + net/bridge/br_netlink.c | 12 +++- net/bridge/br_private.h | 6 ++ net/bridge/br_vlan.c | 16 ++-- 4 files changed, 28 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index ea047480a1f0..7531815bf88a 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -230,6 +230,7 @@ enum { IFLA_BR_AGEING_TIME, IFLA_BR_STP_STATE, IFLA_BR_PRIORITY, + IFLA_BR_VLAN_FILTERING, __IFLA_BR_MAX, }; diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 91a2e08c2bb8..2f5ab3def714 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -724,6 +724,7 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = { [IFLA_BR_AGEING_TIME] = { .type = NLA_U32 }, [IFLA_BR_STP_STATE] = { .type = NLA_U32 }, [IFLA_BR_PRIORITY] = { .type = NLA_U16 }, + [IFLA_BR_VLAN_FILTERING] = { .type = NLA_U8 }, }; static int br_changelink(struct net_device *brdev, struct nlattr *tb[], @@ -771,6 +772,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], br_stp_set_bridge_priority(br, priority); } + if (data[IFLA_BR_VLAN_FILTERING]) { + u8 vlan_filter = nla_get_u8(data[IFLA_BR_VLAN_FILTERING]); + + __br_vlan_filter_toggle(br, vlan_filter); + } + return 0; } @@ -782,6 +789,7 @@ static size_t br_get_size(const struct net_device *brdev) nla_total_size(sizeof(u32)) +/* IFLA_BR_AGEING_TIME */ nla_total_size(sizeof(u32)) +/* IFLA_BR_STP_STATE */ nla_total_size(sizeof(u16)) +/* IFLA_BR_PRIORITY */ + nla_total_size(sizeof(u8)) + /* IFLA_BR_VLAN_FILTERING */ 0; } @@ -794,13 +802,15 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) u32 ageing_time = jiffies_to_clock_t(br->ageing_time); u32 stp_enabled = br->stp_enabled; u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1]; + u8 vlan_enabled = br_vlan_enabled(br); if (nla_put_u32(skb, IFLA_BR_FORWARD_DELAY, forward_delay) || nla_put_u32(skb, IFLA_BR_HELLO_TIME, hello_time) || nla_put_u32(skb, IFLA_BR_MAX_AGE, age_time) || nla_put_u32(skb, IFLA_BR_AGEING_TIME, ageing_time) || nla_put_u32(skb, IFLA_BR_STP_STATE, stp_enabled) || - nla_put_u16(skb, IFLA_BR_PRIORITY, priority)) + nla_put_u16(skb, IFLA_BR_PRIORITY, priority) || + nla_put_u8(skb, IFLA_BR_VLAN_FILTERING, vlan_enabled)) return -EMSGSIZE; return 0; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index e2cb359f9dd3..f8b613195a07 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -614,6 +614,7 @@ int br_vlan_delete(struct net_bridge *br, u16 vid); void br_vlan_flush(struct net_bridge *br); bool br_vlan_find(struct net_bridge *br, u16 vid); void br_recalculate_fwd_mask(struct net_bridge *br); +void __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); int br_vlan_set_proto(struct net_bridge *br, unsigned long val); int br_vlan_init(struct net_bridge *br); @@ -771,6 +772,11 @@ static inline int br_vlan_enabled(struct net_bridge *br) { return 0; } + +static inline void __br_vlan_filter_toggle(struct net_bridge *br, + unsigned long val) +{ +} #endif struct nf_br_ops { diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index 0d41f81838ff..ea07b9eae0f6 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -468,21 +468,25 @@ void br_recalculate_fwd_mask(struct net_bridge *br) ~(1u << br->group_addr[5]); } -int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val) +void __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val) { - if (!rtnl_trylock()) - return restart_syscall(); - if (br->vlan_enabled == val) - goto unlock; + return; br->vlan_enabled = val; br_manage_promisc(br); recalculate_group_addr(br); br_recalculate_fwd_mask(br); +} -unlock: +int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val) +{ + if (!rtnl_trylock()) + return restart_syscall(); + + __br_vlan_filter_toggle(br, val); rtnl_unlock(); + return 0; } -- 2.4.
Re: [BUG] net/ipv4: inconsistent routing table
On 08/07/2015 01:23 AM, Zang MingJie wrote: IMO, the routing decision is determined, given a specific routing table and local network the result MUST be determined, independence of how/what order the routing entry is added. Now there are two ways to configure the system resulting EXACTLY the same routing table and local addresses, but the routing decision is totally different. SAME routing table, DIFFERENT routing decision, there MUST be bugs in kernel I wasn't arguing that the behavior is undesirable, but the likelihood of having a default route assigned to a local address should be pretty low. If the system is the default route of others then it should have a different default gateway than itself. For example an office router would end up pointing to the ISP as the gateway, and the ISP would either point to some other provider or run a BGP configuration. So in the case of the default route transitioning to us we should end up having to delete and update the default route anyway. This is likely one of the reasons why there hasn't been any issues reported with this behavior until now. I'm just wondering if the work involved to fix it is going to be worth it. We have to keep in mind that this will result in a change of behavior for existing users and we don't know if anyone might be expecting this type of behavior. We basically are looking at one of three options. The first one is to just delete the route if you add the gateway as a local address or remove it. That would be consistent with what you might see if the address was the sole address on an interface of its own. The second option is to update the nh_scope which I believe should be transitioned between RT_SCOPE_HOST to RT_SCOPE_LINK if I am understanding things correctly. The third option is we don't change the behavior and just document it. This would then require manually deleting and restoring any routes that use a recently modified address as their gateway. Based on your feedback I'm assuming you would probably prefer the second option. I'm just waiting to see if there are any other opinions on the matter before I act. Thanks. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] can: flexcan: demote register output to debug level
This message isn't really helpful for the general reader of the kernel logs, so should not be printed with info level. All other register programming outputs in the flexcan driver already use the debug level. Signed-off-by: Lucas Stach --- drivers/net/can/flexcan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index ad0a7e8c2c2b..95abd395cb0d 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -797,7 +797,7 @@ static void flexcan_set_bittiming(struct net_device *dev) if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES) reg |= FLEXCAN_CTRL_SMP; - netdev_info(dev, "writing ctrl=0x%08x\n", reg); + netdev_dbg(dev, "writing ctrl=0x%08x\n", reg); flexcan_write(reg, ®s->ctrl); /* print chip status */ -- 2.4.6 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On Thu, Aug 06, 2015 at 05:33:10PM -0700, David Daney wrote: > From: David Daney > > Find out which PHYs belong to which BGX instance in the ACPI way. > > Set the MAC address of the device as provided by ACPI tables. This is > similar to the implementation for devicetree in > of_get_mac_address(). The table is searched for the device property > entries "mac-address", "local-mac-address" and "address" in that > order. The address is provided in a u64 variable and must contain a > valid 6 bytes-len mac addr. > > Based on code from: Narinder Dhillon > Tomasz Nowicki > Robert Richter > > Signed-off-by: Tomasz Nowicki > Signed-off-by: Robert Richter > Signed-off-by: David Daney > --- > drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 > +- > 1 file changed, 135 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > index 615b2af..2056583 100644 > --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > @@ -6,6 +6,7 @@ > * as published by the Free Software Foundation. > */ > > +#include > #include > #include > #include > @@ -26,7 +27,7 @@ > struct lmac { > struct bgx *bgx; > int dmac; > - unsigned char mac[ETH_ALEN]; > + u8 mac[ETH_ALEN]; > boollink_up; > int lmacid; /* ID within BGX */ > int lmacid_bd; /* ID on board */ > @@ -835,6 +836,133 @@ static void bgx_get_qlm_mode(struct bgx *bgx) > } > } > > +#ifdef CONFIG_ACPI > + > +static int bgx_match_phy_id(struct device *dev, void *data) > +{ > + struct phy_device *phydev = to_phy_device(dev); > + u32 *phy_id = data; > + > + if (phydev->addr == *phy_id) > + return 1; > + > + return 0; > +} > + > +static const char * const addr_propnames[] = { > + "mac-address", > + "local-mac-address", > + "address", > +}; > + > +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst) > +{ > + const union acpi_object *prop; > + u64 mac_val; > + u8 mac[ETH_ALEN]; > + int i, j; > + int ret; > + > + for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) { > + ret = acpi_dev_get_property(adev, addr_propnames[i], > + ACPI_TYPE_INTEGER, &prop); Shouldn't this be trying to use device_property_read_* API and making the DT/ACPI path the same where possible? Graeme > + if (ret) > + continue; > + > + mac_val = prop->integer.value; > + > + if (mac_val & (~0ULL << 48)) > + continue; /* more than 6 bytes */ > + > + for (j = 0; j < ARRAY_SIZE(mac); j++) > + mac[j] = (u8)(mac_val >> (8 * j)); > + if (!is_valid_ether_addr(mac)) > + continue; > + > + memcpy(dst, mac, ETH_ALEN); > + > + return 0; > + } > + > + return ret ? ret : -EINVAL; > +} > + > +static acpi_status bgx_acpi_register_phy(acpi_handle handle, > + u32 lvl, void *context, void **rv) > +{ > + struct acpi_reference_args args; > + const union acpi_object *prop; > + struct bgx *bgx = context; > + struct acpi_device *adev; > + struct device *phy_dev; > + u32 phy_id; > + > + if (acpi_bus_get_device(handle, &adev)) > + goto out; > + > + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); > + > + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); > + > + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; > + > + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) > + goto out; > + > + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, > &prop)) > + goto out; > + > + phy_id = prop->integer.value; > + > + phy_dev = bus_find_device(&mdio_bus_type, NULL, (void *)&phy_id, > + bgx_match_phy_id); > + if (!phy_dev) > + goto out; > + > + bgx->lmac[bgx->lmac_count].phydev = to_phy_device(phy_dev); > +out: > + bgx->lmac_count++; > + return AE_OK; > +} > + > +static acpi_status bgx_acpi_match_id(acpi_handle handle, u32 lvl, > + void *context, void **ret_val) > +{ > + struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL }; > + struct bgx *bgx = context; > + char bgx_sel[5]; > + > + snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id); > + if (ACPI_FAILURE(acpi_get_name(handle, ACPI_SINGLE_NAME, &string))) { > + pr_warn("Invalid link device\n"); > + return AE_OK; > + } > + > + if (strncmp(string.pointer, bgx_sel, 4)) >
Re: [Xen-devel] [PATCH v2] xen-apic: Enable on domU as well
On Fri, Aug 7, 2015 at 4:23 PM, Konrad Rzeszutek Wilk wrote: > Anyhow, your patch seems to fix a regression my patch > feb44f1f7a4ac299d1ab1c3606860e70b9b89d69 > "x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs" > introduced. Ahhh, good, okay. That explains why I didn't encounter this with older kernels. The whole picture makes sense now. Thanks for reviewing this. David - mergable? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH v2] xen-apic: Enable on domU as well
On Thu, Aug 06, 2015 at 06:37:05PM +0200, Jason A. Donenfeld wrote: > It turns out that domU also requires the Xen APIC driver. Otherwise we > get stuck in busy loops that never exit, such as in this stack trace: > > (gdb) target remote localhost: > Remote debugging using localhost: > __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 > 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY) > (gdb) bt > #0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 > #1 __default_send_IPI_shortcut (shortcut=, > dest=, vector=) at > ./arch/x86/include/asm/ipi.h:75 > #2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54 > #3 0x81011336 in arch_irq_work_raise () at > arch/x86/kernel/irq_work.c:47 > #4 0x8114990c in irq_work_queue (work=0x88000fc0e400) at > kernel/irq_work.c:100 > #5 0x8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633 > #6 0x8110ca60 in vprintk_emit (facility=0, level= out>, dict=0x0 , dictlen=, > fmt=, args=) > at kernel/printk/printk.c:1778 > #7 0x816010c8 in printk (fmt=) at > kernel/printk/printk.c:1868 > #8 0xc00013ea in ?? () > #9 0x in ?? () > > Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 > Signed-off-by: Jason A. Donenfeld > Cc: David Vrabel > Cc: Ian Campbell > Cc: While this patch is OK for the trees that implement the PV APIC driver it won't apply to older ones (and it does not need to). In the older ones this was working with f447d56d36af18c5104ff29dcb1327c0c0ac3634 "xen: implement apic ipi interface", which should have worked for your case. And would have made the arch_irq_work_raise and such use the Xen code paths: 952 953 #ifdef CONFIG_SMP 954 apic->send_IPI_allbutself = xen_send_IPI_allbutself; 955 apic->send_IPI_mask_allbutself = xen_send_IPI_mask_allbutself; 956 apic->send_IPI_mask = xen_send_IPI_mask; 957 apic->send_IPI_all = xen_send_IPI_all; 958 apic->send_IPI_self = xen_send_IPI_self; 959 #endif Anyhow, your patch seems to fix a regression my patch feb44f1f7a4ac299d1ab1c3606860e70b9b89d69 "x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs" introduced. I would to the stable.vger.kernel.org: # apply only to v4.1 As the earlier ones will work fine. Thank you! Oh, and Reviewed-by: Konrad Rzeszutek Wilk > --- > arch/x86/xen/Makefile | 4 ++-- > arch/x86/xen/xen-ops.h | 11 --- > 2 files changed, 6 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile > index 7322755..4b6e29a 100644 > --- a/arch/x86/xen/Makefile > +++ b/arch/x86/xen/Makefile > @@ -13,13 +13,13 @@ CFLAGS_mmu.o := $(nostackp) > obj-y:= enlighten.o setup.o multicalls.o mmu.o irq.o \ > time.o xen-asm.o xen-asm_$(BITS).o \ > grant-table.o suspend.o platform-pci-unplug.o \ > - p2m.o > + p2m.o apic.o > > obj-$(CONFIG_EVENT_TRACING) += trace.o > > obj-$(CONFIG_SMP)+= smp.o > obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o > obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o > -obj-$(CONFIG_XEN_DOM0) += apic.o vga.o > +obj-$(CONFIG_XEN_DOM0) += vga.o > obj-$(CONFIG_SWIOTLB_XEN)+= pci-swiotlb-xen.o > obj-$(CONFIG_XEN_EFI)+= efi.o > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h > index c20fe29..cd248ff 100644 > --- a/arch/x86/xen/xen-ops.h > +++ b/arch/x86/xen/xen-ops.h > @@ -98,20 +98,17 @@ static inline void xen_uninit_lock_cpu(int cpu) > #endif > > struct dom0_vga_console_info; > - > #ifdef CONFIG_XEN_DOM0 > void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size); > -void __init xen_init_apic(void); > #else > -static inline void __init xen_init_vga(const struct dom0_vga_console_info > *info, > -size_t size) > -{ > -} > -static inline void __init xen_init_apic(void) > +void __init xen_init_vga(const struct dom0_vga_console_info *info, > + size_t size); > { > } > #endif > > +void __init xen_init_apic(void); > + > #ifdef CONFIG_XEN_EFI > extern void xen_efi_init(void); > #else > -- > 2.5.0 > > > ___ > Xen-devel mailing list > xen-de...@lists.xen.org > http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On Fri, Aug 07, 2015 at 01:33:10AM +0100, David Daney wrote: > From: David Daney > > Find out which PHYs belong to which BGX instance in the ACPI way. > > Set the MAC address of the device as provided by ACPI tables. This is > similar to the implementation for devicetree in > of_get_mac_address(). The table is searched for the device property > entries "mac-address", "local-mac-address" and "address" in that > order. The address is provided in a u64 variable and must contain a > valid 6 bytes-len mac addr. > > Based on code from: Narinder Dhillon > Tomasz Nowicki > Robert Richter > > Signed-off-by: Tomasz Nowicki > Signed-off-by: Robert Richter > Signed-off-by: David Daney > --- > drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 > +- > 1 file changed, 135 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > index 615b2af..2056583 100644 > --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > @@ -6,6 +6,7 @@ > * as published by the Free Software Foundation. > */ > > +#include > #include > #include > #include > @@ -26,7 +27,7 @@ > struct lmac { > struct bgx *bgx; > int dmac; > - unsigned char mac[ETH_ALEN]; > + u8 mac[ETH_ALEN]; > boollink_up; > int lmacid; /* ID within BGX */ > int lmacid_bd; /* ID on board */ > @@ -835,6 +836,133 @@ static void bgx_get_qlm_mode(struct bgx *bgx) > } > } > > +#ifdef CONFIG_ACPI > + > +static int bgx_match_phy_id(struct device *dev, void *data) > +{ > + struct phy_device *phydev = to_phy_device(dev); > + u32 *phy_id = data; > + > + if (phydev->addr == *phy_id) > + return 1; > + > + return 0; > +} > + > +static const char * const addr_propnames[] = { > + "mac-address", > + "local-mac-address", > + "address", > +}; If these are going to be generally necessary, then we should get them adopted as standardised _DSD properties (ideally just one of them). [...] > +static acpi_status bgx_acpi_register_phy(acpi_handle handle, > + u32 lvl, void *context, void **rv) > +{ > + struct acpi_reference_args args; > + const union acpi_object *prop; > + struct bgx *bgx = context; > + struct acpi_device *adev; > + struct device *phy_dev; > + u32 phy_id; > + > + if (acpi_bus_get_device(handle, &adev)) > + goto out; > + > + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); > + > + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); > + > + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; > + > + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) > + goto out; > + > + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, > &prop)) > + goto out; Likewise for any inter-device properties, so that we can actually handle them in a generic fashion, and avoid / learn from the mistakes we've already handled with DT. Mark. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[iproute PATCH] misc/ss: don't imply -a when -A was specified
Signed-off-by: Phil Sutter --- misc/ss.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/misc/ss.c b/misc/ss.c index bba7009..2f34962 100644 --- a/misc/ss.c +++ b/misc/ss.c @@ -3669,6 +3669,8 @@ int main(int argc, char *argv[]) char *p, *p1; if (!saw_query) { current_filter.dbs = 0; + state_filter = state_filter ? + state_filter : SS_CONN; saw_query = 1; do_default = 0; } -- 2.1.2 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 07.08.15 10:09:04, Tomasz Nowicki wrote: > On 07.08.2015 02:33, David Daney wrote: ... > >+#else > >+ > >+static int bgx_init_acpi_phy(struct bgx *bgx) > >+{ > >+return -ENODEV; > >+} > >+ > >+#endif /* CONFIG_ACPI */ > >+ > > #if IS_ENABLED(CONFIG_OF_MDIO) > > > > static int bgx_init_of_phy(struct bgx *bgx) > >@@ -882,7 +1010,12 @@ static int bgx_init_of_phy(struct bgx *bgx) > > > > static int bgx_init_phy(struct bgx *bgx) > > { > >-return bgx_init_of_phy(bgx); > >+int err = bgx_init_of_phy(bgx); > >+ > >+if (err != -ENODEV) > >+return err; > >+ > >+return bgx_init_acpi_phy(bgx); > > } > > > > If kernel can work with DT and ACPI (both compiled in), it should take only > one path instead of probing DT and ACPI sequentially. How about: > > @@ -902,10 +925,9 @@ static int bgx_probe(struct pci_dev *pdev, const struct > pci_device_id *ent) > bgx_vnic[bgx->bgx_id] = bgx; > bgx_get_qlm_mode(bgx); > > - snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id); > - np = of_find_node_by_name(NULL, bgx_sel); > - if (np) > - bgx_init_of(bgx, np); > + err = acpi_disabled ? bgx_init_of_phy(bgx) : bgx_init_acpi_phy(bgx); > + if (err) > + goto err_enable; > > bgx_init_hw(bgx); I would not pollute bgx_probe() with acpi and dt specifics, and instead keep bgx_init_phy(). The typical design pattern for this is: static int bgx_init_phy(struct bgx *bgx) { #ifdef CONFIG_ACPI if (!acpi_disabled) return bgx_init_acpi_phy(bgx); #endif return bgx_init_of_phy(bgx); } This adds acpi runtime detection (acpi=no), does not call dt code in case of acpi, and saves the #else for bgx_init_acpi_phy(). -Robert -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 07.08.2015 13:56, Robert Richter wrote: On 07.08.15 12:52:41, Tomasz Nowicki wrote: On 07.08.2015 12:43, Robert Richter wrote: On 07.08.15 10:09:04, Tomasz Nowicki wrote: On 07.08.2015 02:33, David Daney wrote: ... +#else + +static int bgx_init_acpi_phy(struct bgx *bgx) +{ + return -ENODEV; +} + +#endif /* CONFIG_ACPI */ + #if IS_ENABLED(CONFIG_OF_MDIO) static int bgx_init_of_phy(struct bgx *bgx) @@ -882,7 +1010,12 @@ static int bgx_init_of_phy(struct bgx *bgx) static int bgx_init_phy(struct bgx *bgx) { - return bgx_init_of_phy(bgx); + int err = bgx_init_of_phy(bgx); + + if (err != -ENODEV) + return err; + + return bgx_init_acpi_phy(bgx); } If kernel can work with DT and ACPI (both compiled in), it should take only one path instead of probing DT and ACPI sequentially. How about: @@ -902,10 +925,9 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) bgx_vnic[bgx->bgx_id] = bgx; bgx_get_qlm_mode(bgx); - snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id); - np = of_find_node_by_name(NULL, bgx_sel); - if (np) - bgx_init_of(bgx, np); + err = acpi_disabled ? bgx_init_of_phy(bgx) : bgx_init_acpi_phy(bgx); + if (err) + goto err_enable; bgx_init_hw(bgx); I would not pollute bgx_probe() with acpi and dt specifics, and instead keep bgx_init_phy(). The typical design pattern for this is: static int bgx_init_phy(struct bgx *bgx) { #ifdef CONFIG_ACPI if (!acpi_disabled) return bgx_init_acpi_phy(bgx); #endif return bgx_init_of_phy(bgx); } This adds acpi runtime detection (acpi=no), does not call dt code in case of acpi, and saves the #else for bgx_init_acpi_phy(). I am fine with keeping it in bgx_init_phy(), however we can drop there #ifdefs since both of bgx_init_{acpi,of}_phy calls have empty stub for !ACPI and !OF case. Like that: static int bgx_init_phy(struct bgx *bgx) { if (!acpi_disabled) return bgx_init_acpi_phy(bgx); else return bgx_init_of_phy(bgx); } As said, keeping it in #ifdefs makes the empty stub function for !acpi obsolete, which makes the code smaller and better readable. This style is common practice in the kernel. Apart from that, the 'else' should be dropped as it is useless. I would't say the code is better readable (bgx_init_of_phy has still two stubs) but this is just cosmetic, my point was to use run time detection using acpi_disabled. Tomasz -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 4/6] qlcnic: Add new VF device ID 0x8C30
From: Shahed Shaikh This is a 83xx series based VF device Signed-off-by: Shahed Shaikh --- drivers/net/ethernet/qlogic/qlcnic/qlcnic.h | 12 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |5 - 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index a861592..17f37b7 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -2290,8 +2290,9 @@ extern const struct ethtool_ops qlcnic_ethtool_failed_ops; #define PCI_DEVICE_ID_QLOGIC_QLE824X 0x8020 #define PCI_DEVICE_ID_QLOGIC_QLE834X 0x8030 -#define PCI_DEVICE_ID_QLOGIC_QLE8830 0x8830 #define PCI_DEVICE_ID_QLOGIC_VF_QLE834X0x8430 +#define PCI_DEVICE_ID_QLOGIC_QLE8830 0x8830 +#define PCI_DEVICE_ID_QLOGIC_VF_QLE8C300x8C30 #define PCI_DEVICE_ID_QLOGIC_QLE844X 0x8040 #define PCI_DEVICE_ID_QLOGIC_VF_QLE844X0x8440 @@ -2318,7 +2319,8 @@ static inline bool qlcnic_83xx_check(struct qlcnic_adapter *adapter) (device == PCI_DEVICE_ID_QLOGIC_QLE8830) || (device == PCI_DEVICE_ID_QLOGIC_QLE844X) || (device == PCI_DEVICE_ID_QLOGIC_VF_QLE844X) || - (device == PCI_DEVICE_ID_QLOGIC_VF_QLE834X)) ? true : false; + (device == PCI_DEVICE_ID_QLOGIC_VF_QLE834X) || + (device == PCI_DEVICE_ID_QLOGIC_VF_QLE8C30)) ? true : false; return status; } @@ -2334,7 +2336,8 @@ static inline bool qlcnic_sriov_vf_check(struct qlcnic_adapter *adapter) bool status; status = ((device == PCI_DEVICE_ID_QLOGIC_VF_QLE834X) || - (device == PCI_DEVICE_ID_QLOGIC_VF_QLE844X)) ? true : false; + (device == PCI_DEVICE_ID_QLOGIC_VF_QLE844X) || + (device == PCI_DEVICE_ID_QLOGIC_VF_QLE8C30)) ? true : false; return status; } @@ -2350,7 +2353,8 @@ static inline bool qlcnic_83xx_vf_check(struct qlcnic_adapter *adapter) { unsigned short device = adapter->pdev->device; - return (device == PCI_DEVICE_ID_QLOGIC_VF_QLE834X) ? true : false; + return ((device == PCI_DEVICE_ID_QLOGIC_VF_QLE834X) || + (device == PCI_DEVICE_ID_QLOGIC_VF_QLE8C30)) ? true : false; } static inline bool qlcnic_sriov_check(struct qlcnic_adapter *adapter) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index b714cee..8b08b20 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -110,8 +110,9 @@ static u32 qlcnic_vlan_tx_check(struct qlcnic_adapter *adapter) static const struct pci_device_id qlcnic_pci_tbl[] = { ENTRY(PCI_DEVICE_ID_QLOGIC_QLE824X), ENTRY(PCI_DEVICE_ID_QLOGIC_QLE834X), - ENTRY(PCI_DEVICE_ID_QLOGIC_QLE8830), ENTRY(PCI_DEVICE_ID_QLOGIC_VF_QLE834X), + ENTRY(PCI_DEVICE_ID_QLOGIC_QLE8830), + ENTRY(PCI_DEVICE_ID_QLOGIC_VF_QLE8C30), ENTRY(PCI_DEVICE_ID_QLOGIC_QLE844X), ENTRY(PCI_DEVICE_ID_QLOGIC_VF_QLE844X), {0,} @@ -1148,6 +1149,7 @@ static void qlcnic_get_bar_length(u32 dev_id, ulong *bar) case PCI_DEVICE_ID_QLOGIC_QLE844X: case PCI_DEVICE_ID_QLOGIC_VF_QLE834X: case PCI_DEVICE_ID_QLOGIC_VF_QLE844X: + case PCI_DEVICE_ID_QLOGIC_VF_QLE8C30: *bar = QLCNIC_83XX_BAR0_LENGTH; break; default: @@ -2490,6 +2492,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) qlcnic_83xx_register_map(ahw); break; case PCI_DEVICE_ID_QLOGIC_VF_QLE834X: + case PCI_DEVICE_ID_QLOGIC_VF_QLE8C30: case PCI_DEVICE_ID_QLOGIC_VF_QLE844X: qlcnic_sriov_vf_register_map(ahw); break; -- 1.5.6 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 3/6] qlcnic: Print firmware minidump buffer and template header addresses
From: Shahed Shaikh Signed-off-by: Shahed Shaikh --- .../net/ethernet/qlogic/qlcnic/qlcnic_minidump.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c index aca47fd..cda9e60 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c @@ -1388,8 +1388,9 @@ int qlcnic_dump_fw(struct qlcnic_adapter *adapter) fw_dump->clr = 1; snprintf(mesg, sizeof(mesg), "FW_DUMP=%s", adapter->netdev->name); netdev_info(adapter->netdev, - "Dump data %d bytes captured, template header size %d bytes\n", - fw_dump->size, fw_dump->tmpl_hdr_size); + "Dump data %d bytes captured, dump data address = %p, template header size %d bytes, template address = %p\n", + fw_dump->size, fw_dump->data, fw_dump->tmpl_hdr_size, + fw_dump->tmpl_hdr); /* Send a udev event to notify availability of FW dump */ kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, msg); -- 1.5.6 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 2/6] qlcnic: Add support to enable capability to extend minidump for iSCSI
From: Shahed Shaikh In some cases it is required to capture minidump for iSCSI functions as part of default minidump collection process. To enable this, firmware exports it's capability and driver need to enable that capability by issuing a mailbox command. With this feature, firmware can provide additional iSCSI function's minidump with smaller minidump capture mask (0x1f). Signed-off-by: Shahed Shaikh --- drivers/net/ethernet/qlogic/qlcnic/qlcnic.h|1 + .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c| 26 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h|1 + drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h |1 + .../net/ethernet/qlogic/qlcnic/qlcnic_minidump.c | 32 5 files changed, 61 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index c6dca5d..a861592 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -924,6 +924,7 @@ struct qlcnic_mac_vlan_list { #define QLCNIC_FW_CAPABILITY_SET_DRV_VER BIT_5 #define QLCNIC_FW_CAPABILITY_2_BEACON BIT_7 #define QLCNIC_FW_CAPABILITY_2_PER_PORT_ESWITCH_CFGBIT_9 +#define QLCNIC_FW_CAPABILITY_2_EXT_ISCSI_DUMP BIT_13 #define QLCNIC_83XX_FW_CAPAB_ENCAP_RX_OFFLOAD BIT_0 #define QLCNIC_83XX_FW_CAPAB_ENCAP_TX_OFFLOAD BIT_1 diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c index 172192d..5ab3adf 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c @@ -119,6 +119,7 @@ static const struct qlcnic_mailbox_metadata qlcnic_83xx_mbx_tbl[] = { {QLCNIC_CMD_DCB_QUERY_CAP, 1, 2}, {QLCNIC_CMD_DCB_QUERY_PARAM, 1, 50}, {QLCNIC_CMD_SET_INGRESS_ENCAP, 2, 1}, + {QLCNIC_CMD_83XX_EXTEND_ISCSI_DUMP_CAP, 4, 1}, }; const u32 qlcnic_83xx_ext_reg_tbl[] = { @@ -3514,6 +3515,31 @@ out: qlcnic_free_mbx_args(&cmd); } +#define QLCNIC_83XX_ADD_PORT0 BIT_0 +#define QLCNIC_83XX_ADD_PORT1 BIT_1 +#define QLCNIC_83XX_EXTENDED_MEM_SIZE 13 /* In MB */ +int qlcnic_83xx_extend_md_capab(struct qlcnic_adapter *adapter) +{ + struct qlcnic_cmd_args cmd; + int err; + + err = qlcnic_alloc_mbx_args(&cmd, adapter, + QLCNIC_CMD_83XX_EXTEND_ISCSI_DUMP_CAP); + if (err) + return err; + + cmd.req.arg[1] = (QLCNIC_83XX_ADD_PORT0 | QLCNIC_83XX_ADD_PORT1); + cmd.req.arg[2] = QLCNIC_83XX_EXTENDED_MEM_SIZE; + cmd.req.arg[3] = QLCNIC_83XX_EXTENDED_MEM_SIZE; + + err = qlcnic_issue_cmd(adapter, &cmd); + if (err) + dev_err(&adapter->pdev->dev, + "failed to issue extend iSCSI minidump capability\n"); + + return err; +} + int qlcnic_83xx_reg_test(struct qlcnic_adapter *adapter) { u32 major, minor, sub; diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h index 999a90e..331ae2c 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h @@ -627,6 +627,7 @@ int qlcnic_83xx_set_port_eswitch_status(struct qlcnic_adapter *, int, int *); void qlcnic_83xx_get_minidump_template(struct qlcnic_adapter *); void qlcnic_83xx_get_stats(struct qlcnic_adapter *adapter, u64 *data); +int qlcnic_83xx_extend_md_capab(struct qlcnic_adapter *); int qlcnic_83xx_get_settings(struct qlcnic_adapter *, struct ethtool_cmd *); int qlcnic_83xx_set_settings(struct qlcnic_adapter *, struct ethtool_cmd *); void qlcnic_83xx_get_pauseparam(struct qlcnic_adapter *, diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h index cbe2399..4bb33af 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h @@ -109,6 +109,7 @@ enum qlcnic_regs { #define QLCNIC_CMD_GET_LED_CONFIG 0x6A #define QLCNIC_CMD_83XX_SET_DRV_VER0x6F #define QLCNIC_CMD_ADD_RCV_RINGS 0x0B +#define QLCNIC_CMD_83XX_EXTEND_ISCSI_DUMP_CAP 0x37 #define QLCNIC_INTRPT_INTX 1 #define QLCNIC_INTRPT_MSIX 3 diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c index 6f33e2d..aca47fd 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c @@ -1396,19 +1396,51 @@ int qlcnic_dump_fw(struct qlcnic_adapter *adapter) return 0; } +static inline bool +qlcnic_83xx_md_check_extended_dump_capability(struct qlcnic_adapter *adapter) +{ + /* For special adapters (with 0x8830 device ID), where iSCSI firmware +* dump needs to be captured as part of
[PATCH net-next 6/6] qlcnic: Update version to 5.3.63
From: Shahed Shaikh Signed-off-by: Shahed Shaikh --- drivers/net/ethernet/qlogic/qlcnic/qlcnic.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index 17f37b7..06bcc73 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -37,8 +37,8 @@ #define _QLCNIC_LINUX_MAJOR 5 #define _QLCNIC_LINUX_MINOR 3 -#define _QLCNIC_LINUX_SUBVERSION 62 -#define QLCNIC_LINUX_VERSIONID "5.3.62" +#define _QLCNIC_LINUX_SUBVERSION 63 +#define QLCNIC_LINUX_VERSIONID "5.3.63" #define QLCNIC_DRV_IDC_VER 0x01 #define QLCNIC_DRIVER_VERSION ((_QLCNIC_LINUX_MAJOR << 16) |\ (_QLCNIC_LINUX_MINOR << 8) | (_QLCNIC_LINUX_SUBVERSION)) -- 1.5.6 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 0/6] qlcnic: enhancements
From: Shahed Shaikh Hi Dave, This series adds few enhancements. o Patch from Harish reorders the sequence of header files inclusion, keeping kernel's header files on top. o Firmware introduced a new feature which allows driver to increases the size of firmware dump of iSCSI function which is being collected by NIC driver. o Print buffer address which is holding a firmware dump. o Use vzalloc() instead kzalloc() for allocating large chunk of memory which will avoid potential memory allocation failure. o Add new device ID for 0x8C30 which is a 83xx series based VF function. Please apply this series to net-next. Thanks, Shahed Harish Patil (1): qlcnic: Rearrange ordering of header files inclusion Shahed Shaikh (5): qlcnic: Add support to enable capability to extend minidump for iSCSI qlcnic: Print firmware minidump buffer and template header addresses qlcnic: Add new VF device ID 0x8C30 qlcnic: Don't use kzalloc unncecessarily for allocating large chunk of memory qlcnic: Update version to 5.3.63 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h| 19 + .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c| 31 ++- .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h|2 + .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c |4 +- drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c |6 +- drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h |1 + drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 14 --- .../net/ethernet/qlogic/qlcnic/qlcnic_minidump.c | 41 ++-- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h |3 +- .../ethernet/qlogic/qlcnic/qlcnic_sriov_common.c |3 +- .../net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c |3 +- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c |7 +-- 12 files changed, 102 insertions(+), 32 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 5/6] qlcnic: Don't use kzalloc unncecessarily for allocating large chunk of memory
From: Shahed Shaikh Driver allocates a large chunk of temporary buffer using kzalloc to copy FW image. As there is no real need of this memory to be physically contiguous, use vzalloc instead. Signed-off-by: Shahed Shaikh --- .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c index 753ea8b..bf89216 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c @@ -1384,7 +1384,7 @@ static int qlcnic_83xx_copy_fw_file(struct qlcnic_adapter *adapter) size_t size; u64 addr; - temp = kzalloc(fw->size, GFP_KERNEL); + temp = vzalloc(fw->size); if (!temp) { release_firmware(fw); fw_info->fw = NULL; @@ -1430,7 +1430,7 @@ static int qlcnic_83xx_copy_fw_file(struct qlcnic_adapter *adapter) exit: release_firmware(fw); fw_info->fw = NULL; - kfree(temp); + vfree(temp); return ret; } -- 1.5.6 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 1/6] qlcnic: Rearrange ordering of header files inclusion
From: Harish Patil Include local headers files after kernel's header files. Signed-off-by: Harish Patil Signed-off-by: Shahed Shaikh --- drivers/net/ethernet/qlogic/qlcnic/qlcnic.h|2 -- .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c|5 +++-- .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h|1 + drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c |6 +++--- drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |9 - .../net/ethernet/qlogic/qlcnic/qlcnic_minidump.c |4 ++-- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h |3 ++- .../ethernet/qlogic/qlcnic/qlcnic_sriov_common.c |3 ++- .../net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c |3 ++- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c |7 +++ 10 files changed, 22 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index 055f376..c6dca5d 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -24,9 +24,7 @@ #include #include #include - #include - #include #include #include diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c index 840bf36..172192d 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c @@ -5,14 +5,15 @@ * See LICENSE.qlcnic for copyright and licensing details. */ -#include "qlcnic.h" -#include "qlcnic_sriov.h" #include #include #include #include #include +#include "qlcnic.h" +#include "qlcnic_sriov.h" + static void __qlcnic_83xx_process_aen(struct qlcnic_adapter *); static int qlcnic_83xx_clear_lb_mode(struct qlcnic_adapter *, u8); static void qlcnic_83xx_configure_mac(struct qlcnic_adapter *, u8 *, u8, diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h index 69f828e..999a90e 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h @@ -10,6 +10,7 @@ #include #include + #include "qlcnic_hw.h" #define QLCNIC_83XX_BAR0_LENGTH 0x4000 diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c index 75ee9e4..509b596 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c @@ -5,13 +5,13 @@ * See LICENSE.qlcnic for copyright and licensing details. */ -#include "qlcnic.h" -#include "qlcnic_hdr.h" - #include #include #include +#include "qlcnic.h" +#include "qlcnic_hdr.h" + #define MASK(n) ((1ULL<<(n))-1) #define OCM_WIN_P3P(addr) (addr & 0xffc) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index 7dbab3c..b714cee 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -7,11 +7,6 @@ #include #include - -#include "qlcnic.h" -#include "qlcnic_sriov.h" -#include "qlcnic_hw.h" - #include #include #include @@ -25,6 +20,10 @@ #include #endif +#include "qlcnic.h" +#include "qlcnic_sriov.h" +#include "qlcnic_hw.h" + MODULE_DESCRIPTION("QLogic 1/10 GbE Converged/Intelligent Ethernet Driver"); MODULE_LICENSE("GPL"); MODULE_VERSION(QLCNIC_LINUX_VERSIONID); diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c index 332bb8a..6f33e2d 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c @@ -5,13 +5,13 @@ * See LICENSE.qlcnic for copyright and licensing details. */ +#include + #include "qlcnic.h" #include "qlcnic_hdr.h" #include "qlcnic_83xx_hw.h" #include "qlcnic_hw.h" -#include - #define QLC_83XX_MINIDUMP_FLASH0x52 #define QLC_83XX_OCM_INDEX 3 #define QLC_83XX_PCI_INDEX 0 diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h index 4677b2e..017d8c2 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h @@ -8,10 +8,11 @@ #ifndef _QLCNIC_83XX_SRIOV_H_ #define _QLCNIC_83XX_SRIOV_H_ -#include "qlcnic.h" #include #include +#include "qlcnic.h" + extern const u32 qlcnic_83xx_reg_tbl[]; extern const u32 qlcnic_83xx_ext_reg_tbl[]; diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c index e631246..546cd5f 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c @@ -5,10 +5,11 @@ * See LICENSE.qlcnic for copyright and licensing details. */ +#include + #include "qlcn
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 07.08.15 12:52:41, Tomasz Nowicki wrote: > On 07.08.2015 12:43, Robert Richter wrote: > >On 07.08.15 10:09:04, Tomasz Nowicki wrote: > >>On 07.08.2015 02:33, David Daney wrote: > > > >... > > > >>>+#else > >>>+ > >>>+static int bgx_init_acpi_phy(struct bgx *bgx) > >>>+{ > >>>+ return -ENODEV; > >>>+} > >>>+ > >>>+#endif /* CONFIG_ACPI */ > >>>+ > >>> #if IS_ENABLED(CONFIG_OF_MDIO) > >>> > >>> static int bgx_init_of_phy(struct bgx *bgx) > >>>@@ -882,7 +1010,12 @@ static int bgx_init_of_phy(struct bgx *bgx) > >>> > >>> static int bgx_init_phy(struct bgx *bgx) > >>> { > >>>- return bgx_init_of_phy(bgx); > >>>+ int err = bgx_init_of_phy(bgx); > >>>+ > >>>+ if (err != -ENODEV) > >>>+ return err; > >>>+ > >>>+ return bgx_init_acpi_phy(bgx); > >>> } > >>> > >> > >>If kernel can work with DT and ACPI (both compiled in), it should take only > >>one path instead of probing DT and ACPI sequentially. How about: > >> > >>@@ -902,10 +925,9 @@ static int bgx_probe(struct pci_dev *pdev, const struct > >>pci_device_id *ent) > >>bgx_vnic[bgx->bgx_id] = bgx; > >>bgx_get_qlm_mode(bgx); > >> > >>- snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id); > >>- np = of_find_node_by_name(NULL, bgx_sel); > >>- if (np) > >>- bgx_init_of(bgx, np); > >>+ err = acpi_disabled ? bgx_init_of_phy(bgx) : bgx_init_acpi_phy(bgx); > >>+ if (err) > >>+ goto err_enable; > >> > >>bgx_init_hw(bgx); > > > >I would not pollute bgx_probe() with acpi and dt specifics, and instead > >keep bgx_init_phy(). The typical design pattern for this is: > > > >static int bgx_init_phy(struct bgx *bgx) > >{ > >#ifdef CONFIG_ACPI > > if (!acpi_disabled) > > return bgx_init_acpi_phy(bgx); > >#endif > > return bgx_init_of_phy(bgx); > >} > > > >This adds acpi runtime detection (acpi=no), does not call dt code in > >case of acpi, and saves the #else for bgx_init_acpi_phy(). > > > > I am fine with keeping it in bgx_init_phy(), however we can drop there > #ifdefs since both of bgx_init_{acpi,of}_phy calls have empty stub for !ACPI > and !OF case. Like that: > > static int bgx_init_phy(struct bgx *bgx) > { > > if (!acpi_disabled) > return bgx_init_acpi_phy(bgx); > else > return bgx_init_of_phy(bgx); > } As said, keeping it in #ifdefs makes the empty stub function for !acpi obsolete, which makes the code smaller and better readable. This style is common practice in the kernel. Apart from that, the 'else' should be dropped as it is useless. -Robert -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 07.08.2015 12:43, Robert Richter wrote: On 07.08.15 10:09:04, Tomasz Nowicki wrote: On 07.08.2015 02:33, David Daney wrote: ... +#else + +static int bgx_init_acpi_phy(struct bgx *bgx) +{ + return -ENODEV; +} + +#endif /* CONFIG_ACPI */ + #if IS_ENABLED(CONFIG_OF_MDIO) static int bgx_init_of_phy(struct bgx *bgx) @@ -882,7 +1010,12 @@ static int bgx_init_of_phy(struct bgx *bgx) static int bgx_init_phy(struct bgx *bgx) { - return bgx_init_of_phy(bgx); + int err = bgx_init_of_phy(bgx); + + if (err != -ENODEV) + return err; + + return bgx_init_acpi_phy(bgx); } If kernel can work with DT and ACPI (both compiled in), it should take only one path instead of probing DT and ACPI sequentially. How about: @@ -902,10 +925,9 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) bgx_vnic[bgx->bgx_id] = bgx; bgx_get_qlm_mode(bgx); - snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id); - np = of_find_node_by_name(NULL, bgx_sel); - if (np) - bgx_init_of(bgx, np); + err = acpi_disabled ? bgx_init_of_phy(bgx) : bgx_init_acpi_phy(bgx); + if (err) + goto err_enable; bgx_init_hw(bgx); I would not pollute bgx_probe() with acpi and dt specifics, and instead keep bgx_init_phy(). The typical design pattern for this is: static int bgx_init_phy(struct bgx *bgx) { #ifdef CONFIG_ACPI if (!acpi_disabled) return bgx_init_acpi_phy(bgx); #endif return bgx_init_of_phy(bgx); } This adds acpi runtime detection (acpi=no), does not call dt code in case of acpi, and saves the #else for bgx_init_acpi_phy(). I am fine with keeping it in bgx_init_phy(), however we can drop there #ifdefs since both of bgx_init_{acpi,of}_phy calls have empty stub for !ACPI and !OF case. Like that: static int bgx_init_phy(struct bgx *bgx) { if (!acpi_disabled) return bgx_init_acpi_phy(bgx); else return bgx_init_of_phy(bgx); } Tomasz -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 26/31] net/sched: use kmemdup rather than duplicating its implementation
On 08/07/2015 09:59 AM, Andrzej Hajda wrote: The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda Acked-by: Daniel Borkmann Not sure where the rest of this series went, but if you want this patch to be routed via net-next tree (which I recommend, to avoid cross tree conflicts), then you would need to send these patches separately, rebased to that tree, and also mention [PATCH net-next XX/YY] in the subject. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register callbacks to process hvsock connection
> -Original Message- > From: KY Srinivasan > Sent: Friday, August 7, 2015 2:28 > To: Dexuan Cui ; David Miller > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > step...@networkplumber.org; stefa...@redhat.com; netdev@vger.kernel.org; > a...@canonical.com; pebo...@tiscali.nl; dan.carpen...@oracle.com > Subject: RE: [PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register > callbacks to > process hvsock connection > > > -Original Message- > > From: Dexuan Cui > > Sent: Wednesday, August 5, 2015 9:54 PM > > To: David Miller ; KY Srinivasan > > > > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > > driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > step...@networkplumber.org; stefa...@redhat.com; > > netdev@vger.kernel.org; a...@canonical.com; pebo...@tiscali.nl; > > dan.carpen...@oracle.com > > Subject: RE: [PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register > > callbacks > > to process hvsock connection > > > > > From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On > > Behalf > > > Of Dexuan Cui > > > Sent: Thursday, July 30, 2015 18:20 > > > To: David Miller ; KY Srinivasan > > > > > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > > > driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > > step...@networkplumber.org; stefa...@redhat.com; > > netdev@vger.kernel.org; > > > a...@canonical.com; pebo...@tiscali.nl; dan.carpen...@oracle.com > > > Subject: RE: [PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register > > callbacks to > > > process hvsock connection > > > > > > > From: David Miller > > > > Sent: Thursday, July 30, 2015 6:27 > > > > > > > > From: Dexuan Cui > > > > Date: Tue, 28 Jul 2015 05:35:11 -0700 > > > > > > > > > With the 2 APIs supplied by the VMBus driver, the coming net/hvsock > > driver > > > > > can register 2 callbacks and can know when a new hvsock connection is > > > > > offered by the host, and when a hvsock connection is being closed by > > the > > > > > host. > > > > > > > > > This is an extremely terrible interface. > > > > > > > > It's an opaque hook that allows on registry, and it's solve purpose > > > > is to allow a backdoor call into a foreign driver in another module. > > > > > > > > These are exactly the things we try to avoid. > > > > > > Hi David, > > > Thanks a lot for your reviewing and the suggestion! > > > > > > > Why not create a real abstraction where clients register an object, > > > > that can be contained as a sub-member inside of their own driver > > > > private, that provides the callback registry mechanism. > > > > Hi David, > > Can you please have a look at my below questions? > > > > I like your idea of a real abstraction. Your answer would definitely > > help me to implement that correctly. > > > > > Please pardon me for my inexperience. > > > Can you please be a bit more specific? > > > I guess maybe you're referencing a common design pattern in the driver > > > code, so an example in some existing driver would be the best. :-) > > > > > > "clients register an object " -- > > > does the "clients" mean the hvsock driver? > > > and the "object" means the 2 callbacks? > > > > > > IMHO, here the vmbus driver has to synchronously pass the 2 events > > > to the hvsock driver, so a "backdoor call into the hvsock driver" is > > > inevitable anyway? > > > > > > e.g., in the path vmbus_process_offer() -> hvsock_process_offer(), the > > > return value of the latter is important to the former, because on error > > > the former needs to clean up some internal states of the vmbus driver > > (that > > > is, the "goto err_deq_chan"). > > > > > > > > > > That way you can register multiple clients, do things like allow > > > > AF_PACKET capturing of vmbus traffic, etc. > > > > > > I thought AF_PACKET can only capture IP packetsor Ethernet frames. > > > Can it be used to capture AF_UNIX packet? > > > If yes, I suppose we can consider making it work for AF_HYPERV too, > > > if people ask for that. > > > > > Dexuan, > > The notion of a channel on Hyper-V has been mapped to a device on Linux and > the mechanism we have > had of notifying the driver of the creation of the channel was through > registering this device with the kernel > (vmbus_device_create). The first exception to this was when we introduced > multi-channel support that broke > the assumption of this one to one mapping between the channel and Linux > device. In the case of the sub-channels, > we handled the driver notification issue via the sub-channel callback that > the > driver registers at the point of > opening the channel. Perhaps we could make the sub-channel handling > mechanism more generic to handle the case > of VMSOCK as well? > > K. Y Good suggestion! Let me think this over and make a new patch. Thanks, -- Dexuan -- To unsubscribe from this list: send the line "unsubscribe
RE: [PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt when hvsock's callback is running
> From: KY Srinivasan > Sent: Friday, August 7, 2015 1:50 > To: Dexuan Cui ; David Miller > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > step...@networkplumber.org; stefa...@redhat.com; netdev@vger.kernel.org; > a...@canonical.com; pebo...@tiscali.nl; dan.carpen...@oracle.com > Subject: RE: [PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt when > hvsock's callback is running > > From: Dexuan Cui > > Sent: Wednesday, August 5, 2015 9:44 PM > > To: David Miller ; KY Srinivasan > > > > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > > driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > step...@networkplumber.org; stefa...@redhat.com; > > netdev@vger.kernel.org; a...@canonical.com; pebo...@tiscali.nl; > > dan.carpen...@oracle.com > > Subject: RE: [PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt when > > hvsock's callback is running > > > > > From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On > > Behalf > > > Of Dexuan Cui > > > Sent: Thursday, July 30, 2015 18:18 > > > To: David Miller ; KY Srinivasan > > > > > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > > > driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > > step...@networkplumber.org; stefa...@redhat.com; > > netdev@vger.kernel.org; > > > a...@canonical.com; pebo...@tiscali.nl; dan.carpen...@oracle.com > > > Subject: RE: [PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt > > when > > > hvsock's callback is running > > > > > > > From: David Miller > > > > Sent: Thursday, July 30, 2015 6:28 > > > > > From: Dexuan Cui > > > > > Date: Tue, 28 Jul 2015 05:35:30 -0700 > > > > > > > > > > In the SMP guest case, when the per-channel callback hvsock_events() > > is > > > > > running on virtual CPU A, if the guest tries to close the connection > > > > > on > > > > > virtual CPU B: we invoke vmbus_close() -> vmbus_close_internal(), > > > > > then we can have trouble: on B, vmbus_close_internal() will send IPI > > > > > reset_channel_cb() to A, trying to set channel->onchannel_callbackto > > NULL; > > > > > on A, if the IPI handler happens between > > > > > "if (channel->onchannel_callback != NULL)" and invoking > > > > > channel->onchannel_callback, we'll invoke a function pointer of NULL. > > > > > > > > > > This is why the patch is necessary. > > > > > > > > > Sorry, I do not accept that you must use conditional locking and/or > > > > IRQ disabling. > > > > > > > > Boil it down to what is necessary for the least common denominator, > > > > and use that unconditionally. > > > > > > Hi David, > > > Thanks for the comment! > > > > > > I agree with you it's not clean to use conditional IRQ disabling. > > > > > > Here I didn't use unconditionally IRQ disabling because the Hyper-V netvsc > > > and storvsc driver's vmbus event callbacks (i.e. netvsc_channel_cb() and > > > storvsc_on_channel_callback()) may take relatively long time (e.g., netvsc > > can > > > operate at a speed of 10Gb) and I think it's bad to disable IRQ for long > > > time > > > when the callbacks are running in a tasklet context, e.g., the Hyper-V > > > timer > > > can be affected: see vmbus_isr() -> hv_process_timer_expiration(). > > > > > > To resolve the race condition between vmbus_close_internal() and > > > process_chn_event() in SMP case, now I propose a new method: > > > > > > we can serialize the 2 paths by adding > > > tasklet_disable(hv_context.event_dpc[channel->target_cpu]) and > > > tasklet_enable(...) in vmbus_close_internal(). > > > > > > In this way, we need the least change and we can drop this patch. > > > > > > Please let me know your opinion. > > > > > > -- Dexuan > > > > Hi David, KY and all, > > > > May I know your opinion about my idea of adding tasklet_disable/enbable() > > in vmbus_close_internal() and dropping this patch? > > Sorry for the delayed response; I think this is a reasonable solution. Send > me the > patch. > > Regards, > > K. Y OK. Will do. Thanks, -- Dexuan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH iproute2 -next] m_bpf: add frontend support for late binding
Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly support late binding of bpf action to a classifier"). Signed-off-by: Daniel Borkmann --- tc/m_bpf.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/tc/m_bpf.c b/tc/m_bpf.c index c51f44f..e1bb6a4 100644 --- a/tc/m_bpf.c +++ b/tc/m_bpf.c @@ -28,7 +28,7 @@ static const enum bpf_prog_type bpf_type = BPF_PROG_TYPE_SCHED_ACT; static void explain(void) { - fprintf(stderr, "Usage: ... bpf ...\n"); + fprintf(stderr, "Usage: ... bpf ... [ index INDEX ]\n"); fprintf(stderr, "\n"); fprintf(stderr, "BPF use case:\n"); fprintf(stderr, " bytecode BPF_BYTECODE\n"); @@ -49,6 +49,9 @@ static void explain(void) fprintf(stderr, "\n"); fprintf(stderr, "Where UDS_FILE points to a unix domain socket file in order\n"); fprintf(stderr, "to hand off control of all created eBPF maps to an agent.\n"); + fprintf(stderr, "\n"); + fprintf(stderr, "Where optionally INDEX points to an existing action, or\n"); + fprintf(stderr, "explicitly specifies an action index upon creation.\n"); } static void usage(void) @@ -64,6 +67,7 @@ static int parse_bpf(struct action_util *a, int *argc_p, char ***argv_p, struct rtattr *tail; struct tc_act_bpf parm = { 0 }; struct sock_filter bpf_ops[BPF_MAXINSNS]; + bool ebpf_fill = false, bpf_fill = false; bool ebpf = false, seen_run = false; const char *bpf_uds_name = NULL; const char *bpf_sec_name = NULL; @@ -148,11 +152,15 @@ opt_bpf: bpf_obj, bpf_sec_name); bpf_fd = ret; + ebpf_fill = true; } else { bpf_len = ret; + bpf_fill = true; } } else if (matches(*argv, "help") == 0) { usage(); + } else if (matches(*argv, "index") == 0) { + break; } else { if (!seen_run) goto opt_bpf; @@ -200,21 +208,15 @@ opt_bpf: } } - if ((!bpf_len && !ebpf) || (!bpf_fd && ebpf)) { - fprintf(stderr, "bpf: Bytecode needs to be passed\n"); - explain(); - return -1; - } - tail = NLMSG_TAIL(n); addattr_l(n, MAX_MSG, tca_id, NULL, 0); addattr_l(n, MAX_MSG, TCA_ACT_BPF_PARMS, &parm, sizeof(parm)); - if (ebpf) { + if (ebpf_fill) { addattr32(n, MAX_MSG, TCA_ACT_BPF_FD, bpf_fd); addattrstrz(n, MAX_MSG, TCA_ACT_BPF_NAME, bpf_name); - } else { + } else if (bpf_fill) { addattr16(n, MAX_MSG, TCA_ACT_BPF_OPS_LEN, bpf_len); addattr_l(n, MAX_MSG, TCA_ACT_BPF_OPS, &bpf_ops, bpf_len * sizeof(struct sock_filter)); -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: phy: select copper mode when Marvel 88e1111 in SGMII
From: Madalin Bucur For the Marvel 88e PHY only two SGMII modes are available, both allowing only SGMII to copper mode (with or without clock). SGMII to fiber mode is not supported. Make sure the fiber/copper registers selector bits are cleared for selecting copper mode. Signed-off-by: Madalin Bucur Signed-off-by: Shaohui Xie --- drivers/net/phy/marvell.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index 3320a17..e6897b6 100644 --- a/drivers/net/phy/marvell.c +++ b/drivers/net/phy/marvell.c @@ -52,6 +52,7 @@ #define MII_M1011_PHY_SCR_MDI_X0x0020 #define MII_M1011_PHY_SCR_AUTO_CROSS 0x0060 +#define MII_M1145_PHY_EXT_ADDR_PAGE0x16 #define MII_M1145_PHY_EXT_SR 0x1b #define MII_M1145_PHY_EXT_CR 0x14 #define MII_M1145_RGMII_RX_DELAY 0x0080 @@ -552,6 +553,16 @@ static int m88e_config_init(struct phy_device *phydev) err = phy_write(phydev, MII_M_PHY_EXT_SR, temp); if (err < 0) return err; + + /* make sure copper is selected */ + err = phy_read(phydev, MII_M1145_PHY_EXT_ADDR_PAGE); + if (err < 0) + return err; + + err = phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, + err & (~0xff)); + if (err < 0) + return err; } if (phydev->interface == PHY_INTERFACE_MODE_RTBI) { -- 2.1.0.27.g96db324 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mac80211_hwsim: unregister genetlink family properly
During hwsim_init_netlink(), we should call genl_unregister_family() if failed on netlink_register_notifier() since the genetlink is already registered. Signed-off-by: Su Kang Yin --- drivers/net/wireless/mac80211_hwsim.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 99e873d..16d953e 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -3120,8 +3120,10 @@ static int hwsim_init_netlink(void) goto failure; rc = netlink_register_notifier(&hwsim_netlink_notifier); - if (rc) + if (rc) { + genl_unregister_family(&hwsim_genl_family); goto failure; + } return 0; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net] ipv6: don't reject link-local nexthop on other interface
48ed7b26faa7 ("ipv6: reject locally assigned nexthop addresses") is too strict; it rejects following corner-case: ip -6 route add default via fe80::1:2:3 dev eth1 [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ] Fix this by restricting search to given device if nh is linklocal. Joint work with Hannes Frederic Sowa. Fixes: 48ed7b26faa7 ("ipv6: reject locally assigned nexthop addresses") Signed-off-by: Hannes Frederic Sowa Signed-off-by: Florian Westphal --- net/ipv6/route.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 6090969..9de4d2b 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1831,6 +1831,7 @@ int ip6_route_add(struct fib6_config *cfg) int gwa_type; gw_addr = &cfg->fc_gateway; + gwa_type = ipv6_addr_type(gw_addr); /* if gw_addr is local we will fail to detect this in case * address is still TENTATIVE (DAD in progress). rt6_lookup() @@ -1838,11 +1839,12 @@ int ip6_route_add(struct fib6_config *cfg) * prefix route was assigned to, which might be non-loopback. */ err = -EINVAL; - if (ipv6_chk_addr_and_flags(net, gw_addr, NULL, 0, 0)) + if (ipv6_chk_addr_and_flags(net, gw_addr, + gwa_type & IPV6_ADDR_LINKLOCAL ? + dev : NULL, 0, 0)) goto out; rt->rt6i_gateway = *gw_addr; - gwa_type = ipv6_addr_type(gw_addr); if (gwa_type != (IPV6_ADDR_LINKLOCAL|IPV6_ADDR_UNICAST)) { struct rt6_info *grt; -- 2.0.5 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] net/ipv4: inconsistent routing table
IMO, the routing decision is determined, given a specific routing table and local network the result MUST be determined, independence of how/what order the routing entry is added. Now there are two ways to configure the system resulting EXACTLY the same routing table and local addresses, but the routing decision is totally different. SAME routing table, DIFFERENT routing decision, there MUST be bugs in kernel. On Thu, Aug 6, 2015 at 3:43 PM, Alexander Duyck wrote: > On 08/06/2015 03:13 AM, Zang MingJie wrote: >> >> On Thu, Aug 6, 2015 at 1:45 AM, Alexander Duyck >> wrote: >>> >>> On 08/05/2015 02:06 AM, Daniel Borkmann wrote: [ please cc netdev ] On 08/05/2015 10:56 AM, Zang MingJie wrote: > > > Hi: > > I found a bug when remove an ip address which is referenced by a > routing > entry. > > step to reproduce: > > ip li add type dummy > ip li set dummy0 up > ip ad add 10.0.0.1/24 dev dummy0 > ip ad add 10.0.0.2/24 dev dummy0 >>> >>> >>> >>> Okay, so up to this point you have 2 addresses on the same subnet that >>> are >>> now on dummy0. >>> > ip ro add default via 10.0.0.2/24 >>> >>> >>> >>> This makes the default route go through 10.0.0.2. >>> > ip ad del 10.0.0.2/24 dev dummy0 >>> >>> >>> >>> Then you remove 10.0.0.2 from the local system, however since 10.0.0.1 is >>> on >>> the same subnet dummy0 would still be the correct interface to access >>> 10.0.0.2 it is just no longer local to the system. >>> > after deleting the secondary ip address, the routing entry still > pointing to 10.0.0.2 >>> >>> >>> >>> You didn't delete the default routing entry so why would you expect it to >>> change? All you did is remove 10.0.0.2 from the local system. I believe >>> the assumption is that 10.0.0.2 is still out there somewhere, it just >>> isn't >>> on the local system anymore. >> >> >> Yes, 10.0.0.2 is migrated to somewhere else > > > The address might have migrated, but the interface is still up and 10.0.0.1 > is still present on the same subnet. Because you made a local address the > default gateway the assumption is any routes not specifically called out on > other interfaces are directly accessible to this interface. > > The bug indicates that the kernel is doing something to make the table > inconsistent, but a default route that is a local interface address does > essentially the same thing. > >>> > # ip ro > default via 10.0.0.2 dev dummy0 > 10.0.0.0/24 dev dummy0 proto kernel scope link src 10.0.0.1 >>> >>> >>> >>> This matches up with what I would expect. 10.0.0.2 is the default >>> gateway >>> and it is accessible from dummy0 since 10.0.0.0/24 is accessible from >>> dummy0. >> >> >> This means 0.0.0.0/0 is accessible via 10.0.0.2 on the network of dummy0 > > > Yes, but at the time you specified it 10.0.0.2 was a local address which > belonged to dummy0. This means that dummy0 can access anything not > specified elsewhere via pretty much any address it wants. So it is > perfectly valid if it wants to use a source address of 10.0.0.1 to send > packets to 1.1.1.1 over dummy0. > >>> > but actually, kernel considers the default route is directly connected. > > # ip ro get 1.1.1.1 > 1.1.1.1 dev dummy0 src 10.0.0.1 > cache >>> >>> >>> >>> I'm not sure how you came to the "directly connected" conclusion. It is >>> still routing things out through 10.0.0.2 from 10.0.0.1. >>> >>> Maybe your example would work better if you used 10.0.0.1 and 10.0.1.1 >>> instead. Then I think you might be able to better see that when you >>> delete >>> the second address the route would be broken. >> >> >> No, it isn't. when ping 1.1.1.1, kernel will directly send arp request >> braodcast to 1.1.1.1, this is not what I expect. it should send arp >> request to 10.0.0.2, following should be the correct routing entry: >> >> # ip ro get 1.1.1.1 >> 1.1.1.1 via 10.0.0.2 dev dummy0 src 10.0.0.1 >> cache > > > I see what you are trying to say, but the example provided is a bit lacking. > Assuming you could ping 1.1.1.1 via dummy0 before with 10.0.0.2 as your > default gateway, that shouldn't change if 10.0.0.2 is migrated to another > address. That is, unless there is an issue on the system 10.0.0.2 was > migrated to. > > Now if I move away from using dummy interface and instead using a real > network interface things can get a bit more interesting. So if we follow > your example and use 2 different subnets on the two systems then pings > continue to work after we remove the addresses. However if we flip things a > bit and add the default route, and then the local address for the gateway > they don't. So something like below: > ip li set eth0 up > ip ad add 10.0.0.1/24 dev eth0 > ip ro add default via 10.0.0.2 > ip ad add 10.0.0.2/24 dev eth0 > > What you end up with is eth0 sending arp requests looking for 10.0.0.2 even > though it is a local address
[PATCH 16/31] net/cavium/liquidio: use kmemdup rather than duplicating its implementation
The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda --- drivers/net/ethernet/cavium/liquidio/octeon_device.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c index f67641a..8e23e3f 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c +++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c @@ -602,12 +602,10 @@ int octeon_download_firmware(struct octeon_device *oct, const u8 *data, snprintf(oct->fw_info.liquidio_firmware_version, 32, "LIQUIDIO: %s", h->version); - buffer = kmalloc(size, GFP_KERNEL); + buffer = kmemdup(data, size, GFP_KERNEL); if (!buffer) return -ENOMEM; - memcpy(buffer, data, size); - p = buffer + sizeof(struct octeon_firmware_file_header); /* load all images */ -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH iproute2] tipc: fix bearer get/set help synopsis
From: Richard Alpe One option is required for bearer set and bearer get. --- tipc/bearer.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tipc/bearer.c b/tipc/bearer.c index 33295f9..30b54d9 100644 --- a/tipc/bearer.c +++ b/tipc/bearer.c @@ -412,7 +412,7 @@ static int cmd_bearer_disable(struct nlmsghdr *nlh, const struct cmd *cmd, static void cmd_bearer_set_help(struct cmdl *cmdl) { - fprintf(stderr, "Usage: %s bearer set [OPTIONS] media MEDIA ARGS...\n", + fprintf(stderr, "Usage: %s bearer set OPTION media MEDIA ARGS...\n", cmdl->argv[0]); _print_bearer_opts(); _print_bearer_media(); @@ -420,7 +420,7 @@ static void cmd_bearer_set_help(struct cmdl *cmdl) static void cmd_bearer_set_udp_help(struct cmdl *cmdl) { - fprintf(stderr, "Usage: %s bearer set [OPTIONS] media udp name NAME\n\n", + fprintf(stderr, "Usage: %s bearer set OPTION media udp name NAME\n\n", cmdl->argv[0]); _print_bearer_opts(); } @@ -528,7 +528,7 @@ static int cmd_bearer_set(struct nlmsghdr *nlh, const struct cmd *cmd, static void cmd_bearer_get_help(struct cmdl *cmdl) { - fprintf(stderr, "Usage: %s bearer get [OPTIONS] media MEDIA ARGS...\n", + fprintf(stderr, "Usage: %s bearer get OPTION media MEDIA ARGS...\n", cmdl->argv[0]); _print_bearer_opts(); _print_bearer_media(); @@ -536,7 +536,7 @@ static void cmd_bearer_get_help(struct cmdl *cmdl) static void cmd_bearer_get_udp_help(struct cmdl *cmdl) { - fprintf(stderr, "Usage: %s bearer get [OPTIONS] media udp name NAME\n\n", + fprintf(stderr, "Usage: %s bearer get OPTION media udp name NAME\n\n", cmdl->argv[0]); _print_bearer_opts(); } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 07.08.2015 02:33, David Daney wrote: From: David Daney Find out which PHYs belong to which BGX instance in the ACPI way. Set the MAC address of the device as provided by ACPI tables. This is similar to the implementation for devicetree in of_get_mac_address(). The table is searched for the device property entries "mac-address", "local-mac-address" and "address" in that order. The address is provided in a u64 variable and must contain a valid 6 bytes-len mac addr. Based on code from: Narinder Dhillon Tomasz Nowicki Robert Richter Signed-off-by: Tomasz Nowicki Signed-off-by: Robert Richter Signed-off-by: David Daney --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 +- 1 file changed, 135 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 615b2af..2056583 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -6,6 +6,7 @@ * as published by the Free Software Foundation. */ +#include #include #include #include @@ -26,7 +27,7 @@ struct lmac { struct bgx *bgx; int dmac; - unsigned char mac[ETH_ALEN]; + u8 mac[ETH_ALEN]; boollink_up; int lmacid; /* ID within BGX */ int lmacid_bd; /* ID on board */ @@ -835,6 +836,133 @@ static void bgx_get_qlm_mode(struct bgx *bgx) } } +#ifdef CONFIG_ACPI + +static int bgx_match_phy_id(struct device *dev, void *data) +{ + struct phy_device *phydev = to_phy_device(dev); + u32 *phy_id = data; + + if (phydev->addr == *phy_id) + return 1; + + return 0; +} + +static const char * const addr_propnames[] = { + "mac-address", + "local-mac-address", + "address", +}; + +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst) +{ + const union acpi_object *prop; + u64 mac_val; + u8 mac[ETH_ALEN]; + int i, j; + int ret; + + for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) { + ret = acpi_dev_get_property(adev, addr_propnames[i], + ACPI_TYPE_INTEGER, &prop); + if (ret) + continue; + + mac_val = prop->integer.value; + + if (mac_val & (~0ULL << 48)) + continue; /* more than 6 bytes */ + + for (j = 0; j < ARRAY_SIZE(mac); j++) + mac[j] = (u8)(mac_val >> (8 * j)); + if (!is_valid_ether_addr(mac)) + continue; + + memcpy(dst, mac, ETH_ALEN); + + return 0; + } + + return ret ? ret : -EINVAL; +} + +static acpi_status bgx_acpi_register_phy(acpi_handle handle, +u32 lvl, void *context, void **rv) +{ + struct acpi_reference_args args; + const union acpi_object *prop; + struct bgx *bgx = context; + struct acpi_device *adev; + struct device *phy_dev; + u32 phy_id; + + if (acpi_bus_get_device(handle, &adev)) + goto out; + + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); + + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); + + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) + goto out; + + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, &prop)) + goto out; + + phy_id = prop->integer.value; + + phy_dev = bus_find_device(&mdio_bus_type, NULL, (void *)&phy_id, + bgx_match_phy_id); + if (!phy_dev) + goto out; + + bgx->lmac[bgx->lmac_count].phydev = to_phy_device(phy_dev); +out: + bgx->lmac_count++; + return AE_OK; +} + +static acpi_status bgx_acpi_match_id(acpi_handle handle, u32 lvl, +void *context, void **ret_val) +{ + struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL }; + struct bgx *bgx = context; + char bgx_sel[5]; + + snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id); + if (ACPI_FAILURE(acpi_get_name(handle, ACPI_SINGLE_NAME, &string))) { + pr_warn("Invalid link device\n"); + return AE_OK; + } + + if (strncmp(string.pointer, bgx_sel, 4)) + return AE_OK; + + acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1, + bgx_acpi_register_phy, NULL, bgx, NULL); + + kfree(string.pointer); + return AE_CTRL_TERMINATE; +} + +static int bgx_init_acpi_phy(struct bgx *bgx) +{ + acpi_get_de
[PATCH 26/31] net/sched: use kmemdup rather than duplicating its implementation
The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda --- net/sched/act_bpf.c | 4 +--- net/sched/cls_bpf.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index 1b97dab..5c0fa03 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -190,12 +190,10 @@ static int tcf_bpf_init_from_ops(struct nlattr **tb, struct tcf_bpf_cfg *cfg) if (bpf_size != nla_len(tb[TCA_ACT_BPF_OPS])) return -EINVAL; - bpf_ops = kzalloc(bpf_size, GFP_KERNEL); + bpf_ops = kmemdup(nla_data(tb[TCA_ACT_BPF_OPS]), bpf_size, GFP_KERNEL); if (bpf_ops == NULL) return -ENOMEM; - memcpy(bpf_ops, nla_data(tb[TCA_ACT_BPF_OPS]), bpf_size); - fprog_tmp.len = bpf_num_ops; fprog_tmp.filter = bpf_ops; diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index e5168f8..423f774 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -212,12 +212,10 @@ static int cls_bpf_prog_from_ops(struct nlattr **tb, if (bpf_size != nla_len(tb[TCA_BPF_OPS])) return -EINVAL; - bpf_ops = kzalloc(bpf_size, GFP_KERNEL); + bpf_ops = kmemdup(nla_data(tb[TCA_BPF_OPS]), bpf_size, GFP_KERNEL); if (bpf_ops == NULL) return -ENOMEM; - memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size); - fprog_tmp.len = bpf_num_ops; fprog_tmp.filter = bpf_ops; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 27/31] net/tipc: use kmemdup rather than duplicating its implementation
The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda --- net/tipc/server.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/tipc/server.c b/net/tipc/server.c index 922e04a..c187cad 100644 --- a/net/tipc/server.c +++ b/net/tipc/server.c @@ -411,13 +411,12 @@ static struct outqueue_entry *tipc_alloc_entry(void *data, int len) if (!entry) return NULL; - buf = kmalloc(len, GFP_ATOMIC); + buf = kmemdup(data, len, GFP_ATOMIC); if (!buf) { kfree(entry); return NULL; } - memcpy(buf, data, len); entry->iov.iov_base = buf; entry->iov.iov_len = len; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 28/31] net/xfrm: use kmemdup rather than duplicating its implementation
The patch was generated using fixed coccinelle semantic patch scripts/coccinelle/api/memdup.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 Signed-off-by: Andrzej Hajda --- net/xfrm/xfrm_user.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 0cebf1f..a8de9e3 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -925,12 +925,10 @@ static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb) return err; if (attrs[XFRMA_ADDRESS_FILTER]) { - filter = kmalloc(sizeof(*filter), GFP_KERNEL); + filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]), +sizeof(*filter), GFP_KERNEL); if (filter == NULL) return -ENOMEM; - - memcpy(filter, nla_data(attrs[XFRMA_ADDRESS_FILTER]), - sizeof(*filter)); } if (attrs[XFRMA_PROTO]) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next 00/15][pull request] Intel Wired LAN Driver Updates 2015-08-05
From: Jeff Kirsher Date: Wed, 5 Aug 2015 16:52:00 -0700 > This series contains updates to i40e, i40evf and e1000e. Pulled, thanks Jeff. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] net: ipv4: increase dhcp inter device timeout
Hi, Mugunthan V N wrote: > When a system has multiple ethernet devices and during DHCP > request (for using NFS), the system waits only for HZ/2 which is > 500mS before switching to another interface for DHCP. > > There are some routers (Ex: Trendnet routers) which responds to > DHCP request at about 560mS. When the system has only one > ethernet interface there is no issue as the timeout is 2S and the > dev xid doesn't change and only retries. > > But when the system has multiple Ethernet like DRA74x with CPSW > in dual EMAC mode, the DHCP response is dropped as the dev xid > changes while shifting to the next device. So changing inter > device timeout to HZ (which is 1S). > > Signed-off-by: Mugunthan V N > --- > net/ipv4/ipconfig.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c > index 8e7328c..bdb8cb5 100644 > --- a/net/ipv4/ipconfig.c > +++ b/net/ipv4/ipconfig.c > @@ -94,7 +94,7 @@ > /* Define the timeout for waiting for a DHCP/BOOTP/RARP reply */ > #define CONF_OPEN_RETRIES2 /* (Re)open devices twice */ > #define CONF_SEND_RETRIES6 /* Send six requests per open */ > -#define CONF_INTER_TIMEOUT (HZ/2) /* Inter-device timeout: 1/2 second */ > +#define CONF_INTER_TIMEOUT (HZ)/* Inter-device timeout: 1/2 second */ You should update comment as well at least. --yoshfuji > #define CONF_BASE_TIMEOUT(HZ*2) /* Initial timeout: 2 seconds */ > #define CONF_TIMEOUT_RANDOM (HZ)/* Maximum amount of randomization */ > #define CONF_TIMEOUT_MULT*7/4/* Rate of timeout growth */ > -- Hideaki Yoshifuji Technical Division, MIRACLE LINUX CORPORATION -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net V2] virtio-net: drop NETIF_F_FRAGLIST
From: Jason Wang Date: Wed, 5 Aug 2015 10:34:04 +0800 > virtio declares support for NETIF_F_FRAGLIST, but assumes > that there are at most MAX_SKB_FRAGS + 2 fragments which isn't > always true with a fraglist. > > A longer fraglist in the skb will make the call to skb_to_sgvec overflow > the sg array, leading to memory corruption. > > Drop NETIF_F_FRAGLIST so we only get what we can handle. > > Cc: Michael S. Tsirkin > Signed-off-by: Jason Wang Applied, thanks Jason. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] stmmac: dwmac-ipq806x: fix static checker warning
From: Mathieu Olivari Date: Tue, 4 Aug 2015 17:25:02 -0700 > The patch b1c17215d718: "stmmac: add ipq806x glue layer", leads to the > following static checker warning: > > .../stmmac/dwmac-ipq806x.c:314 ipq806x_gmac_probe() > warn: double left shift '1 << (1 << gmac->id)' > > The NSS_COMMON_CLK_SRC_CTRL_OFFSET macro is used once as an offset, and > once as a mask, which is a bug indeed. We'll fix it by defining the > offset as the real offset value and computing the mask from it when > required. > > Tested on IPQ806x ref designs AP148 & DB149. > > Reported-by: Dan Carpenter > Signed-off-by: Mathieu Olivari Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: netcp: fix unused interface rx buffer size configuration
From: WingMan Kwok Date: Tue, 4 Aug 2015 16:56:53 -0400 > Prior to this patch, rx buffer size for each rx queue > of an interface is configurable through dts bindings. > But for an interface, the first rx queue's rx buffer > size is always the usual MTU size (plus usual overhead) > and page size for the remaining rx queues (if they are > enabled by specifying a non-zero rx queue depth dts > binding of the corresponding interface). This patch > removes the rx buffer size configuration capability. > > Signed-off-by: WingMan Kwok Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] r8169: enforce RX_MULTI_EN on rtl8168ep/8111ep chips
From: Ivan Vecera Date: Tue, 4 Aug 2015 22:11:43 +0200 > Enforcing this flag in RxConfig for the mentioned chips fixes netdev > watchdog issues prepended with AMD IOMMU message(s) like: > AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x001d > address=0x3000 flags=0x0050] > > Note that this flag is also set in Realtek's own driver for these chips. > > Signed-off-by: Ivan Vecera > Tested-by: Alexander Lindqvist Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html