Re: vethpair creation performance, 3.14 versus 4.2.0

2015-08-31 Thread David Ahern

On 8/31/15 1:48 PM, Rick Jones wrote:

> My attempts to get a call-graph have been met with very limited success.
> Even though I've installed the dbg package from "make deb-pkg" the
> symbol resolution doesn't seem to be working.


Looks like Debian does not enable frame pointers by default:

$ grep FRAME /boot/config-3.2.0-4-amd64
...
# CONFIG_FRAME_POINTER is not set

Similar result for jessie.
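
The same check can be scripted. This is a minimal sketch, assuming the config file uses the usual kbuild text format, where an enabled option appears as `CONFIG_...=y` and a disabled one as a `# CONFIG_... is not set` comment. The helper name `frame_pointer_enabled` is hypothetical.

```python
def frame_pointer_enabled(config_text):
    """Return True if the kernel config text enables CONFIG_FRAME_POINTER."""
    for line in config_text.splitlines():
        # Disabled options show up as "# CONFIG_FRAME_POINTER is not set",
        # which does not match the exact "=y" form checked here.
        if line.strip() == "CONFIG_FRAME_POINTER=y":
            return True
    return False
```

To use it, read the config file for the kernel of interest (for example the /boot/config-3.2.0-4-amd64 file shown above) and pass its contents as a string.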
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vethpair creation performance, 3.14 versus 4.2.0

2015-08-31 Thread Rick Jones

On 08/31/2015 02:29 PM, David Ahern wrote:

> On 8/31/15 1:48 PM, Rick Jones wrote:
>
>> My attempts to get a call-graph have been met with very limited success.
>> Even though I've installed the dbg package from "make deb-pkg" the
>> symbol resolution doesn't seem to be working.
>
> Looks like Debian does not enable frame pointers by default:
>
> $ grep FRAME /boot/config-3.2.0-4-amd64
> ...
> # CONFIG_FRAME_POINTER is not set
>
> Similar result for jessie.


And indeed, my config file has a Debian lineage.

rick



Re: vethpair creation performance, 3.14 versus 4.2.0

2015-08-31 Thread Eric Dumazet
On Mon, 2015-08-31 at 12:48 -0700, Rick Jones wrote:
> On 08/29/2015 10:59 PM, Raghavendra K T wrote:
>  > Please note that similar overhead was also reported while creating
>  > veth pairs  https://lkml.org/lkml/2013/3/19/556
> 
> 
> That got me curious, so I took the veth pair creation script from there, 
> and started running it out to 10K pairs, comparing a 3.14.44 kernel with 
> a 4.2.0-rc4+ from net-next and then net-next after pulling to get the 
> snmp stat aggregation perf change (4.2.0-rc8+).
> 
> Indeed, the 4.2.0-rc8+ kernel with the change was faster than the 
> 4.2.0-rc4+ kernel without it, but both were slower than the 3.14.44 kernel.
> 
> I've put a spreadsheet with the results at:
> 
> ftp://ftp.netperf.org/vethpair/vethpair_compare.ods
> 
> A perf top for the 4.2.0-rc8+ kernel from the net-next tree looks like
> this out around 10K pairs:
> 
> PerfTop:   11155 irqs/sec  kernel:94.2%  exact:  0.0% [4000Hz cycles],  (all, 32 CPUs)
> ---
> 
>  23.44%  [kernel]   [k] vsscanf
>   7.32%  [kernel]   [k] mutex_spin_on_owner.isra.4
>   5.63%  [kernel]   [k] __memcpy
>   5.27%  [kernel]   [k] __dev_alloc_name
>   3.46%  [kernel]   [k] format_decode
>   3.44%  [kernel]   [k] vsnprintf
>   3.16%  [kernel]   [k] acpi_os_write_port
>   2.71%  [kernel]   [k] number.isra.13
>   1.50%  [kernel]   [k] strncmp
>   1.21%  [kernel]   [k] _parse_integer
>   0.93%  [kernel]   [k] filemap_map_pages
>   0.82%  [kernel]   [k] put_dec_trunc8
>   0.82%  [kernel]   [k] unmap_single_vma
>   0.78%  [kernel]   [k] native_queued_spin_lock_slowpath
>   0.71%  [kernel]   [k] menu_select
>   0.65%  [kernel]   [k] clear_page
>   0.64%  [kernel]   [k] _raw_spin_lock
>   0.62%  [kernel]   [k] page_fault
>   0.60%  [kernel]   [k] find_busiest_group
>   0.53%  [kernel]   [k] snprintf
>   0.52%  [kernel]   [k] int_sqrt
>   0.46%  [kernel]   [k] simple_strtoull
>   0.44%  [kernel]   [k] page_remove_rmap
> 
> My attempts to get a call-graph have been met with very limited success. 
>   Even though I've installed the dbg package from "make deb-pkg" the 
> symbol resolution doesn't seem to be working.


Well, you do not need a call graph to spot the well-known issue with
__dev_alloc_name(), which has O(N) behavior.

If we really need to be fast here, and keep eth%d or veth%d names
with a guarantee of the lowest numbers, we would need an IDR.
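
The cost difference can be sketched in userspace. This is a Python model, not the kernel code. It assumes the slow path parses every existing device name to find the lowest unused suffix, which would be consistent with vsscanf topping the profile above, and it contrasts that with an allocator that tracks free numbers directly, loosely standing in for an IDR. The names `alloc_name_scan` and `LowestFreeIds` are hypothetical.

```python
import heapq
import re


def alloc_name_scan(existing, prefix="veth"):
    """Return prefix + lowest unused number by scanning every existing name.

    Each call is O(N) in the number of devices, so creating N devices
    this way costs O(N^2) overall.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    used = set()
    for name in existing:  # one parse per existing device
        m = pattern.match(name)
        if m:
            used.add(int(m.group(1)))
    n = 0
    while n in used:
        n += 1
    return f"{prefix}{n}"


class LowestFreeIds:
    """Hand out the lowest free integer without rescanning all names.

    Released numbers go into a min-heap; next_fresh is the smallest
    number never handed out. Every released number is below next_fresh,
    so the heap minimum (if any) is always the lowest free number.
    Double frees are not detected in this sketch.
    """

    def __init__(self):
        self.released = []
        self.next_fresh = 0

    def alloc(self):
        if self.released:
            return heapq.heappop(self.released)
        n = self.next_fresh
        self.next_fresh += 1
        return n

    def free(self, n):
        heapq.heappush(self.released, n)
```

With the second approach each allocation is O(log N) at worst, at the price of keeping allocator state per name pattern. The sketch ignores devices that are given an explicit numbered name by the user, which a real implementation would have to account for.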



