Re: Network stack changes

2013-09-24 Thread Marko Zec
On Tuesday 24 September 2013 00:46:46 Sami Halabi wrote:
> Hi,
>
> > http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
> > http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
>
> I've tried the diff in 10-current, applied cleanly but had errors
> compiling new kernel... is there any work to make it work? i'd love to
> test it.

Even if you made it compile on -CURRENT, you could only run synthetic tests 
measuring lookup performance using streams of random keys, as outlined in 
the paper (btw. the paper at Luigi's site is an older draft, the final 
version with slightly revised benchmarks is available here:
http://www.sigcomm.org/sites/default/files/ccr/papers/2012/October/2378956-2378961.pdf)

I.e. the code only hooks into the routing API for testing purposes, but is 
completely disconnected from the forwarding path.

We have a prototype in the works which combines DXR with Netmap in userspace 
and is capable of sustaining well above line rate forwarding with 
full-sized BGP views using Intel 10G cards on commodity multicore machines.  
The work was somewhat stalled during the summer but I plan to wrap it up 
and release the code by the end of this year.  With recent advances in 
netmap it might also be feasible to merge DXR and netmap entirely inside 
the kernel but I've not explored that path yet...
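
For readers unfamiliar with the structure, the core DXR idea can be caricatured in a few lines: the top bits of the destination address index a direct table, and entries that cannot be resolved there fall through to a binary search over a small sorted range table.  Everything below is invented for illustration and deliberately toy-sized (a /16 stride, one range chunk); the real DXR encoding described in the paper is far more compact and general:

```c
#include <stdint.h>

/*
 * Toy sketch of a two-stage DXR-style lookup.  Names and layout are
 * invented; a direct[] entry >= 0 is a next hop, < 0 is ~(chunk index).
 */
struct range {
	uint16_t start;		/* first low-16-bit value covered */
	int	 nexthop;
};

static int direct[1 << 16];
static struct range chunk0[] = { { 0x0000, 2 }, { 0x8000, 3 } };
static int chunk0_nranges = 2;

static void
mini_init(void)
{
	int i;

	for (i = 0; i < (1 << 16); i++)
		direct[i] = 1;	/* default route -> next hop 1 */
	direct[0x0a01] = ~0;	/* 10.1.0.0/16 -> range chunk 0 */
}

static int
mini_lookup(uint32_t dst)
{
	struct range *r;
	uint16_t lo16;
	int e, lo, hi, mid, best;

	e = direct[dst >> 16];
	if (e >= 0)
		return (e);	/* resolved by the direct table alone */
	r = chunk0;		/* a real DXR would decode ~e into a chunk */
	lo16 = dst & 0xffff;
	/* binary search for the last range starting at or below lo16 */
	lo = 0; hi = chunk0_nranges - 1; best = 0;
	while (lo <= hi) {
		mid = (lo + hi) / 2;
		if (r[mid].start <= lo16) {
			best = mid;
			lo = mid + 1;
		} else
			hi = mid - 1;
	}
	return (r[best].nexthop);
}
```

The appeal for forwarding is that the hot path touches only the direct table plus, at worst, one small cache-friendly array.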

Marko


> Sami
>
>
> On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <
>
> melif...@yandex-team.ru> wrote:
> > On 29.08.2013 15:49, Adrian Chadd wrote:
> >> Hi,
> >
> > Hello Adrian!
> > I'm very sorry for the looong reply.
> >
> >> There's a lot of good stuff to review here, thanks!
> >>
> >> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
> >> keep locking things like that on a per-packet basis. We should be able
> >> to do this in a cleaner way - we can defer RX into a CPU pinned
> >> taskqueue and convert the interrupt handler to a fast handler that
> >> just schedules that taskqueue. We can ignore the ithread entirely
> >> here.
> >>
> >> What do you think?
> >
> > Well, it sounds good :) But performance numbers and Jack's opinion are
> > more important :)
> >
> > Are you going to Malta?
> >
> >> Totally pie in the sky handwaving at this point:
> >>
> >> * create an array of mbuf pointers for completed mbufs;
> >> * populate the mbuf array;
> >> * pass the array up to ether_demux().
> >>
> >> For vlan handling, it may end up populating its own list of mbufs to
> >> push up to ether_demux(). So maybe we should extend the API to have a
> >> bitmap of packets to actually handle from the array, so we can pass up
> >> a larger array of mbufs, note which ones are for the destination and
> >> then the upcall can mark which frames its consumed.
> >>
> >> I specifically wonder how much work/benefit we may see by doing:
> >>
> >> * batching packets into lists so various steps can batch process
> >> things rather than run to completion;
> >> * batching the processing of a list of frames under a single lock
> >> instance - eg, if the forwarding code could do the forwarding lookup
> >> for 'n' packets under a single lock, then pass that list of frames up
> >> to inet_pfil_hook() to do the work under one lock, etc, etc.
> >
> > I'm thinking the same way, but we're stuck with 'forwarding lookup' due
> > to the problem with the egress interface pointer, as I mentioned earlier. However
> > it is interesting to see how much it helps, regardless of locking.
> >
> > Currently I'm thinking that we should try to change radix to something
> > different (it seems that it can be checked fast) and see what happens.
> > Luigi's performance numbers for our radix are too awful, and there is a
> > patch implementing alternative trie:
> > http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
> > http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
> >
> >> Here, the processing would look less like "grab lock and process to
> >> completion" and more like "mark and sweep" - ie, we have a list of
> >> frames that we mark as needing processing and mark as having been
> >> processed at each layer, so we know where to next dispatch them.
> >>
> >> I still have some tool coding to do with PMC before I even think about
> >> tinkering with this as I'd like to measure stuff like per-packet
> >> latency as well as top-level processing overhead (ie,
> >> CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC
> >> interrupts on that core, etc.)
> >
> > That will be great to see!
> >
> >> Thanks,
> >>
> >>
> >>
> >> -adrian
> >
> > ___
> > freebsd-...@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-11-15 Thread Marko Zec
On Thursday 15 November 2012 20:32:06 Hans Petter Selasky wrote:
> On Thursday 15 November 2012 20:16:12 Adrian Chadd wrote:
> > Hans brings up a very good point for USB - they split if_alloc and
> > if_attach across two different threads.

Fine, so maybe one of the following options could work:

1) pass the vnet context embedded in some other already available struct 
when forwarding the request from the 1st to the 2nd thread; or

2) if we can safely assume that device attach events can only occur in the 
context of vnet0 (and I think we can), place a few CURVNET_SET(vnet0) 
macros wherever necessary in the 2nd USB "attach" thread.
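
Option 2 could be sketched roughly as follows; usb_attach_work() and 
usb_do_attach() are invented placeholder names standing in for wherever the 
second USB thread actually performs the deferred attach (a sketch under that 
assumption, not a tested patch):

```c
#include <net/vnet.h>

/*
 * Sketch of option 2: give the deferred USB attach work an explicit vnet0
 * context, so that whatever ends up calling if_attach() runs with a valid
 * curvnet.  Both function names here are placeholders, not real USB code.
 */
static void
usb_attach_work(struct usb_device *udev)
{

	CURVNET_SET(vnet0);	/* assumes device attach always happens in vnet0 */
	usb_do_attach(udev);	/* placeholder for the actual attach call chain */
	CURVNET_RESTORE();
}
```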

> > So this works for non-USB devices, but not for USB devices.

Could you post a sample backtrace for me to look at?

> > Hans, does each device implement its own workqueue for this kind of
> > delayed action, or is there some generic work queue that is doing this
> > work?
>
> Hi,
>
> I think a new thread is created for this stuff. It is inside the USB
> subsystem, but would consider this a big *hack* to add VNET specific
> stuff in there.
>
> Isn't it possible to have curvnet return "vnet0" when nothing else is
> set?

No!  This was discussed already on several occasions, including earlier in 
this thread: with curvnet pointing by default to vnet0, it would be 
essentially impossible to detect, trace, and debug leakages between vnets.

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-11-15 Thread Marko Zec
On Thursday 15 November 2012 07:18:31 Adrian Chadd wrote:
> Hi,
>
> Here's what I have thus far. Please ignore the device_printf() change.
>
> This works for me, both for hotplug cardbus wireless devices as well
> as (inadvertently!) a USB bluetooth device.
>
> What do you think?

It looks like you've hit the right spot to set the curvnet context in 
device_probe_and_attach().

Could you try out a slightly revised version (attached) - this one also 
removes the now-redundant curvnet setting from the linker routines (kldload / 
kldunload), and adds a few extra bits which might be necessary for a 
broader range of drivers to work.

Note that I haven't tested this myself as I don't have a -CURRENT machine 
ATM, but a similar patch for 8.3 apparently works fine, though I don't have 
hotpluggable network cards to play with (neither cardbus nor USB)...

Cheers,

Marko
Index: sys/kern/subr_bus.c
===
--- sys/kern/subr_bus.c	(revision 243091)
+++ sys/kern/subr_bus.c	(working copy)
@@ -53,6 +53,8 @@
 #include 
 #include 
 
+#include <net/vnet.h>
+
 #include 
 
 #include 
@@ -2735,7 +2737,11 @@
 		return (0);
 	else if (error != 0)
 		return (error);
-	return (device_attach(dev));
+
+	CURVNET_SET_QUIET(vnet0);
+	error = device_attach(dev);
+	CURVNET_RESTORE();
+	return (error);
 }
 
 /**
Index: sys/kern/kern_linker.c
===
--- sys/kern/kern_linker.c	(revision 243091)
+++ sys/kern/kern_linker.c	(working copy)
@@ -53,8 +53,6 @@
 #include 
 #include 
 
-#include <net/vnet.h>
-
 #include 
 
 #include "linker_if.h"
@@ -1019,12 +1017,6 @@
 		return (error);
 
 	/*
-	 * It is possible that kldloaded module will attach a new ifnet,
-	 * so vnet context must be set when this ocurs.
-	 */
-	CURVNET_SET(TD_TO_VNET(td));
-
-	/*
 	 * If file does not contain a qualified name or any dot in it
 	 * (kldname.ko, or kldname.ver.ko) treat it as an interface
 	 * name.
@@ -1041,7 +1033,7 @@
 	error = linker_load_module(kldname, modname, NULL, NULL, &lf);
 	if (error) {
 		KLD_UNLOCK();
-		goto done;
+		return (error);
 	}
 	lf->userrefs++;
 	if (fileid != NULL)
@@ -1055,9 +1047,6 @@
 #else
 	KLD_UNLOCK();
 #endif
-
-done:
-	CURVNET_RESTORE();
 	return (error);
 }
 
@@ -1095,7 +1084,6 @@
 	if ((error = priv_check(td, PRIV_KLD_UNLOAD)) != 0)
 		return (error);
 
-	CURVNET_SET(TD_TO_VNET(td));
 	KLD_LOCK();
 	lf = linker_find_file_by_id(fileid);
 	if (lf) {
@@ -1137,7 +1125,6 @@
 #else
 	KLD_UNLOCK();
 #endif
-	CURVNET_RESTORE();
 	return (error);
 }
 
Index: sys/netgraph/bluetooth/socket/ng_btsocket.c
===
--- sys/netgraph/bluetooth/socket/ng_btsocket.c	(revision 243091)
+++ sys/netgraph/bluetooth/socket/ng_btsocket.c	(working copy)
@@ -46,6 +46,8 @@
 #include 
 #include 
 
+#include <net/vnet.h>
+
 #include 
 #include 
 #include 
@@ -285,4 +287,4 @@
 	return (error);
 } /* ng_btsocket_modevent */
 
-DOMAIN_SET(ng_btsocket_);
+VNET_DOMAIN_SET(ng_btsocket_);
Index: sys/net/if.c
===
--- sys/net/if.c	(revision 243091)
+++ sys/net/if.c	(working copy)
@@ -504,6 +504,7 @@
 
 	ifp->if_flags |= IFF_DYING;			/* XXX: Locking */
 
+	CURVNET_SET_QUIET(ifp->if_vnet);
 	IFNET_WLOCK();
 	KASSERT(ifp == ifnet_byindex_locked(ifp->if_index),
 	("%s: freeing unallocated ifnet", ifp->if_xname));
@@ -511,9 +512,9 @@
 	ifindex_free_locked(ifp->if_index);
 	IFNET_WUNLOCK();
 
-	if (!refcount_release(&ifp->if_refcount))
-		return;
-	if_free_internal(ifp);
+	if (refcount_release(&ifp->if_refcount))
+		if_free_internal(ifp);
+	CURVNET_RESTORE();
 }
 
 /*
@@ -793,7 +794,9 @@
 if_detach(struct ifnet *ifp)
 {
 
+	CURVNET_SET_QUIET(ifp->if_vnet);
 	if_detach_internal(ifp, 0);
+	CURVNET_RESTORE();
 }
 
 static void
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-10-29 Thread Marko Zec
On Sunday 28 October 2012 19:47:20 Adrian Chadd wrote:
> ping?
>
> Marko - would you be willing to add the if_free() vnet context setup into
> -HEAD?

Feel free to do it - though I'd suggest to use the CURVNET_SET_QUIET() 
variant there, to reduce the console spam with VNET_DEBUG.

Marko


Index: if.c
===
--- if.c(revision 242304)
+++ if.c(working copy)
@@ -513,7 +513,9 @@
 
if (!refcount_release(&ifp->if_refcount))
return;
+   CURVNET_SET_QUIET(ifp->if_vnet);
if_free_internal(ifp);
+   CURVNET_RESTORE();
 }
 
 /*
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-10-23 Thread Marko Zec
On Monday 22 October 2012 23:43:11 Adrian Chadd wrote:
> Hi,
>
> I don't mind tackling the net80211 clone detach path.
>
> I do mind how the default for hotplug is "argh, it doesn't work." :-)
>
> So I'd like to come up with something to fix the basic device detach,
> rather than having to actually add CURVNET_*() calls around each
> if_free() in each device detach method.

As already mentioned earlier, I don't terribly object if you'd place 
CURVNET_SET(ifp->if_vnet) inside if_free() and a limited number of similar 
functions, but I don't quite believe this will be enough to solve the 
device_detach() issue without having to touch any of the drivers...

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-10-22 Thread Marko Zec
On Monday 22 October 2012 19:41:19 Adrian Chadd wrote:
> On 22 October 2012 10:29, Julian Elischer  wrote:
> >> The trouble is going to be handling unplug and kldunload events too.
> >> Does curvnet -> vnet0 during kldunload events?
> >
> > I think in unload events we probably need to cycle through all vnets
> > and do individual shutdowns of  anything that is set up on that vnet..
> > (but I'm not reading the code to say that, it's possible to ignore me
> > safely)
>
> Well, in an unload event you know the device you're unloading.
> However, there may be clones and such involved. It's not like a
> kldunload will kill a specific VAP on an ath(4) interface, it'll kill
> the whole interface with all vaps.
>
> So in net80211 I need to teach the VAP setup/destroy path to use
> CURVNET_*() correctly. That's a given.
>
> I still however need to ensure that
> CURVNET_SET(vnet0)/CURVNET_RESTORE() is used around the device
> attach/detach, as right now the hotplug code doesn't do this.
>
> So Marko:
>
> * Given that you've "fixed" the kldload path and bootup path to set
> CURVNET_SET(vnet0) as a special case, how about we teach the
> device_attach() path to just do this in general?

While it's true that the kldunload path (most probably) does 
CURVNET_SET(vnet0), this is obviously just a kludge which works by pure 
luck, i.e. only when the ifnets to be detached live inside vnet0.

> * How does kldunload work right now if any devices are in a vnet?

It (most probably) doesn't.

> If I 
> kldunload if_bridge with vnets everywhere, what happens? if_bridge
> doesn't at all know anything about VIMAGE. How do the cloned
> interfaces get correctly destroyed?

Haven't tried this out recently, really, though bz@ maintained a patch for a 
while which specifically targeted VNET issues with cloner ifnets, but I 
don't know the current status of that work...

> I don't want to have to teach _every network device_ that they need to
> be vnet aware on attach or detach.
>
> * the device probe/attach path should just use vnet0; and

Right.

> * the device detach/destroy path, to things like if_free(), should
> have those functions just use ifp->if_vnet, rather than assuming
> CURVNET_SET() was called.

How many functions like if_free() are we talking about here?  If only a few 
would need to be extended to do a CURVNET_SET(ifp->if_vnet), that doesn't 
sound like too big an issue, though I'm not completely convinced that such 
an approach could guarantee that every driver would survive hotunplugging 
with vnets.  Still, that would be an improvement over what we have right 
now.

> I know you wanted to be warned if parts of the stack weren't correctly
> using CURVNET_SET()/CURVNET_RESTORE(), but I think this battle is
> already lost. :/

It is absolutely critical that, at minimum, we always completely unwind the 
VNET stack when exiting the networking code; otherwise we risk continuing to 
run with a fully random implicit curvnet context.  As many of the 
networking subsystems or code paths are still not VNET-friendly, entering 
any of those on a VIMAGE kernel should lead to panics, not to obscure and 
silent inter-vnet leakages which may become a nightmare to nail down.

OTOH, avoiding excessive recursions on curvnet remains an effort similar to 
our style(9) - if you don't stick to it to the letter, things will still 
work, but some code paths may become more difficult to debug when things go 
wrong...  Plus, keep in mind that every CURVNET_SET() consumes a few CPU 
cycles here and there, and requires a few extra bytes on the stack...

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-10-22 Thread Marko Zec
On Monday 22 October 2012 01:03:19 Adrian Chadd wrote:
...
> > Obviously, handling device attach events is an exception from this
> > rule, and up to this date this was never properly addressed...
>
> *laugh*.
>
> The problem now is figuring out how to do it without modifying all the
> drivers.
>
> The attach is easy - I can likely set it up during the device_attach()
> pass. I can do that, but it's enforcing "networking-ness" with the
> device attach, which will be called for networking and non-networking
> devices alike.
>
> However detach isn't easy - because I'm required to call
> CURVNET_SET(ifp->if_vnet) and CURVNET_RESTORE() around if_free(), and
> if_free() is called in the device specific detach() routine, I can't
> easily set the current VNET context from outside the driver.
>
> I _guess_ I could device_attach() to use CURVNET_SET(vnet0) but
> device_detach() can't do the same - it doesn't "know" about the
> networking-ness of the device.
>
> I'm open to other suggestions.

The only option I can think of now is to update all of the hotunpluggable 
device_detach() handlers to do CURVNET_SET(ifp->if_vnet) before calling 
further down into the networking stack, because as you already observed, 
whatever triggers a device_detach() handler is not aware of the nature of 
the driver.

> (how the hell does this work for devices attached at probe time? What
> vnet context do they have, and why doesn't the kernel panic there?)

Because at boot / autoconfiguration time curvnet is implicitly set to vnet0 
between SI_SUB_VNET and SI_SUB_VNET_DONE (i.e. before going SMP).

Similarly, curvnet is set to vnet0 during kldload events.

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-10-21 Thread Marko Zec
On Sunday 21 October 2012 21:50:21 Adrian Chadd wrote:
> On 21 October 2012 12:36, Marko Zec  wrote:
> > The right approach would be to do a single CURVNET_SET(vnet0) /
> > CURVNET_RESTORE() somewhere near the root of the call graph being
> > triggered by the hotplug attach event.  Not having any hotpluggable
> > hardware at hand I cannot be more specific where that place could be...
>
> Right; would that be at the net80211 side, or something higher up (eg
> at device_attach, which gets called from the cardbus/pci bridge
> enumeration code.)

As high as it gets - if you get lucky, as a side effect you might even fix 
similar issues with USB hotplugging.

> > But most certainly doing CURVNET_SET(vnet0) on detach events would be
> > wrong: since ifnets may be assigned to non-default vnets,
> > CURVNET_SET(ifp->if_vnet) should be more appropriate there.
>
> Thanks for that. I'll look at adding that in my next debug pass.
>
> > Another thing that may help could be turning on options VNET_DEBUG,
> > as that should reveal excessive (and probably redundant)
> > CURVNET_SET() recursions.
>
> I've spotted a couple, however the crashing here is the important bit.
> :-)
>
> So - why is it that the V_* variables are NULL pointers at this stage?
> I thought the kernel would've been running with a default vnet context
> of vnet0? Why doesn't this impact other network device hotplugging? Or
> does it, and noone noticed?

By design, the kernel is never running "by default" in any of the vnets 
(vnet0 included).  If it were, it would be extremely difficult to spot and 
catch many cases where a subsystem would be (implicitly) working with 
vnet0, while in fact it should be working in a different vnet context.

Obviously, handling device attach events is an exception from this rule, and 
up to this date this was never properly addressed...

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: VIMAGE crashes on 9.x with hotplug net80211 devices

2012-10-21 Thread Marko Zec
On Sunday 21 October 2012 21:04:41 Adrian Chadd wrote:
> Hi all,
>
> I have some crashes in the VIMAGE code on releng_9. Specifically, when
> I enable VIMAGE and then hotplug some cardbus ath(4) NICs.
>
> The panics are dereferencing the V_ ifindex and related fields.
>
> If I start adding CURVNET_SET(vnet0) and CURVNET_RESTORE() around the
> ifnet calls (attach, detach) then things stop panicing - however,
> things are slightly more complicated than that.
>
> Since it's possible that the cloned interfaces (and maybe the parent
> interface?) are placed into other VNETs, I have to make sure that the
> right vnet context is switched to before I free interfaces.
>
> So, may I please have some help by some VIMAGE-cluey people to sort
> out how to _properly_ get VIMAGE up on net80211? I'd like to fix this
> in -HEAD and -9 so people are able to use VIMAGEs for hostapd
> interfaces (and so I can abuse it for lots of local testing on a
> single laptop.)

The right approach would be to do a single CURVNET_SET(vnet0) / 
CURVNET_RESTORE() somewhere near the root of the call graph being triggered 
by the hotplug attach event.  Not having any hotpluggable hardware at hand 
I cannot be more specific where that place could be...

But most certainly doing CURVNET_SET(vnet0) on detach events would be wrong: 
since ifnets may be assigned to non-default vnets, 
CURVNET_SET(ifp->if_vnet) should be more appropriate there.
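
As a rough illustration of that detach-side pattern (driver, softc, and 
function names here are invented examples, not taken from any patch in this 
thread):

```c
#include <net/vnet.h>

/*
 * Sketch: a driver detach path switching to the ifnet's own vnet before
 * tearing the interface down.  mydrv_detach() and mydrv_softc are invented
 * names; the point is CURVNET_SET(ifp->if_vnet), not CURVNET_SET(vnet0).
 */
static int
mydrv_detach(device_t dev)
{
	struct mydrv_softc *sc = device_get_softc(dev);
	struct ifnet *ifp = sc->sc_ifp;

	CURVNET_SET(ifp->if_vnet);	/* ifnet may live in a non-default vnet */
	ether_ifdetach(ifp);
	if_free(ifp);
	CURVNET_RESTORE();
	return (0);
}
```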

Another thing that may help could be turning on options VNET_DEBUG, as 
that should reveal excessive (and probably redundant) CURVNET_SET() 
recursions.

Hope this helps,

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: superpages and kmem on amd64

2012-05-20 Thread Marko Zec
On Monday 21 May 2012 01:12:01 Alan Cox wrote:
...
> >>> BTW, apparently malloc(size, M_TEMP, M_NOWAIT) requests fail for size>
> >>> 1G, even at boot time.  Any ideas how to circumvent that (8.3-STABLE,
> >>> amd64, 4G physical RAM)?
> >>
> >> I suspect that you need to increase the size of your kmem map.
> >
> > Huh, any hints how I should achieve that?  In desperation I placed
> >
> > vm.kmem_size=8G
> >
> > in /boot/loader.conf and got this:
> >
> > vm.kmem_map_free: 8123924480
> > vm.kmem_map_size: 8364032
> > vm.kmem_size_scale: 1
> > vm.kmem_size_max: 329853485875
> > vm.kmem_size_min: 0
> > vm.kmem_size: 8132288512
> >
> > but malloc(2G) still fails...
>
> Here is at least one reason why it fails:
>
> void *
> uma_large_malloc(int size, int wait)
>
> Note the type of "size".  Can you malloc 1GB?

Uff, good catch...  malloc(1G) works, malloc(1.99G) works, malloc(2G) doesn't!

Anyhow, malloc(1G) is big enough for what I want to do ATM, I was just curious 
why it breaks with bigger requests.
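
The failure mode is simply the size argument wrapping in a signed 32-bit int: 
2^31 no longer fits, while anything just below it still does.  A userland 
caricature (converting an out-of-range value to int is implementation-defined, 
but wraps negative on the usual two's-complement compilers):

```c
#include <stdint.h>

/*
 * What a 64-bit byte count looks like after being squeezed through an
 * "int size" parameter, as in uma_large_malloc(int size, int wait).
 */
static int
as_int_size(uint64_t nbytes)
{
	return ((int)nbytes);
}
```

So a 2 GB request arrives in the allocator as a negative size and is rejected, while 1.99 GB survives the conversion intact.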

Thanks,

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: superpages and kmem on amd64

2012-05-20 Thread Marko Zec
On Sunday 20 May 2012 19:34:26 Alan Cox wrote:
...
> > In any case, I wish to be certain that a particular kmem virtual address
> > range is mapped to superpages - how can I enforce that at malloc time,
> > and / or find out later if I really got my kmem mapped to superpages? 
> > Perhaps vm_map_lookup() could provide more info, but I'm wondering if
> > someone already wrote a wrapper function for that, which takes only the
> > base virtual address as a single argument?
>
> Try using pmap_mincore() to verify that the mappings are superpages.

flags = pmap_mincore(vmspace_pmap(curthread->td_proc->p_vmspace), 
(vm_offset_t) addr));

OK, that works, and now I know my kmem chunk is on a superpage, hooray!!!  
Thanks!
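
For reference, the bit to test in the returned flags is MINCORE_SUPER (from 
sys/mman.h).  A small kernel-side sketch, assuming the 8.x two-argument 
pmap_mincore() signature; range_on_superpages() is an invented helper name:

```c
#include <sys/param.h>
#include <sys/mman.h>
#include <vm/vm.h>
#include <vm/pmap.h>

/*
 * Sketch: walk a VA range in superpage-sized steps (NBPDR, 2 MB on amd64)
 * and report whether every step is backed by a superpage mapping.
 */
static int
range_on_superpages(pmap_t pmap, vm_offset_t base, size_t len)
{
	vm_offset_t va;

	for (va = base; va < base + len; va += NBPDR)
		if ((pmap_mincore(pmap, va) & MINCORE_SUPER) == 0)
			return (0);
	return (1);
}
```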

> > BTW, apparently malloc(size, M_TEMP, M_NOWAIT) requests fail for size> 
> > 1G, even at boot time.  Any ideas how to circumvent that (8.3-STABLE,
> > amd64, 4G physical RAM)?
>
> I suspect that you need to increase the size of your kmem map.

Huh, any hints how I should achieve that?  In desperation I placed

vm.kmem_size=8G

in /boot/loader.conf and got this:

vm.kmem_map_free: 8123924480
vm.kmem_map_size: 8364032
vm.kmem_size_scale: 1
vm.kmem_size_max: 329853485875
vm.kmem_size_min: 0
vm.kmem_size: 8132288512

but malloc(2G) still fails...

Thanks,

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: superpages and kmem on amd64

2012-05-20 Thread Marko Zec
On Sunday 20 May 2012 09:25:59 Alan Cox wrote:
> On Sun, May 20, 2012 at 2:01 AM, Marko Zec  wrote:
> > Hi all,
> >
> > I'm playing with an algorithm which makes use of large contiguous blocks
> > of kernel memory (ranging from 1M to 1G in size), so it would be nice if
> > those could be somehow forcibly mapped to superpages.  I was hoping that
> > the VM system would automagically map (merge) contiguous 4k pages to
> > superpages, but
> > apparently it doesn't:
> >
> > vm.pmap.pdpe.demotions: 2
> > vm.pmap.pde.promotions: 543
> > vm.pmap.pde.p_failures: 266253
> > vm.pmap.pde.mappings: 0
> > vm.pmap.pde.demotions: 31
>
> No, your conclusion is incorrect.  These counts show that 543 superpage
> mappings were created by promotion.

OK, that sounds promising.  Does the "created by promotion" count reflect 
historic / cumulative stats, or is vm.pmap.pde.promotions the actual number 
of superpages currently active?  Or should we subtract vm.pmap.pde.demotions 
from it to get the current value?

In any case, I wish to be certain that a particular kmem virtual address range 
is mapped to superpages - how can I enforce that at malloc time, and / or 
find out later if I really got my kmem mapped to superpages?  Perhaps 
vm_map_lookup() could provide more info, but I'm wondering if someone already 
wrote a wrapper function for that, which takes only the base virtual address 
as a single argument?

BTW, apparently malloc(size, M_TEMP, M_NOWAIT) requests fail for size > 1G, 
even at boot time.  Any ideas how to circumvent that (8.3-STABLE, amd64, 4G 
physical RAM)?

Thanks,

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


superpages and kmem on amd64

2012-05-20 Thread Marko Zec
Hi all,

I'm playing with an algorithm which makes use of large contiguous blocks of 
kernel memory (ranging from 1M to 1G in size), so it would be nice if those 
could be somehow forcibly mapped to superpages.  I was hoping that the VM 
system would automagically map (merge) contiguous 4k pages to superpages, but 
apparently it doesn't:

vm.pmap.pdpe.demotions: 2
vm.pmap.pde.promotions: 543
vm.pmap.pde.p_failures: 266253
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 31

I.e. I have 1G of kmem allocated via 

malloc(1024 * 1024 * 1024, M_TEMP, M_NOWAIT);

but vm.pmap.pde.mappings: 0 suggests that no superpages are in use.

Is there an alternative kernel memory allocation method which might force 
superpages to be used for contiguous memory blocks?  And how do I find more 
details about page mappings for a given kmem virtual address?  I'm running 
8.3-STABLE on amd64.

Thanks,

Marko
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Multiple IP Jail's patch for FreeBSD 6.2

2007-05-16 Thread Marko Zec
On Wednesday 16 May 2007 09:32:37 Chris wrote:
> On 16/05/07, Marko Zec <[EMAIL PROTECTED]> wrote:
> > On Monday 14 May 2007 22:47:57 Andre Oppermann wrote:
> > > Julian Elischer wrote:
> > > > Bjoern A. Zeeb wrote:
> > > >> On Mon, 14 May 2007, Ed Schouten wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >>> * Andre Oppermann <[EMAIL PROTECTED]> wrote:
> > > >>>>  I'm working on a "light" variant of multi-IPv[46] per jail.
> > > >>>>  It doesn't
> > > >>>>  create an entirely new network instance per jail and
> > > >>>> probably is more suitable for low- to mid-end (virtual)
> > > >>>> hosting.  In those cases you normally want the host
> > > >>>> administrator to exercise full control over IP address and
> > > >>>> firewall configuration of the individual jails.  For
> > > >>>> high-end stuff where you offer jail based virtual machines
> > > >>>> or network and routing simulations Marco's work is more
> > > >>>> appropriate.
> > > >>>
> > > >>> Is there a way for us to collaborate on this? I'd really love
> > > >>> to work on this sort of stuff and I think it's really
> > > >>> interesting to dig in that sort of code.
> > > >>>
> > > >>> I already wrote an initial patch which changes the system
> > > >>> call and sysctl format of the jail structures which allow you
> > > >>> to specify lists of addresses for IPv4 and IPv6.
> > > >
> > > > talk with Marko Zec about "immunes".
> > > >
> > > > http://www.tel.fer.hr/zec/vimage/
> > > > and http://www.tel.fer.hr/imunes/
> > > >
> > > > It has a complete virtualized stack for each jail.
> > > > ipfw, routing table, divert sockets, sysctls, statistics,
> > > > netgraph etc.
> > >
> > > Like I said there is a place for both approaches and they are
> > > complementary.  A couple of hosting ISPs I know do not want to
> > > give a full virtualized stack to their customers.  They want to
> > > retain full control over the network configuration inside and
> > > outside of the jail.  In those (mass-hosting) cases it is done
> > > that way to ease support (less stuff users can fumble) and to
> > > properly position those products against full virtual machines
> > > and dedicated servers.  Something like this: jail < vimage <
> > > virtual machine < dedicated server.
> >
> > You're right we shouldn't look at virtualized stack as a
> > replacement for jails.  Every approach has its niche and use.
> >
> > > > He has a set of patches against 7-current that now implements
> > > > nearly all the parts you need. It Will be discussed at the
> > > > devsummit on Wed/Thurs and we'll be discussing whether it is
> > > > suitable for general inclusion or to be kept as patches. Note,
> > > > it can be compiled out, which leaves a pretty much binarily
> > > > compatible OS, so I personally would like to see it included.
> > >
> > > I don't think it is mature enough for inclusion into the upcoming
> > > 7.0R.  Not enough integration time.  Food for FreeBSD 8.0.
> >
> > Even not knowing how far exactly 7.0 is from being frozen and
> > entering the release process, I'd agree with your point - the stack
> > virtualization prototype for -CURRENT is still far from being ready
> > for prime time.  The fact that the patchsets I maintained for 4.11
> > were quite stable is of little significance now, given that the
> > -CURRENT prototype is a from-scratch implementation of the same
> > idea but using slightly different tricks, and of course the FreeBSD
> > code base has evolved tremendously over the years.  What the
> > prototype does demonstrate at this point however, is that the
> > changes can be made to optionaly compile, that they should work
> > fine on a multithreaded / SMP kernel, and that all this can be
> > accomplished with relatively less churn to the existing code
> > compared to what was done in 4.11 days. Knowing that I had a
> > machine running a virtualized -CURRENT kernel under different kinds
> > of workloads for over a month without a glitch might be considered
> > encouraging but nothing spectacular...
> >
&

Re: Multiple IP Jail's patch for FreeBSD 6.2

2007-05-16 Thread Marko Zec
On Monday 14 May 2007 22:47:57 Andre Oppermann wrote:
> Julian Elischer wrote:
> > Bjoern A. Zeeb wrote:
> >> On Mon, 14 May 2007, Ed Schouten wrote:
> >>
> >> Hi,
> >>
> >>> * Andre Oppermann <[EMAIL PROTECTED]> wrote:
> >>>>  I'm working on a "light" variant of multi-IPv[46] per jail.  It
> >>>> doesn't
> >>>>  create an entirely new network instance per jail and probably
> >>>> is more suitable for low- to mid-end (virtual) hosting.  In
> >>>> those cases you normally want the host administrator to
> >>>> exercise full control over IP address and firewall
> >>>> configuration of the individual jails.  For high-end stuff where
> >>>> you offer jail based virtual machines or network and routing
> >>>> simulations Marko's work is more appropriate.
> >>>
> >>> Is there a way for us to collaborate on this? I'd really love to
> >>> work on this sort of stuff and I think it's really interesting to
> >>> dig in that sort of code.
> >>>
> >>> I already wrote an initial patch which changes the system call
> >>> and sysctl format of the jail structures which allow you to
> >>> specify lists of addresses for IPv4 and IPv6.
> >
> > talk with Marko Zec about "immunes".
> >
> > http://www.tel.fer.hr/zec/vimage/
> > and http://www.tel.fer.hr/imunes/
> >
> > It has a complete virtualized stack for each jail.
> > ipfw, routing table, divert sockets, sysctls, statistics, netgraph
> > etc.
>
> Like I said there is a place for both approaches and they are
> complementary.  A couple of hosting ISPs I know do not want to
> give a full virtualized stack to their customers.  They want to
> retain full control over the network configuration inside and
> outside of the jail.  In those (mass-hosting) cases it is done
> that way to ease support (less stuff users can fumble) and to
> properly position those products against full virtual machines
> and dedicated servers.  Something like this: jail < vimage <
> virtual machine < dedicated server.

You're right, we shouldn't look at the virtualized stack as a replacement for 
jails.  Every approach has its niche and use.

> > He has a set of patches against 7-current that now implements nearly
> > all the parts you need. It will be discussed at the devsummit on
> > Wed/Thurs and we'll be discussing whether it is suitable for
> > general inclusion or to be kept as patches. Note, it can be
> > compiled out, which leaves a pretty much binary-compatible OS, so
> > I personally would like to see it included.
>
> I don't think it is mature enough for inclusion into the upcoming
> 7.0R.  Not enough integration time.  Food for FreeBSD 8.0.

Even not knowing how far exactly 7.0 is from being frozen and entering 
the release process, I'd agree with your point - the stack 
virtualization prototype for -CURRENT is still far from ready for 
prime time.  The fact that the patchsets I maintained for 4.11 were 
quite stable is of little significance now, given that the -CURRENT 
prototype is a from-scratch implementation of the same idea using 
slightly different tricks, and of course the FreeBSD code base has 
evolved tremendously over the years.  What the prototype does 
demonstrate at this point, however, is that the changes can be made to 
compile optionally, that they should work fine on a multithreaded / SMP 
kernel, and that all this can be accomplished with less churn to the 
existing code than in the 4.11 days.  Knowing that I had a machine 
running a virtualized -CURRENT kernel under different kinds of 
workloads for over a month without a glitch might be considered 
encouraging, but nothing spectacular...

OTOH, even if we miss the window for sneaking this into 7.0-R, it would 
be a huge pity not to at least reserve a few additional fields in 
various kernel structures needed to support stack virtualization.  That 
way it would be possible to maintain a virtualized 7.0-R kernel in a 
separate code branch, which could be used as a drop-in replacement for 
the stock kernel even after the API / ABI freeze comes into effect.  
This would give people an opportunity to conveniently test and play 
with the new framework on an otherwise production-grade OS, while work 
continues towards (hopefully) merging the changes into 8.0 at some 
point.

Cheers,

Marko

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Multiple Ips in Jail

2003-12-09 Thread Marko Zec
On Tuesday 09 December 2003 20:42, Mooneer Salem wrote:
> Hello,
>
> On Mon, 2003-12-08 at 23:39, Mike K wrote:
> > Anyone successfully get this to work in a working environment, not
> > just install?
>
> I'm thinking that this would be a better solution for a virtual
> hosting environment (once stabilized):
> http://www.tel.fer.hr/zec/BSD/vimage/. In any case, I've gotten it
> working under VMWare using 5.0-RELEASE and am able to access the
 ^^^

Was this a typo or have you really succeeded in porting the patch to the 
5.x branch? The original diffs are against 4.9-R, and when applied 
against -CURRENT they yield almost a complete reject.

Cheers,

Marko



Re: PATCH: Pentium-M deeper sleep support in idle loop

2003-10-17 Thread Marko Zec
On Friday 17 October 2003 11:23, Ducrot Bruno wrote:
> In case of P-M (Banias), speedstep does work differently, and this will not
> work. Be sure to disable speedstep stuff in such case (or implement it).

True, the new "Centrino" CPUs are equipped with a slightly different SpeedStep 
control model, but bear in mind that SpeedStep support was only of marginal 
importance in my patch, as clearly stated in the original post.

The main purpose of the patch is enabling deeper sleep mode in the idle loop, 
which is completely independent of SpeedStep. Furthermore, it should work 
across all Pentium-M models in combination with ICH3 and ICH4 chipsets. I'd be 
more than glad to hear some feedback on whether and how it works for people 
out there...

> SpeedStep also work if ICH-2M, and it is easy to add it in your patch (but
> probably not the deeper sleep stuff though, especially if you have an
> older PIII, but sleep should be ok).

If it's easy, then by all means go for it. Unfortunately I don't have any ICH2 
based systems available for testing, so I'm 100% sure I won't be implementing 
it myself.

Cheers,

Marko


PATCH: Pentium-M deeper sleep support in idle loop

2003-10-16 Thread Marko Zec
From http://www.tel.fer.hr/zec/BSD/pm/4.8-ich-ds.patch you can fetch an 
experimental patch for the 4.8-RELEASE kernel that allows for significant 
power savings on mobile systems by utilizing a feature called "deeper sleep 
mode". The deeper sleep mode is available on recent Intel mobile processors 
(Pentium III-M, Pentium IV-M and "Centrino" Mobile Pentium) in combination 
with ICH3 / ICH4 chipsets, and is used to simultaneously stop the CPU clock 
and significantly lower the chip core voltage. When in such a state, the CPU 
is supposed to consume only around 0.6 W, according to Intel specs.

The power saving policy in idle loop is controlled by the machdep.cpu_idle_hlt 
sysctl, which now has two new modes:

Mode 0  (std)  Do not halt the CPU, return from the idle loop as soon as
   possible.

Mode 1  (std)  Halt the CPU using the "hlt" instruction. CPU clock is not
   stopped (TSC keeps counting).

Mode 2  (new)  Halt the CPU using APM BIOS call followed by a "hlt". This
   method stops the clock, thus saving slightly more power.

Mode 3  (new)  Halt the CPU by entering the deeper sleep mode (max. power
   savings).

The battery life extension that can be obtained on an idle system using this 
patch looks very promising. Here's what I could observe on my ThinkPad X30 
(Pentium III-M 1200, ICH-3 chipset) with a slightly worn-out battery:

+---------+----------------------+----------------------+
|cpu_idle |     LCD ON (dim)     |       LCD OFF        |
|  mode   | Bat. life |   gain   | Bat. life |   gain   |
+---------+-----------+----------+-----------+----------+
|    1    |   4:03    |          |   5:12    |          |
+---------+-----------+----------+-----------+----------+
|    2    |   4:10    |    2%    |   5:23    |    3%    |
+---------+-----------+----------+-----------+----------+
|    3    |   4:48    |   18%    |   6:21    |   22%    |
+---------+-----------+----------+-----------+----------+

I had no ICH-4 based laptop available for testing, so I cannot promise that 
the patch will work on such systems, although it should.

The patch also introduces a new sysctl machdep.speedstep, which can be used to 
directly control the CPU clock frequency / operating voltage. If your BIOS 
already does this job correctly you probably won't need this sysctl; however, 
the BIOS in my ThinkPad annoyingly persists with the same SpeedStep mode 
regardless of the power source (external/battery), so I had to implement a 
method to control it.

Anyhow, hope you like the patch... The usual liability disclaimer applies - if 
anything goes wrong with your machinery or data, you are on your own :-)
Have fun,

Marko


Re: Network stack cloning / virtualization patches

2003-06-03 Thread Marko Zec
Sean Chittenden wrote:

> > I have been running tests between two machines with this patch
> > installed. There is a "per packet" overhead increase of about 1%.
> > there is no overhead increase in the per-byte overhead..  in other
> > words, sending 1 byte packets gets about a 1% decrease in throughput
> > but sending 8k chunks has almost no overhead increase..
> >
> > Both my machines end up maxing out the 100Mb ethernet between them
> > before they see any speed difference at high packet sizes.
>
> 1% per packet seems a bit high... where is the overhead coming from?
> Seems as though there should be less overhead and that lookup of the
> necessary components for each vimage could be found with a hash...  I
> looked through the patch and couldn't see any places that screamed
> optimization.  Is the overhead really just from copying the data of
> the vimage around?

There are two major possible causes for the overhead increase. First, each IP
protocol related tunable and most of the global symbols involved in network
processing have been virtualized. This means that instead of being accessed
directly, the symbols have to be dereferenced through a struct vimage. This
additional level of indirection costs extra clock cycles on each access to any
of those symbols, which happens rather often during relatively complex TCP
processing. And second, many kernel functions have been extended with an
additional argument, typically a pointer to a struct vimage, so passing and
fetching the extended argument lists has certainly also contributed to the
slight decrease in TCP performance. However, a couple of percent of overhead
increase that can be observed only in worst-case loopback tests does not
present a problem in any real-life scenario.

On the other hand, I do not follow what you are aiming at with hash lookups.
Also, there's no special copying of data to / from vimages as you are
suggesting, beyond the described dereferencing of virtualized symbols within
the struct vimage.

> Julian, am I safe in assuming that you have an interest in this work?
> If not, I may setup a p4 branch to work with and to merge these bits
> into -CURRENT if no one else is interested.  -sc

I would be really honored to see the cloning code merged into -CURRENT one day.
However, at the moment I'm strongly opposed to such a proposal, since the code
is simply not mature enough. As explained in one of the previous notes, the
vimage should first be restructured as a modular resource container facility,
vs. the current monolithic implementation. Many people have also proposed that
the API be reengineered. Further, no protocol except IPv4 (excluding IPSEC) has
been virtualized at the moment, etc.

Forcing such a partial solution into the official tree would beyond any doubt
create a terrible mess, a huge amount of breakage, a lot of unnecessary fights
and debates, but most of all it would make it more difficult to do the
virtualization properly once the original patch becomes mature enough.

Cheers,

Marko




Re: Network stack cloning / virtualization patches

2003-05-31 Thread Marko Zec
Sean Chittenden wrote:

> can it be broken down into a smaller set of commits?

No, it can't. That's probably the biggest problem with the network stack
cloning concept - you either properly virtualize the entire stack or do no
virtualization at all. Therefore, even if I ever succeed in bringing the
patch into full sync with -CURRENT, I assume that many people would stand
against incorporating it into the main source tree, as the patch would
significantly change pretty much all of the code in the net* portions of the
kernel tree...

At the moment your best option is to try out the available version (against
4.8-R), report any bugs you encounter, and provide suggestions for further
reengineering...

Cheers,

Marko




Re: Network stack cloning / virtualization patches

2003-05-31 Thread Marko Zec
Juli Mallett wrote:

> * Sean Chittenden <[EMAIL PROTECTED]> [ Date: 2003-05-30 ]
> [ w.r.t. Re: Network stack cloning / virtualization patches ]
> > > at http://www.tel.fer.hr/zec/vimage/ you can find a set of patches
> > > against 4.8-RELEASE kernel that provide support for network stack
> > > cloning.
> >
> > Has anyone stepped forward to possibly shepherd this code into the
> > tree?  I am highly interested in this code and would like to see it
> > incorporated into the base system (read: -CURRENT, before 5.2).  After
> > looking at the TODO, I realize that this patch isn't 100% yet, but can
> > it be broken down into a smaller set of commits?
>
> Has anyone looked at making the patch work with CURRENT?  Does this do
> anything to degrade performance of UP systems with no (0?) virtualised
> images running?  Does it make the locking situation much worse?  Can it
> be stripped down to minimal, clean, well-architected diffs to accomplish
> a centralised goal, rather than a "Network+goodies, random subsystem
> overhaul"?  Those are probably good questions for someone to know the
> answers to (by looking at the code, or someone trying such) before it
> gets too close to the tree.

I plan to start porting the cloning code to -CURRENT once it becomes -STABLE
(that means once 5.2 gets out, I guess). In the meantime I'd like to get more
feedback on what people like / dislike about the general concept and the code
as it is right now, and in which direction I should strive to redesign the
management API etc.

I fully agree with Juli's comment that the patch coalesces many things not
fundamentally related to the network stack itself, and that it therefore has to
be slightly reengineered first. While at BSDCon in Amsterdam, idowse@ and phk@
suggested to me that the vimage framework should probably be implemented in a
more modular fashion, so that admins could choose which system resources to
virtualize and which not. My current experiments are going in that direction...

Regarding the question on the performance penalty, I suggest you check the
EuroBSDCon slides, which provide a basic comparison between the standard and the
patched kernel. The overhead increase is generally hardly measurable, and
depending on traffic type it does not exceed 3-4% in worst-case scenarios.
Julian Elischer will be giving a talk accompanying a paper on the subject at
the upcoming USENIX / FreeNIX ATC, so perhaps that could also be a good place
to learn a couple more details :-) Unfortunately I won't be able to attend the
conference personally :-| but I hope to hear some feedback though...

Cheers,

Marko




Re: Per-jail CPU limits?

2003-02-19 Thread Marko Zec
Mooneer Salem wrote:

> Hello,
>
> I've been looking at the kernel source, in particular the scheduler
> in the past few weeks. I found a place in kern_switch.c where per-jail
> CPU controls could be added (in particular, in the kse_reassign() function).
> From looking at that function, I could loop through td_runq until I either:
>
> 1. Found a thread that isn't jailed,
> 2. Found a jailed thread, but determine it's safe to let it run because
>it does not go over sysctl-defined limits, or
> 3. Find no usable thread, in which case the KSE would theoretically switch
>over to the idle process until it's time to repeat the process again.
>
> This should allow the use of the standard FreeBSD scheduler, except for
> the jail limits. The question is, how do we determine the total CPU used
> by the jail? I found the kg_estcpu entry in struct ksegrp, which the thread
> has a pointer to, but would that be enough? Is there a different approach we
> could take that would solve this problem?

Rudimentary per-virtual-image CPU usage limiting (a virtual image can be
considered a jailed environment with its own independent network stack
instance) was implemented using an algorithm very similar to what you proposed,
so you might check the original patch against the 4.7-RELEASE kernel at
http://www.tel.fer.hr/zec/BSD/vimage/index.html
As I haven't yet had time to make a usable port to 5.0, my assumptions about
programming in -CURRENT might be slightly wrong, but I guess you'll have to:

1) extend the jail structure to hold CPU usage accounting information on a
per-jail basis;
2) update this field when doing normal per-process CPU accounting in
kern/kern_clock.c / statclock();
3) do some decay filtering to ensure stability and "smoothness" of the acquired
per-jail CPU usage data;
4) in kern/kern_switch.c / chooseproc() implement the steps you originally
defined as 1. to 3.;
5) on each HZ tick in kern/kern_synch.c / schedclock() check that the current
process/jail hasn't consumed more CPU time than it was allowed, and if it has,
reschedule a new process. This is necessary to ensure acceptable interactive
response for processes/jails running with administratively restricted CPU
resources; otherwise the process could consume the entire time quantum (10 ms
by default), and would then have to wait annoyingly long for the average
per-jail CPU usage to drop under the defined threshold;
6) optionally, extend procrunnable() in kern/kern_switch.c to return 0 in case
only over-the-CPU-limit processes remain in the active run queue, so that the
idle loop can execute the halt instruction instead of unnecessarily looping
through chooseproc() until the next clock tick. This can be especially useful
on laptops, where you don't want a process with a CPU usage limit to burn
battery power in the idle loop, and also burn your lap at the same time :)

Note: everything I wrote is based on my experience with 4.7-R kernel, in 5.0
many things have changed replacing process with threads as the atomic entities
for scheduling, so probably the function naming and some logic has changed
also...

Marko



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Routing within a Jail

2003-02-02 Thread Marko Zec
Yakov Sudeikin wrote:

> Hi freebsd-hackers,
>
> Jail with multiple LAN cards accessible from within?
>
> I have my 4.7 box serving a lot of things, and I have a Linux box routing
> the network packets for people in my block. I am not an administrator of the
> router. I want to get rid of the Linux station, I want to create a jail on
> my FreeBSD box and start a router + firewall there. As far as I know this is
> not possible, jail is started binded to single IP. And I need to route
> between different interfaces and even differend LAN cards. One of them is
> WaveLan, others are Ethernet rl0 like. I want the router to be in the jail
> for security purposes, and have all my services also in the other jails
> (mysql, apache, ftp, mail, named, samba etc). And I want the host system
> ONLY serve jails and do nothing else by itself. Is FreeBSD jail subsystem
> mature enough to accomplish this?
>

Check http://www.tel.fer.hr/zec/BSD/vimage/ , this could probably be a solution
for your scenario.

Marko






Re: FreeBSD firewall for high profile hosts - waste of time ?

2003-01-16 Thread Marko Zec
Terry Lambert wrote:

> Josh Brooks wrote:
> > You know, I keep hearing this ... the machine is a 500 mhz p3 celeron with
> > 256 megs ram ... and normally `top` says it is at about 80% idle, and
> > everything is wonderful - but when someone shoves 12,000-15,000 packets
> > per second down its throat, it chokes _hard_.  You think that optimizing
> > my ruleset will change that ?  Or does 15K p/s choke any freebsd+ipfw
> > firewall with 1-200 rules running on it ?
>
> No I'm just plain confused... 15,000 packets/second is just not
> that much load:
>
> Minisize  15000 * 64B * 8b    = 7,680,000b/S
> ...just less than 10 megabits/second.
>
> Maxsize   15000 * 1500B * 8b  = 180,000,000b/S
> ...just less than 200 megabits/second.
>
> I don't understand where you are spending your CPU time, even
> if the packets are being written to disk before they are sent
> on...

At 20,000 pps you have only 50 usec to forward each packet, without doing any
other work on the system. With a 500 MHz CPU this translates to 25,000 clock
cycles per packet. Subtract some general interrupt and IP processing overhead,
divide the rest by 200 ipfw rules, and you are left with only around 100 clock
cycles per ipfw rule. Bear in mind also that on a system with a limited CPU
cache you'll certainly stall a lot on code/data accesses to RAM, and it's clear
that this becomes an impossible mission. So, obviously you don't need any
ruleset "loops", as you are suggesting, for such a configuration to livelock...

Marko






Re: FreeBSD firewall for high profile hosts - waste of time ?

2003-01-16 Thread Marko Zec
Josh Brooks wrote:

> My freebsd machine does _nothing_ but filter packets and run ssh.
>
> > ONLY purpose is to deal with attacks.  With an entire cpu dedicated
> > to dealing with attacks you aren't likely to run out of CPU suds (at least
> > not before your attackers fills your internet pipe).  This allows you
> > to use more reasonable rulesets on your other machines.
>
> You know, I keep hearing this ... the machine is a 500 mhz p3 celeron with
> 256 megs ram ... and normally `top` says it is at about 80% idle, and
> everything is wonderful - but when someone shoves 12,000-15,000 packets
> per second down its throat, it chokes _hard_.  You think that optimizing
> my ruleset will change that ?  Or does 15K p/s choke any freebsd+ipfw
> firewall with 1-200 rules running on it ?

In my opinion, besides trying to optimize the filtering ruleset as suggested by
other folks, you could do yourself a favor by purchasing a faster CPU and
faster DDRAM. It is obvious that at 20,000 pps or even more (with typical
small-sized DoS packets) your machine won't hit the PCI bus limits, so you
won't need any fancy and expensive PCI-X motherboards and/or NICs; just go for
a higher CPU clock, more cache, and more RAM bandwidth.
Another thing to consider, if your system is experiencing livelock under
attack, would be using polling mode instead of interrupts; see
http://info.iet.unipi.it/~luigi/polling/ for details.

Marko






Re: jail: multiple ip's

2002-12-04 Thread Marko Zec
Terry Lambert wrote:

> Tony Finch wrote:
> > [EMAIL PROTECTED] (Mike Ghunt) wrote:
> > >  Has anyone hacked the jail code to support more than one ip?
> > >Would it be wise to hack at the code to add such a feature?
> >
> > Probably the best way to address this issue is to incorporate the
> > network stack virtualization patch, then change the jail ID from
> > an IPv4 address into a network stack ID.
>
> I'm really tempted to say that the network virtualization patch
> is special purpose, and introduces a lot of overhead that would
> not be there without the network virtualization patch.

Just the contrary: the network stack virtualization concept is mostly
general-purpose. The (minor) penalty of "a lot of overhead" introduced by
the patch is measurable only on loopback traffic; in practice the NIC media
sets the limit on traffic throughput, so in most cases no performance
degradation can be observed. Some measurement results can be found at
http://www.tel.fer.hr/zec/papers/zec-bsdconeurope-2002.pdf

On the other hand, I agree with you that this stuff is still in an early
experimental phase, but the patch has been proven to work reliably with
4.7-RELEASE as announced, with a -CURRENT version to follow soon...

Marko





Re: Clock Granularity (kernel option HZ)

2002-01-31 Thread Marko Zec

Storms of Perfection wrote:

> Ok. Since I have a limited hardware/software set at my finger tips. I can
> generate an attack on my machine (such as a synflood or something) to see
> what type of reponses I can get by setting it up and down. I think this may
> apply to this feature, to help the machine withstand attacks (and possibly
> have performance related gains/decreases)

Under no circumstances can increasing HZ make the machine less vulnerable to
high packet-rate traffic. In fact, it will perform even slightly worse. Try
using interrupt coalescing instead (on cards that support it), or even better,
try Luigi's polling code.

Marko





Re: FreeBSD4.4, fxp, no net after ifconfig for ~50 seconds (fwd)

2001-11-27 Thread Marko Zec

As Cisco switches have STP enabled by default on all ports, maybe a reboot of
a 4.4 system is seen as a change in link state, so the Catalyst holds the port
STP-blocked for a couple of seconds before putting it into forwarding state.
Did you try disabling STP on the Catalyst Ethernet ports?

Marko

David Kirchner wrote:

> Hi,
>
> This problem is still ongoing; unfortunately I haven't seen a reply about
> it from questions. Maybe someone here knows what's up?
>
> -- Forwarded message --
> Date: Thu, 15 Nov 2001 13:08:38 -0800 (PST)
> From: David Kirchner <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: FreeBSD4.4, fxp, no net after ifconfig for ~50 seconds
>
> We've recently started using FreeBSD 4.4 for production servers, in an
> environment where servers between 3.2 and 4.3 have had no trouble.
> Starting with 4.4, the servers have been booting up without being able to
> see the network for around 50 seconds.
>
> tcpdump indicates that the gateway isn't responding to the ARP request for
> x.x.x.1 right away. However, the gateway responds immediately to ARP
> requests from 3.2 through 4.3 machines. All other ARP requests are
> responded to immediately (ie, other FreeBSD 3.2-4.4 servers, even before
> the gateway responds)
>
> I was wondering if a) anyone else has been experiencing similar trouble,
> and b) if anything non-obvious has changed in the way FreeBSD ARP request
> packets are sent that would cause this?
>
> Our network runs on primarily Cisco hardware, and the servers are
> connected to Catalyst (29xx I believe) switches. The gateway is a Cisco
> somethingorother router.
>





Re: fxp patch - bundling receive interrupts

2001-10-24 Thread Marko Zec


Mike Silbersack wrote:

> That being said, I thought I should check on one thing:  In your original
> post, you mentioned that these techniques came from the linux drive for
> these cards.  In the process of writing this patch, did you copy any
> section of code from the Linux driver?  If possible, it would be best to
> avoid any GPL entanglements.

I used the microcode from Intel's proprietary Linux driver, which is
definitely not GPL'ed. I'm not nearly a copyright expert, but it seems to me
that Intel put a BSD-like copyright on the mentioned sources. Intel's copyright
is included in rcvbundle.h, so I hope one of the BSD "legals" can check on
that; if in any doubt, the simplest thing to do would be to ask Intel for their
position before including the code in an official distribution.

Marko





Re: fxp patch - bundling receive interrupts

2001-10-24 Thread Marko Zec

I am not an official FreeBSD committer, so I can't really tell...
That's why jlemon was in cc: (he is the fxp driver maintainer), so it is
his call.
Nevertheless, I think this patch needs a bit more testing - there are many
8255* chipset revisions out there, and as the code is *very* chipset
dependent, we should first wait to gather some feedback from the people
testing the driver.

Dennis Wong wrote:

> Marko,
>
> Is this going to be rolled into -stable anytime soon?
>
> Thanks





fxp patch - bundling receive interrupts

2001-10-24 Thread Marko Zec

An updated fxp driver patch for bundling receive interrupts, thus saving
a noticeable amount of CPU overhead for interrupt processing, can be
found at http://www.tel.fer.hr/zec/BSD/fxp/. New features include:
- control of microcode parameters through sysctl variables
- activation/deactivation of microcode without bringing the interface
down
- independent control of microcode parameters/activity for each fxp
interface instance
- new parameter hw.fxp.size_mask
- hw.fxp.int_delay is now defined in microseconds, instead of microcode
time-counter units

The microcode should work on many revisions - if not all - of the Intel
8255* chipset, but the BSD driver is currently tested only on 82558-B0,
so I would really appreciate any feedback on driver
functionality/stability on other chipset revisions.

Have fun!






fxp driver - receive interrupt bundling

2001-10-19 Thread Marko Zec

On http://fly.cc.fer.hr/~zec/index.html you can find a 4.4-RELEASE fxp
driver source, with patches that incorporate receive interrupt bundling
microcode, borrowed from the Intel's Linux e100 driver.

Bundling interrupts for a couple of received Ethernet frames can
significantly lower interrupt processing overhead, so if you have a
really busy server or router this code can make a noticeable
difference. On a 1200 MHz Athlon machine, the microcode saves around
10% of CPU utilization with incoming traffic of 20k pps on a single
interface.

The code is tested on 82558 rev B0 hardware, I'd be glad to know how it
works on other versions of Intel's fxp cards.

Pls. send your comments, suggestions etc. to [EMAIL PROTECTED]

Have fun!

