Re: [networking-discuss] Socket filter design

Anders Persson Tue, 15 Dec 2009 22:48:55 -0800

I just found out that my original reply to Erik did not make it to the
list. So I'm forwarding the email just to make the thread complete.

----- Forwarded message from Anders Persson <[email protected]> -----

Date: Mon, 14 Dec 2009 19:23:00 -0800
From: Anders Persson <[email protected]>
To: Erik Nordmark <[email protected]>
Cc: [email protected]
Subject: Re: Re: [networking-discuss] Socket filter design

Hi Erik,

Thank you for your comments. My comments are inline.

On Tue, Dec 08, 2009 at 07:03:54AM -0800, Erik Nordmark wrote:
> Section 2.1 says you manage the filters using SMF. How does that 
> interact with zones? For instance, if a shared-IP or exclusive-IP zone 
> disables one of the service instances, what happens?
> Or do the filter service instances only exist in the global zone 
> somehow? (Having them exist in non-global zones but have no effect on 
> the filters used would seem confusing.)

The service would only exist in the global zone.

> Section 3.2. What is the errno when a FIL_DETACH is done on an automatic 
> filter?

setsockopt() would fail with EINVAL.

> Section 4.3. It makes sense to mention that the bypass flag is a 
> performance optimization. The filter must have its own checks whether or 
> not it should bypass its work to handle any callbacks that are active 
> when the bypass flag is set.

Indeed; I'll update the doc.

> Section 5.3. How does flow control interact with filters that inject 
> data (either for input or output)?

Flow control can be enabled as a result of a filter injecting data, and
filters that do inject data must stop to do so if the socket becomes flow
controlled.

> In don't understand the need for section 7.1. The mblks that shouldn't 
> be modified have db_ref = 2 (this is done by using esballoca) hence 
> there shouldn't be any need to do anything special for the filters. A 
> filter, just like any streams modules or driver, must check db_ref == 1 
> before doing any in-place modifications to dblk.

That's right (and db_ref comment needs to be added to the document), but there
are some additional complications with sendfile() that I need to look into a
bit more. So I'll get back to on this one.

> Section 7.2. How does a filter indicate that it doesn't have a data_in 
> callback? Null pointer?

Yes. Any callback can be set to NULL to indicate that the filter is not
interested in that event.

> Section 7.3 seems dangerous. If kssl is implemented using filters and 
> some application issues some innocuous I_* ioctl to get something, that 
> would make kssl disappear from the socket? I think instead the ioctl (or 
> other reason for the fallback) must fail.

If the specific I_* ioctl require the socket to fallback for the operation to
complete, then yes, the filter would disappear. I considered denying the
STREAMS operation, but I thought forcing the ioctl to fail could lead to some
interesting application failures. However, looking at how kssl used to work on
STREAMS sockets it looks like it would stick around even if sockmod is pop'ed,
which could result in the application seeing bogus TLI messages. But again,
that's probably a much less likely scenario than an I_SETSIG being issued on a
non-STREAMS socket, resulting in a fallback.

So could you expand a bit on what you see as being dangerous?

> Section 8. Do we have interested parties outside of the consolidation 
> that would like to use socket filters? What would it take to make this 
> an evolving interface?

I'm not sure how much work would be involved to bump the level to Uncommitted,
but my guess would be "not much". That being said, I would like to file a
follow-up case to make such changes.

> Section 8.1: What are the semantics of sof_unregister for a filter that 
> is in use? Will it fail with EBUSY? Something else?

Yup, EBUSY will be returned as long as it's in use.

> Section 8.2.1 If there are multiple filters and one returns 
> SOF_RVAL_DEFER, do the filters above it get an attach callback?

Yes, the attach callback for each filter will be triggered, and multiple
filters can return DEFER.

> Section 8.2.3, 8.2.4 etc says the filter can modify the mblk. In light 
> of db_ref considerations it would be better to say that it can return a 
> different mblk (with in-sito modifications of the dblk allowed if db_ref 
> = 1). Instead of having a mblk_t ** it seems to be less error prone to 
> have an mblk_t pointer as the return value. That makes it less likely to 
> introduce errors where the mblk is freed or changed, but the *mpp isn't 
> updated. (I've fixed this in IP recently.)

OK, I'll make the clarification and change the return value of data_in to
mblk_t *. It might be possible to simplify data_in_proc as well, see the
comment regarding 'tailmp' below.

> What is the definition of *sizep? It is what I get from msgdsize()? 
> msgsize()? (Can there be mblks other than M_DATA?)

`sizep' is total amount of data (from msgdsize()) contained in the chain. And
there can be non-M_DATA mblks, for example, M_PROTO with T_UNITDATA_IND,
T_OPTDATA_IND, etc.

> Section 8.2.4. Why do we need the complexity of tailmp?
> Why do we even need the complexity of a b_next chain? Doesn't TCP just 
> have b_cont chains?

The `tailmp' is just an optimization; why chase the tail if the filter knows
where it is. But it is possible for the mblk chain to be large, so the search
could potential be expensive. It might be a premature optimization, so I'll run
some benchmarks to see if it can be simplified. 

In case of UDP there might be a b_next chain of datagrams, and for TCP we might
end up with T_OPTDATA_IND mblks that break the b_cont chain. The framework
could potentially break up the b_next chain and call the callback multiple
times, but that would require the framework to also calculate the size of each
b_cont chain (the size information available is that of the whole b_next
chain). 

> Section 8.2.5. In light of the above comments I think an mblk_t * return 
> value type makes more sense, and have the error return value as a 
> separate out argument. When can the filter free and/or consume an mblk? 
> For instance, if the filter wants to queue the mblk would it return a 
> NULL mblk and set RVAL_RETURN? Must the filter free the message when 
> returning an error?

Sure, having a mblk_t * return value would be OK.

The filter must return a NULL mblk if it set rval to anything other that
RVAL_CONTINUE, so in case of error it should free the mblk. I'll update the
document to make sure that is clear.

> Section 8.2.6. What is the utility of SOF_RVAL_RETURN for bind or 
> connect? I can imagine using it for a deferred connect (similar to an 
> accept filter but in the reverse direction). But to do that the filter 
> would need a way to continue propagating the connect down to the 
> protocol, and I don't see such a facility.
> Ditto for setsockopt.
> Perhaps we should restrict SOF_RVAL_RETURN to the callbacks where we 
> know we have all the pieces necessary to make use of it?

You are right, there is no such facility. The main motivation behind it was for
setsockopt/getsockopt/ioctl, where the filter might be able to handle the
request itself (without sending it to the protocol).

> Section 8.4.3 and 8.4.4 doesn't indicate how this works when there are 
> multiple filters. Earlier in the document you stated how accept filters 
> work (each filter has to say sof_newconn_ready). Is the same true for 
> flow control that all that have stopped the flow have to restart the 
> flow? It makes sense documenting the behavior in the earlier part on 
> flow control.

Yes, the behavior is the same. I'll add that to the document.

Thanks again,
Anders

----- End forwarded message -----

Anders
_______________________________________________
networking-discuss mailing list
[email protected]

Re: [networking-discuss] Socket filter design

Reply via email to