On 4/14/2021 1:14 PM, sten.kristian.ivars...@gmail.com wrote:
Hi Ken

Using AF_UNIX/SOCK_DGRAM with the current version (3.2.0) seems to
drop messages, or at least they are not received in the same order
they are sent

[snip]

Thanks for the test case.  I can confirm the problem.  I'm not
familiar enough with the current AF_UNIX implementation to
debug this easily.  I'd rather spend my time on the new
implementation (on the topic/af_unix branch).  It turns out
that your test case fails there too, but in a completely
different way, due to a bug in sendto for datagrams.  I'll see
if I can fix that bug and then try again.

Ken

Ok, too bad it wasn't our own code base, but good that the "mystery"
is verified

I finally succeeded in building topic/af_unix (after finding out what
version of zlib was needed), but without adding -D__WITH_AF_UNIX to
CXXFLAGS, so I haven't tested it yet

Is it sufficient to add the define to the "main" Makefile, or do
you have to add it to all the Makefiles? I guess I can find
out though

I do it on the configure line, like this:

     ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...

Is topic/af_unix fairly up to date with master branch ?

Yes, I periodically cherry-pick commits from master to topic/af_unix.
I'll do that again right now.

Either way, I'll be glad to help out testing topic/af_unix

Thanks!

I've now pushed a fix for that sendto bug, and your test case runs
without error on the topic/af_unix branch.

It seems like the test case does work now with topic/af_unix in
blocking mode, but when using non-blocking mode (with MSG_DONTWAIT)
there are some issues, I think

1. When the queue is empty with non-blocking recv(), errno is set
to EPIPE, but I think it should be EAGAIN (or maybe the pipe is
getting broken for real for some reason?)

2. When using non-blocking recv() and no message is written at all,
it seems like recv() blocks forever

3. Using non-blocking recv() where the "client" sends fewer than
"count" messages, sometimes recv() blocks forever (as well)


My naïve analysis of this is that for the first issue (if any) the
wrong errno is set, and for the second issue it blocks if no
sendto() is done after the first recv(), i.e. nothing kicks the
"reader thread" in the butt to realise the queue is empty. It is not
super clear though what POSIX says about creating blocking
descriptors and then using non-blocking flags with recv(), but this
works in Linux anyway

The explanation is actually much simpler.  In the recv code where a
bound datagram socket waits for a remote socket to connect to the
pipe, I simply forgot to handle MSG_DONTWAIT.  I've pushed a
fix.  Please retest.

I should add that in all my work so far on the topic/af_unix branch,
I've thought mainly about stream sockets.  So there may still be
things remaining to be implemented for the datagram case.
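
The non-blocking behaviour discussed above (issues 1 and 2) can be
checked in isolation with a sketch like the following. It is not the
snipped test case; it only assumes a bound AF_UNIX/SOCK_DGRAM socket
that nobody ever writes to. The function name and the socket path
passed to it are hypothetical. On Linux, recv() with MSG_DONTWAIT is
expected to return immediately with EAGAIN (or EWOULDBLOCK) rather
than block or report EPIPE.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Binds a datagram socket to `path` (hypothetical name chosen by the
   caller), never sends anything to it, and does one non-blocking
   recv().  Returns the errno from recv(), 0 if recv() unexpectedly
   succeeded, or -1 if the socket could not be set up. */
int bound_dgram_recv_dontwait(const char *path)
{
    struct sockaddr_un addr;
    char buf[64];
    ssize_t n;
    int saved;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);                       /* remove a stale socket file */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }

    /* The socket itself is blocking; only this call is non-blocking. */
    errno = 0;
    n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
    saved = errno;

    close(fd);
    unlink(path);

    return n == -1 ? saved : 0;
}
```

A caller would compare the result against EAGAIN and EWOULDBLOCK; any
other value (or a call that never returns) points at the kind of
problem described in the list above.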

I finally got some time to test topic/af_unix in our "real"
Cygwin application (casual), and unfortunately very few of our unit
tests pass

The symptoms are unexpected eternal blocking, sometimes an unexpected
EADDRNOTAVAIL, and sometimes what looks like memory corruption (and
core dumps)

Of course the memory corruption etc. could be our own fault, and the
core dumps might be because of uncaught exceptions

Needless to say, all unit tests pass on Linux, but of course
Cygwin's topic/af_unix could be acting according to the POSIX
standard, and the behaviour could be due to our own misinterpretation
of how POSIX works

More likely it's due to bugs in the topic/af_unix branch.  This is
still very much a work in progress.

I will try to narrow down the quite complex logic and reproduce the
problems

That would be ideal.

If you for some reason want to try it with casual, I'd be glad to help
you out (it should be easier now than last time, but there might
still be some documentation missing for Cygwin)

https://bitbucket.org/casualcore/

I'm going on vacation in a few days, but I might do this when I get back.

Thanks for your testing.

By the way, if your code is using datagram sockets, then there are very
serious problems with our implementation (even aside from the
performance issue that we've already discussed).  For example, I don't
know of any reasonable way for select to test whether such a socket is
ready for writing.  We'll need to solve that somehow.
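
For reference, this is roughly what an application expects from
select() on the write side. The sketch below says nothing about how
Cygwin does or should implement it; it only shows the caller's view,
using a socketpair() for brevity (an assumption, a real application
would more likely use bound sockets and sendto). Both function names
are hypothetical.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if select() reports fd writable within timeout_ms,
   0 on timeout, -1 on error. */
int writable_within(int fd, int timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (rc == -1)
        return -1;
    return rc > 0 && FD_ISSET(fd, &wfds);
}

/* Creates a connected AF_UNIX datagram pair with nothing queued and
   asks whether one end is ready for writing. */
int fresh_dgram_pair_is_writable(void)
{
    int sv[2];
    int ready;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    ready = writable_within(sv[0], 100);

    close(sv[0]);
    close(sv[1]);
    return ready;
}
```

On Linux a freshly created pair with an empty queue is reported as
writable; the hard part mentioned above is deciding when that should
stop being true.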

If by that you mean whether we're using SOCK_DGRAM, the answer is yes

I tried SOCK_STREAM (and SOCK_SEQPACKET I think) for CYGWIN 3.2.0 but that 
didn't work at all

As far as I understand, all of these socket types preserve message
ordering on pretty much all implementations, though
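
The ordering expectation can be illustrated with a small check. This
is not the test case snipped earlier in the thread; it is a minimal
sketch that assumes a socketpair() of AF_UNIX/SOCK_DGRAM sockets and a
hypothetical function name. It sends numbered datagrams in small
batches (kept small so a blocking send never has to wait for the
reader) and counts how many arrive out of sequence.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define BATCH 8   /* datagrams queued before the reader drains them */

/* Sends rounds * BATCH numbered datagrams over an AF_UNIX datagram
   socketpair.  Returns the number of datagrams that did not arrive
   with the expected sequence number, or -1 on a socket error. */
int count_misordered(int rounds)
{
    int sv[2];
    int bad = 0;
    uint32_t next_tx = 0, next_rx = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    for (int r = 0; r < rounds && bad >= 0; r++) {
        /* Writer side: queue one batch of sequence numbers. */
        for (int i = 0; i < BATCH; i++) {
            if (send(sv[0], &next_tx, sizeof next_tx, 0)
                != (ssize_t)sizeof next_tx) {
                bad = -1;
                break;
            }
            next_tx++;
        }
        /* Reader side: each datagram should carry the next number. */
        for (int i = 0; i < BATCH && bad >= 0; i++) {
            uint32_t got;
            if (recv(sv[1], &got, sizeof got, 0)
                != (ssize_t)sizeof got) {
                bad = -1;
                break;
            }
            if (got != next_rx)
                bad++;
            next_rx++;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return bad;
}
```

A result of 0 means every datagram arrived in the order it was sent;
a non-zero count would correspond to the reordering or loss reported
at the start of this thread.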

I haven't tried SOCK_STREAM and/or SOCK_SEQPACKET with the 
topic/af_unix-branch. Is that worth a try ?

SOCK_STREAM is definitely worth a try. The implementation of that should be much more reliable than the implementation of SOCK_DGRAM at the moment. We don't implement SOCK_SEQPACKET.

Ken
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
