>
> Firstly, if the handle isn't being read with binmode set then
> perhaps the \r\n are being converted to \n (if this is Windows)?
> How are you creating/initializing the socket?
>
Unfortunately, with or without binmode, the matching behaves the same
(from what I can tell).
Socket creation:

    my $TCPSocket = IO::Socket::INET->new(
        PeerHost => "x.x.x.x",
        PeerPort => "5000",
        Proto    => "tcp",
        Blocking => "1",    # Tried with Blocking (0|1) as well.
    ) or die "ERROR in Socket Creation : $!\n";

    # Ensure we get output right away
    $TCPSocket->autoflush(1);
    binmode $TCPSocket;    # Tried with/without binmode

> Similarly, the character encoding of the data on the socket could
> matter. You said there are character codes above 127. Does that
> mean the encoding is 8-bit such as [extended] ASCII or latin1, or
> do you mean the character codes are WAY above 127? Character
> encoding could be another culprit if the \r and \n characters are
> encoded differently in the stream than you (and Perl) expects.
> Using the IO layers or the explicit Encode module you should be
> able to decode the stream into a Perl string that Perl
> understands properly.
>
From the relevant RFCs:
The terms "NUL", "TAB", "LF", "CR, and "space" refer to the octets
%x00, %x09, %x0A, %x0D, and %x20, respectively (that is, the octets
with those codes in US-ASCII [ANSI1986] and thus in UTF-8 [RFC3629]).
The term "CRLF" or "CRLF pair" means the sequence CR immediately
followed by LF (that is, %x0D.0A). A "printable US-ASCII character"
is an octet in the range %x21-7E. Quoted characters refer to the
octets with those codes in US-ASCII (so "." and "<" refer to %x2E and
%x3C) and will always be printable US-ASCII characters; similarly,
"digit" refers to the octets %x30-39.
However, the data stream does contain yEnc content, which, as far as I know,
is an 8-bit encoding. So whilst the protocol itself may use UTF-8, the data
transmitted within the protocol can be either UTF-8 or 8-bit.

Lines *should* be terminated by CRLF (provided the 8-bit encoding doesn't
mess up the detection), and the entire data stream is then terminated with
a CRLF.CRLF (similar to an SMTP message, for example, in terms of protocol).
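
To make the framing above concrete, here is a minimal sketch. It assumes the
handle is left in raw byte mode (plain binmode, no :encoding layer) and that
data may arrive in arbitrary chunks. The helper name extract_response is made
up for illustration, and the chunks are simulated rather than read from a
real socket. It searches the accumulated buffer for the CRLF.CRLF octets
instead of testing only the end of the last read.

```perl
use strict;
use warnings;

# Hypothetical helper: append a chunk of raw bytes to the buffer and, once
# the CRLF.CRLF terminator is present, return the complete response body
# (without the final ".CRLF"). Returns undef while the data is incomplete.
sub extract_response {
    my ($buffer_ref, $chunk) = @_;
    $$buffer_ref .= $chunk;

    # Plain octet search; no encoding layer is involved, so 8-bit
    # payload bytes cannot confuse the match.
    my $pos = index($$buffer_ref, "\x0D\x0A.\x0D\x0A");
    return if $pos < 0;

    # Keep the CRLF that ends the last data line, drop the ".CRLF".
    my $body = substr($$buffer_ref, 0, $pos + 2);
    substr($$buffer_ref, 0, $pos + 5, '');
    return $body;
}

# Simulated reads: the terminator is split across two chunks, and the
# payload contains octets above 127.
my $buffer = '';
my @chunks = ("line one\x0D\x0A\xE9\xFF binary\x0D", "\x0A.\x0D\x0A");
my $body;
for my $chunk (@chunks) {
    $body = extract_response(\$buffer, $chunk);
    last if defined $body;
}
print length($body), " bytes received\n";
```

With a real socket, the chunks would presumably come from something like
sysread($TCPSocket, my $chunk, 65536) in a loop. That is an assumption about
how the reads are done, not something taken from the code above.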
> You can attach an IO layer to the file handle by passing an
> additional argument to binmode:
>
> binmode $fh, ':encoding(UTF-8)';
>
>
Loads, and LOADS and *piles* of UTF-8 errors...
utf8 "\xD826" does not map to Unicode at test.pl line 40 (#1)
utf8 "\x1583F9" does not map to Unicode at test.pl line 40 (#1)
etc.
From personal experience and using other (nasty) methods and components for
doing what I -should- be able to do with native Perl, I've learned the hard
way that messing with binmode $fh, ":encoding...." generally corrupts the
8-bit (yEnc) data. Again, I am more than likely doing it incorrectly, but
I'm really trying to understand how to do it correctly though :-)
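
One possible way around this, sketched under assumptions: leave the handle as
raw bytes and decode individual lines with the Encode module only where text
is expected, so the yEnc octets are never passed through an encoding layer.
The helper name decode_line and the sample lines are invented for
illustration. Note that an 8-bit line can happen to be valid UTF-8, so in
practice the protocol state (headers versus body) is probably a better guide
than trial decoding.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: try to decode one raw line as strict UTF-8.
# Returns (1, $text) on success, or (0, $bytes) when the octets are not
# valid UTF-8, in which case the original bytes are returned untouched.
sub decode_line {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input instead of warning;
    # LEAVE_SRC stops it from modifying $bytes in place.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return defined $text ? (1, $text) : (0, $bytes);
}

my @lines = (
    "Subject: caf\xC3\xA9",        # valid UTF-8 text line
    "\xD8\x26\x15\x83\xF9\x3D",    # arbitrary 8-bit payload octets
);

for my $line (@lines) {
    my ($is_text, $value) = decode_line($line);
    printf "%s: length %d\n", ($is_text ? 'text' : 'raw'), length $value;
}
```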
> Lastly, you're reading from a socket so there's no guarantee that
> the buffer string is going to necessarily end at the termination
> boundary. Perhaps the protocol guarantees that, but the socket
> surely doesn't. You may need to look for that terminating
> sequence in the middle of the buffer.
>
>
But isn't that exactly why we set things like autoflush(1) or $|=1? After
the data stream has been sent from the server (i.e. CRLF.CRLF) the server
stops transmitting data and waits for the next command, so there's no
chance that a second data stream may be received by the client socket, at
least not until the client socket issues a new command.
> Does any of that help?
>
>
I appreciate it, truly. But no, not really :-( I can honestly say, been
there, done that.
I realize my problem here is the really wacky way in which the data stream
is encoded (and that is completely out of my control). But there must be an
adequate and proper way to handle this data.
--
Regards,
Chris Knipe