Re: busybox sed, 'r' command

2016-03-24 Thread Cristian Ionescu-Idbohrn
On Thu, 24 Mar 2016, Ron Yorston wrote:
>
> and specifically about 'r':
>
>If rfile does not exist or cannot be read, it shall be treated as if
>it were an empty file, causing no error condition.

My observation, from the strace of GNU sed, is that it attempts to
open a file with an empty name, fails, and ignores the error.


Cheers,

-- 
Cristian
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: *FLAWED* Re: busybox sed, 'r' command

2016-03-24 Thread Ron Yorston
I was curious what POSIX says and how other *nix systems would handle
Cristian's examples.

On the 'r' command POSIX makes a general comment:

   The r and w command verbs, and the w flag to the s command, take an
   rfile (or wfile) parameter...

and specifically about 'r':

   If rfile does not exist or cannot be read, it shall be treated as if
   it were an empty file, causing no error condition.

This offers no guidance on how to handle a missing parameter, unless
you read 'if rfile does not exist' to mean the parameter rather than
the actual file.

In practice only GNU sed ignores an 'r' command with no parameter;
BusyBox, FreeBSD, Solaris and Version 7 UNIX[1] treat it as an error.
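
For contrast with the no-parameter case, here is a minimal sketch of the
case the POSIX text does cover: an 'r' command that names a file which
does not exist. The path and the run_with_missing_rfile helper are
hypothetical names chosen for illustration; the path is only assumed not
to exist.

```shell
#!/bin/sh
# Hypothetical path, assumed not to exist on this system.
missing_rfile="${TMPDIR:-/tmp}/no-such-rfile.$$"

# Run sed with an 'r' command that names the missing file.
# Per the POSIX text quoted above, the file is treated as empty,
# so the input should pass through unchanged without an error.
run_with_missing_rfile() {
    sed "r $missing_rfile"
}

printf 'foo\nbar\n' | run_with_missing_rfile
echo "exit status: $?"
```

The input ends with a newline on purpose, so the result does not depend
on the last-line handling discussed further down.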

On newlines, POSIX only offers:

   In default operation, sed cyclically shall append a line of input,
   less its terminating <newline> character, into the pattern space.

Given Cristian's sample file with no trailing newline and the command
'sed -n p /tmp/bar' GNU sed returns all three lines with no newline on
the last; BusyBox and FreeBSD return all three lines with a newline on
the last; Solaris and Version 7 UNIX only return the first two lines.

So, you pays your money and you takes your choice.  Busybox sed's
behaviour is certainly consistent with *nix tradition.  We're just
lucky to have so many traditions to choose from.

Ron
---
[1] http://www.nordier.com/v7x86/index.html has a virtual machine with
UNIX v7 for x86.


Re: *FLAWED* Re: busybox sed, 'r' command

2016-03-23 Thread Mike Frysinger
busybox generally follows a pretty simple flow:
(1) is the behavior you're looking at explicitly documented by POSIX ?
    if yes, then do what POSIX says & you're done
(2) is the behavior described as "implementation defined" by POSIX ?
    if yes, do whatever produces smaller code
(3) is the behavior attempting to replicate another standard (e.g. GNU) ?
    is it behavior the standard explicitly documents ?
    if yes, do what the replicated standard does

otherwise, if it's an edge case no one cares about, stick to small code.
-mike



Re: *FLAWED* Re: busybox sed, 'r' command

2016-03-23 Thread Cristian Ionescu-Idbohrn
On Wed, 23 Mar 2016, Ralf Friedl wrote:
>
> On the other hand, I don't know why busybox sed needs exactly one space
> between command and filename. GNU sed works with zero or more spaces.

Good points, everyone.  Thanks.  Still...

# Note, the input file /tmp/bar lacks the newline on the last line

# simplified

GNU sed ignores the open failure when no file is specified or the file does not exist:

$ strace sed 'r' /tmp/bar
open("/tmp/bar", O_RDONLY|O_LARGEFILE)  = 3
read(3, "foo\nbar\nbaz", 4096)  = 11
write(1, "foo\n", 4foo
)= 4
open("", O_RDONLY|O_LARGEFILE)  = -1 ENOENT (No such file or
directory)
write(1, "bar\n", 4bar
)= 4
open("", O_RDONLY|O_LARGEFILE)  = -1 ENOENT (No such file or
directory)
read(3, "", 4096)   = 0
write(1, "baz\n", 4baz
)= 4
open("", O_RDONLY|O_LARGEFILE)  = -1 ENOENT (No such file or
directory)
read(3, "", 4096)   = 0
close(3)= 0

busybox sed reports an error:

$ strace busybox sed 'r' /tmp/bar
write(2, "sed: empty filename\n", 20sed: empty filename
)   = 20

Is this a bug in GNU sed, or intentional behaviour?

# Let's do a more reasonable test.

$ sed -n '1,$p' /tmp/bar | cat -E
foo$
bar$
baz
   ^
Note the missing newline char on the last line.

$ busybox sed -n '1,$p' /tmp/bar | cat -E
foo$
bar$
baz$
   ^
There's a newline char on the last line.

Which is at fault here?  I would say both (with reservations).
Either way, the two implementations are inconsistent with each other.

$ f=/tmp/bar && cat $f && [ -z "$(tail -c1 $f)" ] || echo

and:

$ f=/tmp/bar && cat $f && tail -c1 $f | read __ || echo

work, but they look more convoluted to me.
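
The same idea can be wrapped in a function, which may read less
convoluted. This is only a sketch: cat_with_final_newline is a
hypothetical helper name, and the scratch file stands in for /tmp/bar.

```shell
#!/bin/sh
# Hypothetical helper: print a file and add a newline only when the
# last byte is not already one.
cat_with_final_newline() {
    cat "$1" || return
    # Command substitution strips trailing newlines, so the result is
    # empty when the last byte is a newline (or the file is empty).
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo
    fi
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'foo\nbar\nbaz' > "$sample"     # no newline after "baz"
cat_with_final_newline "$sample"
```

Unlike passing the file through sed, this does not depend on how a
particular sed treats an unterminated last line.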


Cheers,

-- 
Cristian


Re: busybox sed, 'r' command

2016-03-23 Thread Michael Conrad

On 3/23/2016 12:12 PM, Ralf Friedl wrote:

Cristian Ionescu-Idbohrn wrote:

sed (GNU sed) 4.2.2 can do this:

$ printf 'foo\nbar\nbaz' | sed r -
foo
bar
baz

or, after storing the text in a file:

$ printf 'foo\nbar\nbaz' >/tmp/bar

$ sed r /tmp/bar
foo
bar
baz

But busybox sed can't:

$ printf 'foo\nbar\nbaz' | busybox sed r -
sed: empty filename

$ busybox sed r /tmp/bar
sed: empty filename

$ printf '' | busybox sed 'r /tmp/bar'


$ busybox sed 'r /tmp/bar'


The 'r' command is documented by GNU sed as a GNU extension. Still,
busybox sed documents the 'r' command as supported:

r [address]r file
   Read contents of file and append after the contents of the
   pattern space. Exactly one space must be put between r and the
   filename.

Am I misinterpreting the documentation?

From the documentation:
>   The full format for invoking `sed' is:
> sed OPTIONS... [SCRIPT] [INPUTFILE...]
So in your example you invoke sed with the script "r" and the input
file "-" or "/tmp/bar". The content is not printed because it is the
argument to the "r" command, but because it is the main input file to
sed. You can avoid that by using quotes around the command and the
file name, or by omitting the space between the command and the filename.
You should also try the last two examples, where you invoke busybox
sed with quotes, with GNU sed. The behaviour is the same.


You should note that in your example when reading from a file, sed
didn't read from stdin, at least you don't mention it, although your
interpretation would mean that the filename is the argument to the "r"
command, therefore no argument is given to sed, and sed should read stdin.


You should also note that invoking the "r" command with the filename
causes the content of this file to be inserted after every line. When
reading from a pipe, the pipe is empty after the first line.


My documentation to GNU sed 4.2.2 says:
> `r FILENAME'
>  As a GNU extension, this command accepts two addresses.
>
>  Queue the contents of FILENAME to be read and inserted into the
>  output stream at the end of the current cycle, or when the next
>  input line is read.  Note that if FILENAME cannot be read, it is
>  treated as if it were an empty file, without any error indication.
>
>  As a GNU `sed' extension, the special value `/dev/stdin' is
>  supported for the file name, which reads the contents of the
>  standard input.

So the main difference seems to be that GNU sed doesn't give an error 
message if the file can't be read. I'm not sure why that would be a 
good idea.
Also note that there is no mention of using "r -" for stdin, instead
/dev/stdin is mentioned.


On the other hand, I don't know why busybox sed needs exactly one 
space between command and filename. GNU sed works with zero or more 
spaces.


It looks to me that what actually happens when running "sed r" is that 
it appends *no lines* to the end of each line read from stdin.


$ printf 'foo\nbar\nbaz' | sed

Does not add a final newline

$ printf 'foo\nbar\nbaz' | sed r

Does add a final newline

$ printf 'foo\nbar\nbaz' | sed 'r /dev/null'

Does add a final newline

$ echo "blah" > -
$ printf 'foo\nbar\nbaz' | sed 'r -'

results in

foo
blah
bar
blah
baz
blah

So '-' is not treated as a special filename.

I personally don't see much value in preserving the behavior of 
appending nothing for a file which doesn't exist.  Tools should give 
errors if they can't do what you ask them to.


-Mike C.


Re: busybox sed, 'r' command

2016-03-23 Thread Ralf Friedl

Cristian Ionescu-Idbohrn wrote:

sed (GNU sed) 4.2.2 can do this:

$ printf 'foo\nbar\nbaz' | sed r -
foo
bar
baz

or, after storing the text in a file:

$ printf 'foo\nbar\nbaz' >/tmp/bar

$ sed r /tmp/bar
foo
bar
baz

But busybox sed can't:

$ printf 'foo\nbar\nbaz' | busybox sed r -
sed: empty filename

$ busybox sed r /tmp/bar
sed: empty filename

$ printf '' | busybox sed 'r /tmp/bar'


$ busybox sed 'r /tmp/bar'


The 'r' command is documented by GNU sed as a GNU extension.  Still,
busybox sed documents the 'r' command as supported:

r [address]r file
   Read contents of file and append after the contents of the
   pattern space. Exactly one space must be put between r and the
   filename.

Am I misinterpreting the documentation?

From the documentation:
>   The full format for invoking `sed' is:
> sed OPTIONS... [SCRIPT] [INPUTFILE...]
So in your example you invoke sed with the script "r" and the input file
"-" or "/tmp/bar". The content is not printed because it is the argument
to the "r" command, but because it is the main input file to sed. You
can avoid that by using quotes around the command and the file name, or
by omitting the space between the command and the filename.
You should also try the last two examples, where you invoke busybox sed
with quotes, with GNU sed. The behaviour is the same.


You should note that in your example when reading from a file, sed
didn't read from stdin, at least you don't mention it, although your
interpretation would mean that the filename is the argument to the "r"
command, therefore no argument is given to sed, and sed should read stdin.


You should also note that invoking the "r" command with the filename
causes the content of this file to be inserted after every line. When
reading from a pipe, the pipe is empty after the first line.
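
A minimal sketch of the "inserted after every line" behaviour, using a
scratch file rather than stdin as the 'r' argument. The
append_after_each_line helper and the scratch file are hypothetical,
introduced only for this illustration.

```shell
#!/bin/sh
# Scratch file holding the text to insert; the name is arbitrary.
insert=$(mktemp)
trap 'rm -f "$insert"' EXIT
echo 'blah' > "$insert"

# With no address, the 'r' command applies to every input line, so the
# contents of the scratch file are queued for output after each line.
append_after_each_line() {
    sed "r $insert"
}

printf 'foo\nbar\n' | append_after_each_line
```

The script is passed to sed as one quoted argument, so the filename is
part of the 'r' command rather than a separate input file.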


My documentation to GNU sed 4.2.2 says:
> `r FILENAME'
>  As a GNU extension, this command accepts two addresses.
>
>  Queue the contents of FILENAME to be read and inserted into the
>  output stream at the end of the current cycle, or when the next
>  input line is read.  Note that if FILENAME cannot be read, it is
>  treated as if it were an empty file, without any error indication.
>
>  As a GNU `sed' extension, the special value `/dev/stdin' is
>  supported for the file name, which reads the contents of the
>  standard input.

So the main difference seems to be that GNU sed doesn't give an error 
message if the file can't be read. I'm not sure why that would be a good 
idea.
Also note that there is no mention of using "r -" for stdin, instead
/dev/stdin is mentioned.


On the other hand, I don't know why busybox sed needs exactly one space 
between command and filename. GNU sed works with zero or more spaces.



Re: busybox sed, 'r' command

2016-03-23 Thread Cristian Ionescu-Idbohrn
On Wed, 23 Mar 2016, Ron Yorston wrote:
>
> Since the 'r' command requires a space before the filename it will need
> to be quoted.  Some of your examples have quotes and some don't so you
> aren't always comparing the same thing.

Right.  Still.  The different behaviour confused me.

> "sed r -" is an 'r' command with no filename while the "sed 'r -'" is an
> 'r' command with a filename of '-'.  It appears that GNU sed and BusyBox
> sed handle an 'r' command with no filename differently.

Yes.  That seems to be it.  Question is if busybox sed should mimic
GNU sed behaviour or not.  The current GNU sed behaviour might be seen
as a bug.  But it's been like that for ages.  Maybe it's a bug
upstream wants to keep for historical reasons?

> Also note that printf doesn't issue a newline at the end of the string.
> This can affect the results.

Yes, that was intentional.  Passing a file that lacks a newline at the
end of the last line through:

$ sed r 

enforces proper line termination on the last line.  I know there are
other kludges that can achieve the same thing.


Cheers,

-- 
Cristian


Re: busybox sed, 'r' command

2016-03-23 Thread Ron Yorston
Cristian,

Since the 'r' command requires a space before the filename it will need
to be quoted.  Some of your examples have quotes and some don't so you
aren't always comparing the same thing.

"sed r -" is an 'r' command with no filename while the "sed 'r -'" is an
'r' command with a filename of '-'.  It appears that GNU sed and BusyBox
sed handle an 'r' command with no filename differently.
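
The difference comes from how the shell splits the command line before
sed ever runs. A small sketch makes the two argument lists visible;
show_args is a hypothetical helper, not part of sed or BusyBox.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line, bracketed,
# so word boundaries chosen by the shell are visible.
show_args() {
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        printf 'arg %d: [%s]\n' "$i" "$arg"
    done
}

# Unquoted: the script is just "r", and "-" is a separate argument
# that sed takes as an input file.
show_args sed r -

# Quoted: the script is "r -", an 'r' command with filename "-".
show_args sed 'r -'
```

The first call prints three arguments and the second prints two, which
matches the two readings described above.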

Also note that printf doesn't issue a newline at the end of the string.
This can affect the results.

Ron