Re: CVS commit: src/bin/sleep

2019-01-26 Thread Kamil Rytarowski
On 26.01.2019 02:30, Robert Elz wrote:
> Date:Fri, 25 Jan 2019 15:51:05 +0100
> From:Kamil Rytarowski 
> Message-ID:  
> 
>   | sort(1)
>   | stat(1)
> 
> Those take no floating point input that I can see.   For sort,
> its only use of floats would be sorting files containing them,
> for which (I assume) one would want and expect the file
> to be a locale specific format, and for sort to recognise
> the floats in a locale specific way.

This is where I disagree. In my opinion (of a native user of ",") -
parsing locale specific input for such programs doesn't make sense.

Locale specific format is in my opinion appropriate only for programs
that process text for printing (like man(1) or groff(1)).

>   Whether our sort does
> that or not I have not tested (I have no files with floating
> numbers in them, and if I did, they'd use '.' ...)
> 
>   | bc(1)
>   | dc(1)
> 
> Those have their own purpose built language with
> a tightly specified grammar.   They even consider
> A B C ... as numbers, not as letters.   They are
> useless as a point of consideration.
> 
>   | timeout(1)
> 
> That uses strtod() to parse the floating point command
> line args, so is locale specific, but just like FreeBSD's
> sleep, does not call setlocale() so runs only in the C
> (aka POSIX) locale, and so does not adapt in any
> way to the user's locale settings (including any output
> it might generate in the case of an error - no locale
> specific strerror() strings.)
> 
>   | printf(1)
> 
> This one perhaps.   But again, what it accepts is very
> precisely specified (that we had not noticed that we did
> not implement that was, I think, an oversight, which will
> be fixed).
> 
> kre
> 
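A minimal C sketch of the point made about timeout(1). A program that never calls setlocale() stays in the "C" locale it starts in, so LANG or LC_NUMERIC in the environment have no effect on how it parses or prints numbers. The helper names are hypothetical; only setlocale() and localeconv() are standard.

```c
#include <locale.h>

/* Name of the LC_NUMERIC locale now in effect.  Passing NULL only
 * queries; it does not change anything. */
static const char *numeric_locale_name(void)
{
    return setlocale(LC_NUMERIC, NULL);
}

/* First byte of the radix character that strtod() and printf() use. */
static char radix_char(void)
{
    return localeconv()->decimal_point[0];
}
```

Until the program calls setlocale(LC_ALL, ""), the first helper reports "C" and the second reports '.', whatever the user has configured.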






Re: CVS commit: src/bin/sleep

2019-01-26 Thread Martin Husemann
On Sat, Jan 26, 2019 at 12:28:08PM +0100, Kamil Rytarowski wrote:
> > Those take no floating point input that I can see.   For sort,
> > its only use of floats would be sorting files containing them,
> > for which (I assume) one would want and expect the file
> > to be a locale specific format, and for sort to recognise
> > the floats in a locale specific way.
> 
> This is where I disagree. In my opinion (of a native user of ",") -
> parsing locale specific input for such programs doesn't make sense.

As another user of such a locale: I disagree, it makes perfect sense
to get properly numerically sorted output from sort if I specify 
the correct locale. I do a lot of financial stuff in sh, awk and
various other base system commands and sometimes input data comes from
external sources.

Martin


Re: CVS commit: src/bin/sleep

2019-01-26 Thread Robert Elz
Date:Sat, 26 Jan 2019 12:28:08 +0100
From:Kamil Rytarowski 
Message-ID:  

  | This is where I disagree. In my opinion (of a native user of ",") -
  | parsing locale specific input for such programs doesn't make sense.

I don't want to argue, as, as has been pointed out, I don't actively
use locales much, but this confuses me.

If you don't want to (normally) use locales at all, I'd expect you'd
just pretend to be me, and not set any of the LC variables at all,
except when you are going to run one of those special programs
which you want to generate locale specific values.   Or perhaps
more likely, you might have LC_CTYPE set so your language
char set can be recognised, but none of the others.

Then all the normal programs like sort will act the way that you
want them to: no LC_NUMERIC, so to sort floats they need to
be written with a '.', no LC_COLLATE so you get ascii ordering
(as I understand it, on NetBSD that's what you get anyway...)

Or you can do it the way you seem to be suggesting and have
all the programs (but a few) ignore locales (except LC_CTYPE?)
completely, so you can set the LC_ vars to whatever you like
and they will change nothing.   That seems a bit odd to me.

Then, what do you do, when someone sends you a file full of
columns of floating numbers (intermixed with integers, ie: when the
fraction would be 0 it has just been omitted, along with the radix)
that are in the common European format, which also contains periods
in other places, perhaps even near the numbers, and you need to
sort it, and then send it back.   With the first method, you'd just do
something like:
LC_NUMERIC=pl sort [options] file

but if sort is ignoring LC_NUMERIC, that won't work, and I suspect
that you'd need to convert the file from ',' floats to '.' floats, sort,
and then convert back - an error prone and annoying operation
(not impossible to do, but not always easy either - certainly not
as simple as tr , . followed by tr . , as that would convert chars
that used to be '.' (like after someone's initials, in a name) into
',' incorrectly).   It all gets ugly, and typically takes special case
code for every different example you're faced with.
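The round-trip problem can be shown with a small C sketch. tr_char() is a hypothetical stand-in for tr(1), the sample line is invented, and the sort step in the middle is omitted.

```c
/* Replace every `from` with `to`, in place, as tr(1) would. */
static void tr_char(char *s, char from, char to)
{
    for (; *s != '\0'; s++)
        if (*s == from)
            *s = to;
}

/* Naive ',' -> '.' -> ',' round trip around a (not shown) numeric sort. */
static void naive_roundtrip(char *line)
{
    tr_char(line, ',', '.');    /* "A. Nowak 12,5" -> "A. Nowak 12.5" */
    /* ... sort -n would run here ... */
    tr_char(line, '.', ',');    /* -> "A, Nowak 12,5": the initial is damaged */
}
```

The number comes back as it was, but the period after the initial has silently become a comma.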

That doesn't seem appealing to me, and I think I'd like it if all
I had to do to deal with such a file was temporarily set an env
var and it would all just work, even if I didn't believe that such a file
format should ever exist.
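A sketch (not NetBSD's sort(1) code) of why honouring LC_NUMERIC costs little for a program that calls setlocale(LC_ALL, "") at startup: strtod() already uses the radix character of the current LC_NUMERIC locale, so a numeric comparator needs no special handling. cmp_numeric() and sort_lines() are hypothetical names.

```c
#include <stdlib.h>

/* qsort() comparator for lines that begin with a number.  strtod() follows
 * the current LC_NUMERIC locale: after setlocale(LC_ALL, "") with, say,
 * LC_NUMERIC=pl_PL.UTF-8 (an example locale), "2,5" parses as 2.5, while
 * in the "C" locale parsing stops at the ','. */
static int cmp_numeric(const void *a, const void *b)
{
    double x = strtod(*(char *const *)a, NULL);
    double y = strtod(*(char *const *)b, NULL);

    return (x > y) - (x < y);   /* -1, 0 or 1 without overflow */
}

static void sort_lines(char **lines, size_t n)
{
    qsort(lines, n, sizeof lines[0], cmp_numeric);
}
```

Whether a given sort implementation actually parses its keys this way is a separate question; the sketch only shows that the locale-aware behaviour comes from strtod() itself.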

Note, this is not about whether people ought to be using ',' as
the radix or not, whether that's gradually vanishing, ...   That
is a whole other question, and one where I probably agree with
you - the more meaningless differences there are in different
regions (which side of the road to drive on, how to write floating
numbers, metric or imperial, 24 or 12 hour clocks, ...) for which
there is no particularly good reason for most (well, metric is clearly
better than imperial, and 24 hour clocks are the way to go as
well ... IMO anyway) the world would be simpler, more efficient,
and generally better if all those differences could just magically
go away.   Some probably will, in time, others not, but that's out
of scope for us, we cannot make any of that happen here.

While the differences exist, we need to find ways to work with
them, not simply wish them away and pretend they don't exist.
That never works.

kre



Re: CVS commit: src/bin/sleep

2019-01-26 Thread Kamil Rytarowski
On 26.01.2019 16:05, Martin Husemann wrote:
> On Sat, Jan 26, 2019 at 12:28:08PM +0100, Kamil Rytarowski wrote:
>>> Those take no floating point input that I can see.   For sort,
>>> its only use of floats would be sorting files containing them,
>>> for which (I assume) one would want and expect the file
>>> to be a locale specific format, and for sort to recognise
>>> the floats in a locale specific way.
>>
>> This is where I disagree. In my opinion (of a native user of ",") -
>> parsing locale specific input for such programs doesn't make sense.
> 
> As another user of such a locale: I disagree, it makes perfect sense
> to get properly numerically sorted output from sort if I specify 
> the correct locale. I do a lot of financial stuff in sh, awk and
> various other base system commands and sometimes input data comes from
> external sources.
> 
> Martin
> 

I see. I mentioned that, in my region, these punctuation conventions are
used almost only in office suites, but if someone does this sort of task
in awk(1) rather than with an Excel/Calc-like program, then there is a
use case (and not just a hypothetical one).





Re: CVS commit: src/bin/sleep

2019-01-26 Thread Joerg Sonnenberger
On Fri, Jan 25, 2019 at 12:29:55PM +0700, Robert Elz wrote:
> Date:Thu, 24 Jan 2019 16:18:49 +0100
> From:Joerg Sonnenberger 
> Message-ID:  <20190124151849.ga10...@britannica.bec.de>
> 
>   | This is overcomplicated and fragile, IMO.
> 
> ps:  if the fragility referred to is that it might now
> switch mid-stream into sending messages in English
> rather than in the locale's language - then that is
> a valid concern, and I could certainly change it
> to use strtod_l() in that case to avoid that problem.

No, the fragile refers to the problem that many locales use both "." and
"," in numbers. While the standards decided in their infinite wisdom
that grouping characters shouldn't be parsed in floating point context,
it is very confusing at least for casual users. "sleep 1.000" would be
perfectly sensible for a German user, but certainly not do what is
expected.

Arguing about locale behavior based on OpenBSD doesn't work, since they
intentionally don't implement most of it anyway.

Let's take a step back from the implementation details. I consider the
command line interface of a program part of the shell language universe.
Programming languages shouldn't change arbitrarily based on locale
settings. Otherwise you get the VBA madness. That's very different from
the data being processed or messages used for interacting with the user.
Valery mentioned the Postscript example already.

Joerg


Re: CVS commit: src/bin/sleep

2019-01-26 Thread Robert Elz
Date:Sat, 26 Jan 2019 21:00:45 -0500
From:"Christos Zoulas" 
Message-ID:  <20190127020045.35a7df...@cvs.netbsd.org>

  | cast to intmax_t instead of long, since time_t is "long long"

Some of this is unnecessary, though not technically wrong.  Martin's
change was fine, and the same thing I would have done (was doing,
but he got the commit processed first!)

The value being printed has already been range checked, it would
actually be fine to print it as an int.  The same is true of another of
the ones you changed (the 2nd warnx()).   That is, we don't need
%jd to print values that we know are either < 2000, or < 10.
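To illustrate: once a value has been range checked, a plain int is wide enough and %d is correct. format_checked() is a hypothetical helper, and its bound of 2000 simply mirrors the "< 2000" above.

```c
#include <stdio.h>
#include <time.h>

/* Print a time_t already known to lie in [0, 2000) using %d. */
static int format_checked(char *buf, size_t len, time_t secs)
{
    if (secs < 0 || secs >= 2000)
        return -1;                          /* outside the checked range */
    return snprintf(buf, len, "%d", (int)secs);
}
```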

The first of the 3 warnx()s that were altered ought more correctly (usefully)
to be printed using %g, as is done for the same value in other places; I just
had not had any reason to alter that one recently (it contained the
remnants of the original warnx() from the historic sleep.c, which used
(long) for all of these values).   %g produces more rational output for
very large values than %(anything)d does, and it is only when we
have a very large value that there's any difference between %ld and
%jd (as long as the corresponding arg is the correct type, of course.)
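The difference between %g and %jd only shows up for large values. A sketch with hypothetical helper names:

```c
#include <stdint.h>
#include <stdio.h>

/* %g: compact, switches to exponent form for large magnitudes. */
static int format_g(char *buf, size_t len, intmax_t secs)
{
    return snprintf(buf, len, "%g", (double)secs);
}

/* %jd: every decimal digit; the argument must be an intmax_t. */
static int format_jd(char *buf, size_t len, intmax_t secs)
{
    return snprintf(buf, len, "%jd", secs);
}
```

For 10^15 seconds the %g form is "1e+15" while %jd prints all sixteen digits; for small values such as 5 both print the same thing.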

kre





Re: CVS commit: src/bin/sleep

2019-01-26 Thread Christos Zoulas
I think it is easier and less error-prone to consistently cast time_t
to intmax_t instead of choosing how to cast based on knowing the range.

christos
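The uniform alternative, casting every time_t to intmax_t, needs no knowledge of the value's range. A sketch, assuming (as on NetBSD and other POSIX systems) that time_t is an integer type; format_time() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Widen time_t to intmax_t and print with %jd: correct whether time_t
 * is long or long long, with no reasoning about the value's range. */
static int format_time(char *buf, size_t len, time_t t)
{
    return snprintf(buf, len, "%jd", (intmax_t)t);
}
```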




Re: CVS commit: src/bin/sleep

2019-01-26 Thread Robert Elz
Date:Sat, 26 Jan 2019 21:49:51 +0100
From:Joerg Sonnenberger 
Message-ID:  <20190126204951.ga7...@britannica.bec.de>

  | No, the fragile refers to the problem that many locales use both "." and
  | "," in numbers.

Yes, like English...   I wasn't previously aware that '.' was ever used
as the grouping char, though I did believe that some locales use a
space for that purpose.

  | While the standards decided in their infinite wisdom
  | that grouping characters shouldn't be parsed in floating point context,
  | it is very confusing at least for casual users. "sleep 1.000" would be
  | perfectly sensible for a German user, but certainly not do what is
  | expected.

Do you have a solution for that which can actually be implemented?

While the original code I wrote to deal with the (not-PR'd) bug report
- the one you commented on originally from a week ago - was fragile in
this area, the current one is slightly less so I think.

If the arg can be parsed by strtod() in the user's locale, it will be.
If it cannot, but if it can be parsed in the C locale, it will be handled
that way instead.   If neither work it is an error.

This keeps the traditional behaviour of NetBSD sleep (with some error
checking added, anticipating dholland's PR 53910 before he submitted
it) while also allowing scripts to sanely (well, as sanely as them ever
using non-integral inputs is) use standard C floats as the input (as,
as you point out, a floating number is, rightly or wrongly, currently
only parsed as a string of digits and an optional single radix character,
no grouping chars allowed.)   [Aside: hex counts as "digits".]
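The strtod() grammar can be checked directly. The sketch below runs in the "C" locale and uses a hypothetical consumed() helper that reports how much of the string strtod() accepted. It also shows that "1.000" is read as 1, not 1000.

```c
#include <stdlib.h>

/* Number of characters strtod() consumes from s (current locale). */
static long consumed(const char *s, double *out)
{
    char *end;

    *out = strtod(s, &end);
    return (long)(end - s);
}
```

In the "C" locale, "0x1.8p1" is consumed completely (a C99 hex float, value 3.0), "1,000" stops after the "1" because grouping characters are not accepted, and "1.000" is consumed completely with the value 1.0.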

The question of whether sleep (and perhaps other commands) ought
to parse their args in a locale specific manner is a different issue, and
one worthy of considerable discussion.   I have no strong opinion on
this as (as has been pointed out) it does not really affect me much.

  | Arguing about locale behavior based on OpenBSD doesn't work,
  | since they intentionally don't implement most of it anyway.

I do not think that was even the intent.   In order to determine the
"how should we parse the args" question, Christos suggested looking
at how other systems do it - which is valid data to have.   If we
collectively come up with some particularly compelling reason that
we should do it one way or the other, that we mostly agree on, then what
other systems do is largely irrelevant.   If there is no particularly good
reason to prefer one way or the other, or we cannot agree, some
developers/users prefer one, and others prefer the other, then at least
acting the same way as (most) other systems (which allow non-integer
args) might be enough to decide one way or the other.

  | Let's take a step back from the implementation details. I consider the
  | command line interface of a program part of the shell language universe.
  | Programming languages shouldn't change arbitrarily based on locale
  | settings. Otherwise you get the VBA madness. That's very different from
  | the data being processed or messages used for interacting with the user.
  | Valery mentioned the Postscript example already.

That's all a valid point, which really should be made in a discussion
on a better list than source-changes-d (in messages with the
Subject header "Re: CVS commit: src/bin/sleep").   This is not where
someone from the future would expect to find a discussion of the
philosophical (or technical) reasons why we should decide one way
or the other, should they be wondering why we made the decision now
whichever way we end up making it, if 20 years (or more) into the
future this all comes up again.

But since this is here, one point I'd make is that there is no particular
feature of the command line interface of a program which distinguishes
it from data used when interacting with the user.   The program does
not know from where its args were obtained.   If they're
written in a script, then I absolutely agree with you, it ought to use the
"standard" notation (one way or another).

But I know that I frequently simply type

sleep n

into my shell, and then follow that by a bunch of commands I want
executed a little later, and while it would be unusual for me to give
fractional seconds in such a case, if I did, I'd normally expect to
enter those in the same format I use for any other floating point number
in my day to day life, which is what I would have LC_NUMERIC
set up to produce and consume (that is, whatever I believe is best for
me, which is not necessarily the same as the guy at the next desk
in the same room ... obviously in the same country.)

Similarly, if a script requests a delay value from the user, as in the

printf 'How long should the delay be? '
read delay || exit 1
sleep "$delay"

type example, which is that?   Data obtained while interacting with the
user, or the command line interface of a program?

This gets a bit messy, a

Re: CVS commit: src/bin/sleep

2019-01-26 Thread Robert Elz
Date:Sat, 26 Jan 2019 23:23:32 -0500
From:Christos Zoulas 
Message-ID:  

  | I think it is easier and less error-prone to consistently cast time_t
  | to intmax_t instead of choosing how to cast based on knowing the range.

The real problem is the long standing abuse of time_t to represent
durations, which it really should never be used for.   This is like
using size_t where what we have is a ptrdiff_t, while they look
similar, and are likely to have similar ranges, they are not the
same thing.

That, and the fact that we don't have a PRItt (or something) for time_t.

I may (someday) convert the one that should be %g, but I will leave
the others...

kre



Re: CVS commit: src/bin/sleep

2019-01-26 Thread Kamil Rytarowski
On 27.01.2019 05:42, Robert Elz wrote:
> Yes, like English...   I wasn't previously aware that '.' was ever used
> as the grouping char, though I did believe that some locales use a
> space for that purpose.

I don't know whether there is a formal rule that is followed, but in
practice people use no separator at all, spaces (or some automatic
spacing in the font to group digits), or dots/commas (whichever
character is not being used as the radix). In computer science and some
programming languages '_' is used (especially for hex numbers).

This is one of the reasons I prefer to restrict it (at least for my own
purposes) to printing text, not parsing files.


