On 7/10/20 8:15 PM, Jordan Geoghegan wrote:


On 2020-07-10 16:59, Rosen Penev wrote:
On Fri, Jul 10, 2020 at 4:17 PM Jordan Geoghegan <jor...@geoghegan.ca> wrote:


On 2020-07-10 14:54, Rosen Penev wrote:
On Fri, Jul 10, 2020 at 2:29 PM Jordan Geoghegan <jor...@geoghegan.ca> wrote:

On 2020-07-10 14:15, Magnus Kroken wrote:
Hi Jordan

On 10.07.2020 22:45, Jordan Geoghegan wrote:
Hey folks,

Does the 'tr' utility support character classes in OpenWRT? I was
playing around with an OpenWRT x86_64 VM and I noticed that 'tr'
doesn't seem to support character classes.
The command " echo HELLO | tr '[:upper:]' '[:lower:]' "  does not
convert the text to lowercase as it should (and as required by
POSIX).
This would be expected behavior. OpenWrt disables tr character classes
in BusyBox by default, see [1]:

config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
          bool
          default n
config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
          bool
          default n

I don't know what the size cost in the BusyBox binary is, but that
will likely be the deciding factor for such a change.
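
For anyone who wants to measure that size cost on their own build, a rough
sketch of a .config fragment follows. The symbol names are an assumption,
derived from the BUSYBOX_DEFAULT_* entries above by the usual
CONFIG_BUSYBOX_CONFIG_* naming, so please confirm them in menuconfig before
relying on them.

```
# Assumed symbol names, not verified against a specific OpenWrt release.
# Allow per-option BusyBox customization instead of the shipped defaults.
CONFIG_BUSYBOX_CUSTOM=y
# Enable [:class:] and [=equiv=] handling in the tr applet.
CONFIG_BUSYBOX_CONFIG_FEATURE_TR_CLASSES=y
CONFIG_BUSYBOX_CONFIG_FEATURE_TR_EQUIV=y
```

Comparing the resulting busybox binary size with and without these two
options would give the number that is missing from this discussion.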

1:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/utils/busybox/Config-defaults.in

Regards,
Magnus Kroken
Hi Magnus,

Thanks for confirming that so quickly.

I obviously understand that space saving is essential to OpenWRT, but
POSIX does require[1] that 'tr' support character classes:
awk '{print toupper($0)}' is an alternative.
Yes, but this means that any script expecting tr to work correctly could
explode, as tr silently ignores the character class and treats all the
characters literally.
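
To make the silent failure concrete, here is a minimal sketch of how a script
could guard against it. The function names tr_supports_classes and to_lower
are hypothetical, made up for this illustration. The probe relies on the
behavior described above: without class support the brackets, colons and
letters are taken literally, so an uppercase "A" passes through unchanged.

```shell
# Hypothetical helper: succeeds only if this system's tr honours
# [:upper:] and [:lower:] as character classes.
tr_supports_classes() {
    # With class support "A" becomes "a". Without it, "A" is not part of
    # the literal set "[:upper:]" and comes out unchanged.
    [ "$(printf 'A' | tr '[:upper:]' '[:lower:]')" = "a" ]
}

# Hypothetical helper: lowercase stdin, falling back to awk when tr
# cannot be trusted with character classes.
to_lower() {
    if tr_supports_classes; then
        tr '[:upper:]' '[:lower:]'
    else
        awk '{ print tolower($0) }'
    fi
}
```

Usage would look like: echo HELLO | to_lower. The probe costs one extra
pipeline per call, so a script that converts many strings may prefer to run
it once and remember the result.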
git grep upper | grep tr\ | wc -l
3

In the packages feed. All those results are things that run on the
host, not on OpenWrt.

tr a-z A-Z works as an alternative and is used in many places.
tr a-z A-Z is bad practice as it can behave unexpectedly in different locales; I've also heard tales of folks with Turkish locales having issues with '0-9' for example. Is a couple kb of space worth such a loss in portability (not to mention deviating heavily from POSIX)?
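
One way to avoid both range expressions and character classes is to spell out
the two sets in full, since POSIX leaves range expressions unspecified outside
the POSIX locale. This is only a sketch; the function name to_upper_ascii is
hypothetical, and it deliberately handles nothing beyond the 26 ASCII letters.

```shell
# Hypothetical helper: uppercase the ASCII letters on stdin.
# Both sets are written out explicitly, so the result does not depend on
# how a-z collates or on whether tr was built with class support.
to_upper_ascii() {
    tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
}
```

Usage would look like: echo hello | to_upper_ascii. The trade-off is that
non-ASCII letters are left untouched, which the class-based form would handle
according to LC_CTYPE.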
[:class:]
                Represents all characters belonging to the defined character class, as defined by the current setting of the LC_CTYPE locale category. The following character class names shall be accepted when specified in string1:

                  alnum    blank   digit   lower   punct   upper
                  alpha    cntrl   graph   print   space   xdigit


1: https://www.unix.com/man-page/posix/1posix/tr/

Unless there is an overwhelming size cost, basic POSIX binaries should be provided "POSIX'ly correct" by default. Applying experimental theory, a discipline's standard is the null hypothesis (H0), which is the default decision. A deviation from the standard, and especially _shorting_ the standard, is the alternative hypothesis (H1) and requires good data with clear separation to accept. (Standards often permit well-formulated extensions to them.)

_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel