[1003.1(2024)/Issue8 0001959]: dd conv=lcase and conv=ucase should only translate single byte locales

Austin Group Issue Tracker via austin-group-l at The Open Group Mon, 15 Dec 2025 23:22:31 -0800

A NOTE has been added to this issue. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1959 
====================================================================== 
Reported By:                collinfunk
Assigned To:                
====================================================================== 
Project:                    1003.1(2024)/Issue8
Issue ID:                   1959
Category:                   Shell and Utilities
Tags:                       tc1-2024
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     Interpretation Required
Name:                        
Organization:               GNU 
User Reference:              
Section:                    XCU dd 
Page Number:                2778 
Line Number:                91990 - 91996 
Interp Status:              Proposed 
Final Accepted Text:       
https://www.austingroupbugs.net/view.php?id=1959#c7335 
Resolution:                 Accepted As Marked
Fixed in Version:           
====================================================================== 
Date Submitted:             2025-11-13 23:13 UTC
Last Modified:              2025-12-16 07:16 UTC
====================================================================== 
Summary:                    dd conv=lcase and conv=ucase should only translate single byte locales
======================================================================


---------------------------------------------------------------------- 
 (0007340) stephane (reporter) - 2025-12-16 07:16
 https://www.austingroupbugs.net/view.php?id=1959#c7340 
---------------------------------------------------------------------- 
I don't think:

> dd if=/dev/tape ibs=800 cbs=80 conv=ascii | tr '[:upper:]' '[:lower:]'

makes much sense.

That conv=ascii is meant to be an "EBCDIC" to "ASCII" conversion, but with tables
that cover values 0 to 255 even though ASCII only has characters for bytes 0 to 127.
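
As a rough illustration of what that conversion does for ordinary letters, here
is a sketch that assumes a dd implementing conv=ascii with the POSIX table (GNU
dd, for instance) and no cbs= operand. The helper name ebcdic_abc_to_ascii is
made up for this example. EBCDIC bytes 0301 0302 0303 are the letters A, B, C:

```shell
# Hypothetical helper: feed three EBCDIC bytes (octal 301 302 303) to dd
# and let conv=ascii translate them. dd's transfer statistics go to stderr,
# so they are discarded here.
ebcdic_abc_to_ascii() {
  printf '\301\302\303' | dd conv=ascii 2>/dev/null
}

ebcdic_abc_to_ascii; echo
```

On such a system this should print ABC. It says nothing about the bytes above
0177, which is where the tables stop corresponding to anything in ASCII.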

One may think it's about conversion from some superset of ASCII such as one of
the ISO8859-x character sets to some national variant of EBCDIC, but it does not
appear to be.

Those conversion tables appear to be taken straight from the original dd
implementation from Unix v5 in the early 70s
(https://github.com/dspinellis/unix-history-repo/blob/Research-V5/usr/source/s1/dd.c#L31-L100),
and PWB Unix for the "IBM" variant.

Those tables reference OCR characters ⑀⑁⑂
(https://en.wikipedia.org/wiki/OCR-A), which are not found in any modern
single-byte character set (let alone ASCII-based or EBCDIC-based ones). That
suggests those conversion tables (and the "print train" terminology) are about
interaction with long-forgotten technology from the era (long before Unix
localisation, long before ISO8859-x, let alone Unicode and UTF-8).

The dd text mentions "standard EBCDIC" as if there was *one* such standard. Same
in iconv:

> The iconv utility may support the conversion between ASCII and EBCDIC-based
> encodings, but is not required to do so. In an XSI-compliant implementation,
> the dd utility is the only method guaranteed to support conversion between
> these two character sets

In the rationale section of dd:

> 2. EBCDIC 0137 (') translates to/from ASCII 0236 ('^'). In the standard
> table, EBCDIC 0232 (no graphic) is used.

ASCII ^ is 0136, not 0236 (ASCII only covers 0 to 0177). In the "EBCDIC" table,
we indeed see 0232 for ASCII 0136, and for the "IBM" one, we see 0137 and a ´
glyph (not in ASCII).
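
The octal values are easy to check on an ASCII-based system. A minimal sketch
(the helper name caret_octal is made up here; od -t o1 dumps bytes in octal):

```shell
# Hypothetical helper: print the octal value of the byte that encodes '^'.
# Assumes an ASCII-based system; od pads its output with blanks, so strip them.
caret_octal() {
  printf '^' | od -An -t o1 | tr -d ' \n'
}

caret_octal; echo      # 136 on an ASCII-based system
printf '%d\n' 0177     # 127: the highest ASCII code point, given in octal
```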

So the output of dd conv=ascii would be in a charset that is a superset of ASCII
but is not in use today, and in particular not found as the charmap in any
locale.

dd if=/dev/tape ibs=800 cbs=80 conv=ascii |
  LC_ALL=C tr '[:upper:]' '[:lower:]'

may make sense (maybe to process a tape recovered from some museum?) on
ASCII-based systems, as it would transliterate only A-Z to a-z, but then again,
you'd rather use:

LC_ALL=C dd if=/dev/tape ibs=800 cbs=80 conv=ascii,lcase

then.
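
To compare the two forms without a tape drive, here is a sketch that uses printf
as a stand-in for /dev/tape and leaves out conv=ascii, since the sample is
already ASCII. The function names are made up for this example, and it assumes
a dd that supports conv=lcase (GNU and BSD dd do):

```shell
# Stand-in for the tape contents: plain ASCII text with mixed case.
sample() { printf 'HELLO, Tape 42\n'; }

# Lowercase with tr in the C locale: only A-Z are affected.
via_tr() { sample | LC_ALL=C tr '[:upper:]' '[:lower:]'; }

# Lowercase with dd itself; transfer statistics on stderr are discarded.
via_dd() { sample | LC_ALL=C dd conv=lcase 2>/dev/null; }

via_tr    # hello, tape 42
via_dd    # hello, tape 42
```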

On non-ASCII systems, the output of dd conv=ascii would not be text that can be
processed by tr in any locale.

I would remove any attempt to do case conversion on the output of dd
conv=ascii/ebcdic/ibm, and (though that would likely be for a separate bug)
clarify that conv=ascii/ebcdic/ibm are legacy features of no relevance today,
and modernise the text so it can be understood by someone who didn't happen to
work at AT&T in the 70s/80s. Or maybe drop those altogether.

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2025-11-13 23:13 collinfunk     New Issue                                    
2025-12-11 17:17 geoffclare     Note Added: 0007335                          
2025-12-11 17:18 geoffclare     Status                   New => Interpretation Required
2025-12-11 17:18 geoffclare     Resolution               Open => Accepted As Marked
2025-12-11 17:18 geoffclare     Name                     Your Name Here =>
2025-12-11 17:18 geoffclare     Interp Status             => Pending
2025-12-11 17:18 geoffclare     Final Accepted Text       => https://www.austingroupbugs.net/view.php?id=1959#c7335
2025-12-11 17:19 geoffclare     Tag Attached: tc1-2024                       
2025-12-15 06:55 ajosey         Interp Status            Pending => Proposed 
2025-12-15 06:55 ajosey         Note Added: 0007338                          
2025-12-16 07:16 stephane       Note Added: 0007340                          
======================================================================
