A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1959
======================================================================
Reported By:                collinfunk
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1959
Category:                   Shell and Utilities
Tags:                       tc1-2024
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     Interpretation Required
Name:
Organization:               GNU
User Reference:
Section:                    XCU dd
Page Number:                2778
Line Number:                91990 - 91996
Interp Status:              Proposed
Final Accepted Text:        https://www.austingroupbugs.net/view.php?id=1959#c7335
Resolution:                 Accepted As Marked
Fixed in Version:
======================================================================
Date Submitted:             2025-11-13 23:13 UTC
Last Modified:              2025-12-16 07:16 UTC
======================================================================
Summary:                    dd conv=lcase and conv=ucase should only translate
                            single byte locales
======================================================================
----------------------------------------------------------------------
 (0007340) stephane (reporter) - 2025-12-16 07:16
 https://www.austingroupbugs.net/view.php?id=1959#c7340
----------------------------------------------------------------------
I don't think:

> dd if=/dev/tape ibs=800 cbs=80 conv=ascii | tr '[:upper:]' '[:lower:]'

makes much sense.

That conv=ascii is meant to be an "EBCDIC" to "ASCII" conversion, but with
tables that cover values 0 to 255 even though ASCII only has characters for
bytes 0 to 127.

One may think it's about conversion from some superset of ASCII such as one
of the ISO8859-x character sets to some national variant of EBCDIC, but it
does not appear to be.

Those conversion tables appear to be taken straight from the original dd
implementation from Unix v5 in the early 70s
(https://github.com/dspinellis/unix-history-repo/blob/Research-V5/usr/source/s1/dd.c#L31-L100),
and from PWB Unix for the "IBM" variant.

Those tables reference OCR characters (https://en.wikipedia.org/wiki/OCR-A)
which are not found in any modern single-byte character set (let alone
ASCII-based or EBCDIC-based ones). This suggests that those conversion tables
(and the "print train" terminology) are about interaction with long-forgotten
technology from that era (long before Unix localisation, long before
ISO8859-x, let alone Unicode and UTF-8).

The dd text mentions "standard EBCDIC" as if there was *one* such standard.

Same in iconv:

> The iconv utility may support the conversion between ASCII and EBCDIC-based
> encodings, but is not required to do so. In an XSI-compliant implementation,
> the dd utility is the only method guaranteed to support conversion between
> these two character sets

In the rationale section of dd:

> 2. EBCDIC 0137 (') translates to/from ASCII 0236 ('^'). In the standard
> table, EBCDIC 0232 (no graphic) is used.

ASCII ^ is 0136, not 0236 (ASCII only covers 0 to 0177).
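As a quick sanity check of the octal values above, here is a minimal shell sketch. The helper name octal_of is hypothetical, and the result assumes an ASCII-based system; it relies on printf treating an argument that starts with a quote as the numeric value of the following character.

```shell
# octal_of CHAR: print the octal code of a single-byte character.
# (octal_of is an illustrative helper, not part of any standard utility.)
octal_of() {
    LC_ALL=C printf '%o\n' "'$1"
}

# Code of '^' in octal; expected to be 136 on an ASCII-based system.
octal_of '^'

# Upper bound of ASCII given as an octal constant, printed in decimal.
printf '%d\n' 0177
```

On an ASCII-based system this should print 136 and then 127, matching the statement that ASCII ^ is 0136 and that ASCII stops at 0177.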
In the "EBCDIC" table, we indeed see 0232 for ASCII 0136, and for the "IBM"
one, we see 0137 and a ´ glyph (not in ASCII).

So the output of dd conv=ascii would be in a charset that is a superset of
ASCII but is not in use today, and in particular is not found as the charmap
of any locale.

dd if=/dev/tape ibs=800 cbs=80 conv=ascii | LC_ALL=C tr '[:upper:]' '[:lower:]'

may make sense (maybe to process a tape recovered from some museum?) on
ASCII-based systems, as it would transliterate only A-Z to a-z, but then
again, you'd rather use:

LC_ALL=C dd if=/dev/tape ibs=800 cbs=80 conv=ascii,lcase

then.

On non-ASCII systems, the output of dd conv=ascii would not be text that can
be processed by tr in any locale.

I would remove any attempt to do case conversion on the output of dd
conv=ascii/ebcdic/ibm, and (though that would likely be for a separate bug)
clarify that conv=ascii/ebcdic/ibm are legacy features of no relevance today,
and modernise the text so that it can be understood by someone who did not
happen to work at AT&T in the 70s/80s. Or maybe drop those altogether.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2025-11-13 23:13 collinfunk     New Issue
2025-12-11 17:17 geoffclare     Note Added: 0007335
2025-12-11 17:18 geoffclare     Status                   New => Interpretation Required
2025-12-11 17:18 geoffclare     Resolution               Open => Accepted As Marked
2025-12-11 17:18 geoffclare     Name                     Your Name Here =>
2025-12-11 17:18 geoffclare     Interp Status             => Pending
2025-12-11 17:18 geoffclare     Final Accepted Text       => https://www.austingroupbugs.net/view.php?id=1959#c7335
2025-12-11 17:19 geoffclare     Tag Attached: tc1-2024
2025-12-15 06:55 ajosey         Interp Status            Pending => Proposed
2025-12-15 06:55 ajosey         Note Added: 0007338
2025-12-16 07:16 stephane       Note Added: 0007340
======================================================================
