NFW; unless the documentation describes such bizarre behavior, it should *NOT* 
translate characters to SUB when there is a correct translation. If you want to 
preserve the length then use a character set in which all characters are 8 
bits, e.g., ISO-8859-15.
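
For illustration, a minimal Python sketch of a length-preserving translation. It assumes code page 037 (Python's `cp037` codec) as the EBCDIC target, since the thread does not name the CCSID, and the sample name is made up. It says nothing about how any FTP implementation behaves.

```python
# Sketch: transcode UTF-8 input to single-byte character sets.
# Assumptions: "cp037" stands in for the unnamed EBCDIC CCSID;
# the sample text is invented for the example.

utf8_bytes = "Guzmán".encode("utf-8")   # 'á' is X'C3A1' in UTF-8

text = utf8_bytes.decode("utf-8")        # decode as UTF-8 first
ebcdic = text.encode("cp037")            # one byte per character
latin9 = text.encode("iso-8859-15")      # also one byte per character

# Byte count of the UTF-8 input, then character count, then the
# byte counts of the two single-byte encodings.
print(len(utf8_bytes), len(text), len(ebcdic), len(latin9))
```

Only the UTF-8 form is longer than the character count; both single-byte encodings keep one byte per character.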


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


________________________________________
From: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU> on behalf of 
Charles Mills <charl...@mcn.org>
Sent: Monday, November 16, 2020 4:14 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: FTP converting between UTF-8 and EBCDIC

If you tell FTP that the non-EBCDIC file is UTF-8 then FTP *should* convert
accented characters and such to EBCDIC SUB (X'3F') rather than to two bytes.
Should. YMMV.
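
As a hedged illustration of SUB substitution, the Python sketch below replaces a character with SUB only when the target code page has no mapping for it. The `cp037` codec is an assumed stand-in for the EBCDIC CCSID, and `ebcdic_sub` is a hypothetical handler name; none of this reflects how FTP itself is implemented.

```python
import codecs

SUB = "\x1a"  # U+001A SUBSTITUTE, which cp037 encodes as X'3F'

def substitute_with_sub(err):
    """Replace each unencodable character with one SUB character."""
    if isinstance(err, UnicodeEncodeError):
        return (SUB * (err.end - err.start), err.end)
    raise err

# "ebcdic_sub" is a hypothetical name chosen for this sketch.
codecs.register_error("ebcdic_sub", substitute_with_sub)

mapped = "á".encode("cp037", errors="ebcdic_sub")    # has a cp037 mapping
unmapped = "ł".encode("cp037", errors="ebcdic_sub")  # no cp037 mapping

print(mapped.hex(), unmapped.hex())
```

Either way the result is one byte per input character, so field lengths are preserved.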

Charles


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Frank Swarbrick
Sent: Monday, November 16, 2020 10:16 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: FTP converting between UTF-8 and EBCDIC

The record is made up of multiple fixed-length fields.  I guess the field in
question technically didn't overflow.  But rather it "expanded" the field by
one byte, pushing every other field one byte to the right.  Likely the
program that creates the file is treating the "field length" as the number
of characters, rather than the number of bytes.  I've actually asked them to
create the file as ISO-8859-1 instead of UTF-8, and if they're willing/able
to do that then this entire discussion is moot.  But I wanted to have this
as a backup solution.
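
A small Python sketch of the effect described above, assuming the producing program pads fields by character count. The field names, widths, and sample values are hypothetical.

```python
# Sketch: a record of two fixed-length fields, padded by character count.
# NAME_WIDTH, CITY_WIDTH and the sample values are invented for illustration.
NAME_WIDTH = 10
CITY_WIDTH = 8

def build_record(name, city):
    """Pad each field to its width, counting characters rather than bytes."""
    return name.ljust(NAME_WIDTH) + city.ljust(CITY_WIDTH)

record = build_record("Guzmán", "Denver")

as_utf8 = record.encode("utf-8")          # 'á' becomes two bytes
as_latin1 = record.encode("iso-8859-1")   # 'á' stays one byte

# A reader that treats every byte as one character sees an extra character.
byte_per_char_view = as_utf8.decode("iso-8859-1")

print(len(record), len(as_latin1), len(as_utf8), len(byte_per_char_view))
print(as_latin1.index(b"Denver"), as_utf8.index(b"Denver"))
```

In the UTF-8 form the second field starts one byte later than in the ISO-8859-1 form, which is the one-byte shift to the right.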

________________________________
From: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU> on behalf of
Paul Gilmartin <0000000433f07816-dmarc-requ...@listserv.ua.edu>
Sent: Monday, November 16, 2020 10:55 AM
To: IBM-MAIN@LISTSERV.UA.EDU <IBM-MAIN@LISTSERV.UA.EDU>
Subject: Re: FTP converting between UTF-8 and EBCDIC

On Mon, 16 Nov 2020 17:26:12 +0000, Frank Swarbrick wrote:

>Yes, it "overflowed" a fixed-length field.  x'C3A1' in the source file was
>treated as two separate "ASCII" characters, x'C3' and x'A1'.  Since those
>don't exist in the EBCDIC code page I am using they just get converted to
>two "nonsense" characters.
>
How wide is that field?  You must have been on the bitter edge of the limit.
What happens if a client enters an actual surname exceeding the limit?

>I agree that ideally the input source would restrict the input.  But since
>that's on another team, and this workaround is likely "good enough", that's
>probably unlikely to happen.
>
What was the workaround you chose, converting to which EBCDIC CCSID?
Is there no possibility of a client's entering a character not in that
CCSID?
What happens if someone does?  Can you fuzz test or would that intrude
"on another team"?

I'd expect you need to do some filtering, perhaps to preclude SQL injection
downstream.  But that might be achieved by encoding.

(I guessed wrong: "á", not "â".  Spellcheck flags both.)

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
