On Mon, Nov 30, 2020 at 22:56 Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 22:15 Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
>> On Mon, Nov 30, 2020 at 14:14 Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote:
>>
>>> On 2020-11-29 18:36, Pavel Stehule wrote:
>>> > I don't really get the point of this function. There is AFAICT no
>>> > function to produce this escaped format, and it's not a recognized
>>> > interchange format. So under what circumstances would one need to
>>> > use this?
>>> >
>>> > Some corporate data can be in CSV format with escaped unicode
>>> > characters. Without this function it is not possible to decode these
>>> > files without an external application.
>>>
>>> I would like some supporting documentation on this. So far we only have
>>> one stackoverflow question, and then this implementation, and they are
>>> not even the same format. My worry is that if there is no precise
>>> specification, then people are going to want to add things in the
>>> future, and there will be no way to analyze such requests in a
>>> principled way.
>>
>> I checked this, and it is the "prefix backslash-u hex" format used by
>> Java, JavaScript, or RTF -
>> https://billposer.org/Software/ListOfRepresentations.html
>>
>> Some languages (Python) have a "unicode-escape" decoder. Java has a
>> method escapeJava for conversion from unicode to ascii. I can imagine
>> that these data come from Java systems exported to 8-bit strings, so
>> this implementation can be accepted as referential. This format is used
>> by the
>> https://docs.oracle.com/javase/8/docs/technotes/tools/unix/native2ascii.html
>> tool too.
>>
>> Postgres can decode this format too, and the patch is based on the
>> Postgres implementation. I just implemented a different interface.
>>
>> Currently the decode function does only a text->bytea transformation.
>> Maybe more generic functions "decode_text" and "encode_text" would be
>> better for similar cases (here we need a text->text transformation).
>> But that looks like overengineering now.
>>
>> Maybe we could introduce a new encoding "ascii" and implement new
>> conversions "ascii_to_utf8" and "utf8_to_ascii". That looks like the
>> cleanest solution. What do you think about it?
>
> A better name for the new encoding might be "unicode-escape" rather than
> "ascii". We already use the "to_ascii" function for a different use case.
>
> set client_encoding to unicode-escape;
> copy tab from xxx;
> ...
>
> But it doesn't help when only a few columns of the table are in the
> unicode-escape format.
>
> Probably the most complete solution consists of two steps:
>
> 1. introducing a new encoding, "ascii_unicode_escape", with the related
>    conversions
> 2. introducing two new functions, text_escape and text_unescape, with two
>    parameters: the source text and a conversion name
>
> select text_convert_to('Тимати', 'ascii_unicode_escape')
>
> \u0422\u0438\u043c\u0430\u0442\u0438 .. the result is text
>
> select text_convert_from('\u0422\u0438\u043c\u0430\u0442\u0438', 'ascii_unicode_escape')
>
> ┌──────────┐
> │ ?column? │
> ╞══════════╡
> │ Тимати   │
> └──────────┘
> (1 row)
>
>> Regards
>>
>> Pavel
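For reference, the escape/unescape round trip discussed in the thread can be sketched with Python's built-in "unicode_escape" codec, which implements the same "prefix backslash-u hex" format. This only illustrates the format itself; the SQL function names in the proposal (text_convert_to/text_convert_from) are a separate, proposed interface and are not implemented here.

```python
# Sketch of the "unicode-escape" format using Python's stdlib codec.
s = "Тимати"

# Escape: non-ASCII characters become \uXXXX sequences
# ("prefix backslash-u hex", as produced by Java's native2ascii).
escaped = s.encode("unicode_escape").decode("ascii")
print(escaped)  # \u0422\u0438\u043c\u0430\u0442\u0438

# Unescape: decode the \uXXXX sequences back to the original text.
restored = escaped.encode("ascii").decode("unicode_escape")
print(restored)  # Тимати
```

Note that the escaped form is pure ASCII, which matches the use case described above: 8-bit exports from Java systems that a decode-only SQL function would turn back into UTF-8 text.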