Re: proposal: unescape_text function

Pavel Stehule Tue, 01 Dec 2020 21:49:01 -0800

st 2. 12. 2020 v 0:05 odesílatel Andrew Dunstan <and...@dunslane.net>
napsal:

>
> On 11/30/20 8:14 AM, Peter Eisentraut wrote:
> > On 2020-11-29 18:36, Pavel Stehule wrote:
> >>
> >>     I don't really get the point of this function.  There is AFAICT no
> >>     function to produce this escaped format, and it's not a recognized
> >>     interchange format.  So under what circumstances would one need to
> >>     use this?
> >>
> >>
> >> Some corporate data can be in CSV format with escaped unicode
> >> characters. Without this function it is not possible to decode these
> >> files without external application.
> >
> > I would like some supporting documentation on this.  So far we only
> > have one stackoverflow question, and then this implementation, and
> > they are not even the same format.  My worry is that if there is not
> > precise specification, then people are going to want to add things in
> > the future, and there will be no way to analyze such requests in a
> > principled way.
> >
> >
> >
>
>
> Also, should this be an extension? I'm dubious about including such
> marginal uses in the core code unless there's a really good case for it.
>

I am not sure, and  I am inclined so it should be core functionality.

1. Although this use case is marginal, this is related to most used
encodings - ascii and unicode. 8 bit encodings enhanced about escaped
multibyte chars will be used for a very long time. Unfortunately - this
will be worse, because Postgres will be used more in the corporate
environment, where there is a bigger press to conserve very legacy
technologies without correct multibyte support. The core problem so this
issue is out of concept bytea -> text or text -> bytea transformations
supported by Postgres. This is text -> text transformation (for almost all
encoding based on ascii), that is not supported by Postgres now.

2. Postgres already has this functionality - but unfortunately there is a
limit just only literal constants.

create or replace function uunescape(text)
returns text as $$
declare r text;
begin
  -- don't use this code!!!
  execute 'select e''' || $1 || '''' into r;
  return r;
end;
$$ language plpgsql immutable;

But one way how anybody can use it is SQL injection vulnerable and slow. So
some simple buildin solution can be protection against some future security
issues. Personally I am happy with just this limited function that will be
safe (although the design based on introducing new encoding and conversions
can be more complete and accurate). I agree so this case is marginal, but
it is a fully valid use case, and supporting unicode escaped codes just by
parser is a needless limit.

3. there are new disadvantages of extensions in current DBaaS times. Until
the extension is not directly accepted by a cloud provider, then the
extension is not available for users. The acceptance of extensions is not
too agile - so moving this code to extension doesn't solve this problem.
Without DBaaS the implementation of this feature as the extensions can be
good enough.

Regards

Pavel

>
> cheers
>
>
> andrew
>
>

Re: proposal: unescape_text function

Reply via email to