[
https://issues.apache.org/jira/browse/MADLIB-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530370#comment-16530370
]
Frank McQuillan edited comment on MADLIB-1243 at 7/2/18 7:59 PM:
-----------------------------------------------------------------
https://github.com/apache/madlib/pull/281 is the relevant PR.
Testing the example from the description:
{code}
DROP TABLE IF EXISTS abalone_special_char;
CREATE TABLE abalone_special_char (
id serial,
"se''x" character varying,
"len'%*()gth" double precision,
diameter double precision,
height double precision,
"ClaЖss" integer
);
COPY abalone_special_char ("se''x", "len'%*()gth", diameter, height, "ClaЖss")
FROM stdin WITH DELIMITER '|' NULL as '@';
F"F|0.475|0.37|0.125|2
F'F|0.475|0.37|0.125|2
F$F|0.475|0.37|0.125|2
MЖM|0.475|0.37|0.125|2
M@[}(:*;M|0.475|0.37|0.125|2
M,M|0.475|0.37|0.125|2
\.
{code}
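For reference, here is a rough Python sketch of how the COPY rows above are split with DELIMITER '|' and NULL as '@'. It is a simplified model: it ignores COPY's backslash escapes and leaves out the \. end-of-data marker. parse_copy_line is a hypothetical helper, not PostgreSQL code. The point it shows is that only a field that is exactly '@' becomes NULL, so M@[}(:*;M is loaded as an ordinary category value.

```python
from typing import List, Optional


def parse_copy_line(line: str, delimiter: str = "|", null: str = "@") -> List[Optional[str]]:
    """Split one COPY text-format line; a field equal to `null` becomes None.

    Simplified: backslash escapes are not handled (none appear in these rows).
    """
    return [None if field == null else field for field in line.split(delimiter)]


rows = [
    'F"F|0.475|0.37|0.125|2',
    "F'F|0.475|0.37|0.125|2",
    "F$F|0.475|0.37|0.125|2",
    "MЖM|0.475|0.37|0.125|2",
    "M@[}(:*;M|0.475|0.37|0.125|2",
    "M,M|0.475|0.37|0.125|2",
]
parsed = [parse_copy_line(r) for r in rows]
for fields in parsed:
    print(fields)
```

The comma in M,M is harmless because the delimiter is '|'.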
{code}
select encode_categorical_variables('abalone_special_char',
'abalone_special_char_out2',
'"se''''x"', '',
NULL, '3'
);
select * from abalone_special_char_out2;
{code}
produces
{code}
select * from abalone_special_char_out2;
 id | len'%*()gth | diameter | height | Class | se''x_MЖM | se''x_M,M | se''x_F"F | se''x__misc__
----+-------------+----------+--------+-------+-----------+-----------+-----------+---------------
  3 |       0.475 |     0.37 |  0.125 |     2 |         0 |         0 |         0 |             1
  4 |       0.475 |     0.37 |  0.125 |     2 |         1 |         0 |         0 |             0
  1 |       0.475 |     0.37 |  0.125 |     2 |         0 |         0 |         1 |             0
  5 |       0.475 |     0.37 |  0.125 |     2 |         0 |         0 |         0 |             1
  2 |       0.475 |     0.37 |  0.125 |     2 |         0 |         0 |         0 |             1
  6 |       0.475 |     0.37 |  0.125 |     2 |         0 |         1 |         0 |             0
(6 rows)
{code}
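A minimal sketch of what 'top' = 3 means for the output above, assuming one-hot encoding of the 3 most frequent levels plus a __misc__ column for everything else, with column names built as <column>_<value> and <column>__misc__ (inferred from the output headers). top_k_one_hot is a hypothetical helper, not MADlib's implementation. Every level of se''x appears exactly once here, so the top 3 is a tie: this sketch keeps the first three encountered, while MADlib picked MЖM, M,M and F"F, so the chosen levels can differ.

```python
from collections import Counter
from typing import Dict, List, Optional


def top_k_one_hot(column: str, values: List[Optional[str]], k: int) -> List[Dict[str, int]]:
    """Hypothetical top-k one-hot encoder with a misc bucket (not MADlib code)."""
    counts = Counter(v for v in values if v is not None)
    # most_common() orders equal counts by first appearance; MADlib's
    # tie-breaking may differ, so the selected levels can differ too.
    top = [value for value, _ in counts.most_common(k)]
    level_cols = {value: f"{column}_{value}" for value in top}
    misc_col = f"{column}__misc__"
    rows = []
    for v in values:
        row = {col: 0 for col in level_cols.values()}
        row[misc_col] = 0
        # Values outside the top k (and, in this sketch, NULLs) go to misc.
        row[level_cols.get(v, misc_col)] = 1
        rows.append(row)
    return rows


sex = ['F"F', "F'F", "F$F", "MЖM", "M@[}(:*;M", "M,M"]
encoded = top_k_one_hot("se''x", sex, 3)
for value, row in zip(sex, encoded):
    print(value, row)
```

As in the psql output, each row has exactly one 1 across the encoded columns, and three rows land in __misc__.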
LGTM
> Encode_categorical_variables doesn't work with column name with special characters when specifying top
> ------------------------------------------------------------------------------------------------------
>
> Key: MADLIB-1243
> URL: https://issues.apache.org/jira/browse/MADLIB-1243
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Jingyi Mei
> Assignee: Jingyi Mei
> Priority: Minor
> Fix For: v1.15
>
>
> Encode_categorical_variables doesn't work with column names containing special
> characters when the 'top' value is specified as an input parameter. Here is the repro:
> 1. Create a table with special characters in its column names:
> {code:java}
> DROP TABLE IF EXISTS abalone_special_char;
> CREATE TABLE abalone_special_char (
> id serial,
> "se''x" character varying,
> "len'%*()gth" double precision,
> diameter double precision,
> height double precision,
> "ClaЖss" integer
> );
> COPY abalone_special_char ("se''x", "len'%*()gth", diameter, height, "ClaЖss")
> FROM stdin WITH DELIMITER '|' NULL as '@';
> F"F|0.475|0.37|0.125|2
> F'F|0.475|0.37|0.125|2
> F$F|0.475|0.37|0.125|2
> MЖM|0.475|0.37|0.125|2
> M@[}(:*;M|0.475|0.37|0.125|2
> M,M|0.475|0.37|0.125|2
> \.{code}
> 2. Call encode_categorical_variables with "se''x" as the categorical column name
> and specify 3 as the top value:
> {code:java}
> select encode_categorical_variables('abalone_special_char',
> 'abalone_special_char_out2',
> '"se''''x"', '',
> NULL, '3'
> );{code}
> Here is the error message:
> {code:java}
> ERROR: KeyError: '"se\'\'x"' (plpython.c:4960)
> CONTEXT: Traceback (most recent call last):
> PL/Python function "encode_categorical_variables", line 23, in <module>
> return encode_categorical.encode_categorical_variables(**globals())
> PL/Python function "encode_categorical_variables", line 611, in encode_categorical_variables
> PL/Python function "encode_categorical_variables", line 104, in build_output_table
> PL/Python function "encode_categorical_variables", line 342, in _build_encoding_str
> PL/Python function "encode_categorical_variables"{code}
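The KeyError text is telling: '"se\'\'x"' is Python's repr of the string "se''x" with its double quotes still attached, i.e. the lookup key still carried the identifier quoting. The sketch below is a guess at that failure mode, not the actual code from PR 281: it assumes a dict keyed by the unquoted column name, and unescape_sql_literal, unquote_identifier and top_levels are hypothetical names. It also shows the two quoting layers the argument goes through: '"se''''x"' is a SQL string literal whose value is "se''x", a quoted identifier for the column se''x.

```python
from typing import Dict, List


def unescape_sql_literal(body: str) -> str:
    """Undo SQL string-literal escaping: each '' inside a literal is one '."""
    return body.replace("''", "'")


def unquote_identifier(ident: str) -> str:
    """Strip surrounding double quotes and undo "" escaping."""
    if len(ident) >= 2 and ident[0] == ident[-1] == '"':
        return ident[1:-1].replace('""', '"')
    return ident


# Text between the outer single quotes of '"se''''x"' in the SQL call.
literal_body = "\"se''''x\""
user_arg = unescape_sql_literal(literal_body)  # "se''x" (quoted identifier)
column = unquote_identifier(user_arg)          # se''x (actual column name)

# Hypothetical per-column state keyed by the real column name.
top_levels: Dict[str, List[str]] = {column: ["MЖM", "M,M", 'F"F']}

try:
    top_levels[user_arg]  # lookup with the quotes still attached
    failed = False
except KeyError as exc:
    failed = True
    print("KeyError:", exc)  # same text as in the error above

# Normalizing the identifier before the lookup avoids the error.
levels = top_levels[unquote_identifier(user_arg)]
print(levels)
```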
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)