[ https://issues.apache.org/jira/browse/MADLIB-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490072#comment-16490072 ]
Jingyi Mei edited comment on MADLIB-1237 at 5/25/18 1:13 AM:
-------------------------------------------------------------

We hit this special-character issue in two cases:

1. When we run a SELECT with a WHERE clause like
{code:java}
WHERE column_name = 'something'with*special$character'
{code}

2. When we create a table with a column like
{code:java}
CREATE TABLE example_table AS SELECT '{ele'with*special_char, 'M,M', 'M$M'}'::text[] AS class_values
{code}
or
{code:java}
CREATE TABLE example_table AS SELECT ARRAY['ele'with*special_char', 'M,M', 'M$M'] AS class_values
{code}

We need to handle all of these situations and make both special characters and Unicode work.

was (Author: jingyimei):
We hit this special-character issue in two cases:

1. When we run a SELECT with a WHERE clause like
{code:java}
WHERE column_name = 'something'with*special$character'
{code}

2. When we create a table with a column like
{code:java}
CREATE TABLE example_table AS SELECT '{ele'with*special_char, 'M,M', 'M$M'}'::text[] AS class_values
{code}

We need to handle both situations and make special characters as well as Unicode work.

> Mini-batch preprocessor fails for dt_golf dataset
> --------------------------------------------------
>
> Key: MADLIB-1237
> URL: https://issues.apache.org/jira/browse/MADLIB-1237
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.15
>
>
> For the dt_golf data set from
> http://madlib.apache.org/docs/latest/group__grp__decision__tree.html#examples
> the mini-batch preprocessor fails:
> {code}
> SELECT madlib.minibatch_preprocessor('dt_golf',
>                                      'dt_golf_packed_2',
>                                      'class',
>                                      '"Temp_Humidity"', NULL, 1, True);
> ERROR:  spiexceptions.SyntaxError: syntax error at or near "t"
> LINE 8: ...T madlib.array_contains_null(ARRAY[(class) = 'Don't Play', (...
>                                                              ^
> QUERY:
>     SELECT SUM(source_table_row_count_by_group) AS source_table_row_count,
>            SUM(num_rows_processed_by_group) AS total_num_rows_processed,
>            AVG(num_rows_processed_by_group) AS avg_num_rows_processed
>     FROM (
>         SELECT COUNT(*) AS source_table_row_count_by_group,
>             SUM(CASE
>                 WHEN NOT madlib.array_contains_null(ARRAY[(class) = 'Don't Play', (class) = 'Play']::INTEGER[]) AND
>                      NOT madlib.array_contains_null(("Temp_Humidity")::DOUBLE PRECISION[])
>                 THEN 1
>                 ELSE 0
>             END) AS num_rows_processed_by_group
>         FROM dt_golf
>     ) AS s
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "minibatch_preprocessor", line 24, in <module>
>     minibatch_preprocessor_obj.minibatch_preprocessor()
>   PL/Python function "minibatch_preprocessor", line 45, in wrapper
>   PL/Python function "minibatch_preprocessor", line 104, in minibatch_preprocessor
>   PL/Python function "minibatch_preprocessor", line 236, in _get_skipped_rows_processed_count
> PL/Python function "minibatch_preprocessor"
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
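The failures discussed in this comment share one root cause: a raw value such as `Don't Play` is pasted between single quotes in generated SQL. The following is a minimal sketch of one possible approach, not MADlib's actual fix. It assumes Python 3, standard PostgreSQL string-literal rules, and a UTF-8 server encoding for Unicode values. The idea is to quote every value as a SQL literal (doubling embedded single quotes, and using `E''` syntax if backslashes appear), and to build arrays with the `ARRAY[...]` constructor rather than a `'{...}'::text[]` literal. The helpers `quote_literal`, `build_one_hot_array` and `build_text_array` are hypothetical names used only for illustration. Inside PL/Python on PostgreSQL 9.1 or later, the built-in `plpy.quote_literal()` could take the place of the first helper, where available.

```python
from typing import Iterable


def quote_literal(value: str) -> str:
    """Render a Python string as a PostgreSQL string literal.

    Hypothetical helper mirroring PostgreSQL's quote_literal(): single
    quotes are doubled; if the value contains a backslash, backslashes are
    doubled too and the E'' escape-string syntax is used, so the result
    parses the same whatever standard_conforming_strings is set to.
    """
    escaped = value.replace("'", "''")
    if "\\" in value:
        return "E'" + escaped.replace("\\", "\\\\") + "'"
    return "'" + escaped + "'"


def build_one_hot_array(column: str, levels: Iterable[str]) -> str:
    """Build the ARRAY[(col) = 'level', ...]::INTEGER[] expression seen in
    the failing query, quoting each class level instead of pasting it raw.
    The column expression is assumed to be already valid SQL."""
    parts = ["({0}) = {1}".format(column, quote_literal(level)) for level in levels]
    return "ARRAY[" + ", ".join(parts) + "]::INTEGER[]"


def build_text_array(values: Iterable[str]) -> str:
    """Build a TEXT[] with the ARRAY[...] constructor. Unlike the
    '{...}'::text[] form, elements containing commas, braces or double
    quotes need no extra array-literal escaping; only SQL quoting applies."""
    return "ARRAY[" + ", ".join(quote_literal(v) for v in values) + "]::TEXT[]"


if __name__ == "__main__":
    print(build_one_hot_array("class", ["Don't Play", "Play"]))
    # ARRAY[(class) = 'Don''t Play', (class) = 'Play']::INTEGER[]
    print(build_text_array(["ele'with*special_char", "M,M", "M$M", "café"]))
    # ARRAY['ele''with*special_char', 'M,M', 'M$M', 'café']::TEXT[]
```

Using the `ARRAY[...]` constructor keeps only one layer of escaping (SQL string quoting). The `'{...}'` text form adds a second layer of array-literal syntax, in which commas and braces inside elements are significant.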