Frank McQuillan created MADLIB-1237:
---------------------------------------
Summary: Mini-batch preprocessor fails for dt_golf dataset
Key: MADLIB-1237
URL: https://issues.apache.org/jira/browse/MADLIB-1237
Project: Apache MADlib
Issue Type: Bug
Components: Module: Utilities
Reporter: Frank McQuillan
Fix For: v1.15
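For the dt_golf data set from
http://madlib.apache.org/docs/latest/group__grp__decision__tree.html#examples
the mini-batch preprocessor fails with a syntax error. The class value
'Don't Play' contains a single quote that is not escaped in the generated SQL: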
{code}
madlib=# SELECT madlib.minibatch_preprocessor('dt_golf',
'dt_golf_packed_2',
'class',
'"Temp_Humidity"', NULL ,1, True);
ERROR: spiexceptions.SyntaxError: syntax error at or near "t"
LINE 8: ...T madlib.array_contains_null(ARRAY[(class) = 'Don't Play', (...
^
QUERY:
SELECT SUM(source_table_row_count_by_group) AS source_table_row_count,
SUM(num_rows_processed_by_group) AS total_num_rows_processed,
AVG(num_rows_processed_by_group) AS avg_num_rows_processed
FROM (
SELECT COUNT(*) AS source_table_row_count_by_group,
SUM(CASE
WHEN NOT madlib.array_contains_null(ARRAY[(class) = 'Don't Play', (class) = 'Play']::INTEGER[]) AND
NOT madlib.array_contains_null(("Temp_Humidity")::DOUBLE PRECISION[])
THEN 1
ELSE 0
END) AS num_rows_processed_by_group
FROM dt_golf
) AS s
CONTEXT: Traceback (most recent call last):
PL/Python function "minibatch_preprocessor", line 24, in <module>
minibatch_preprocessor_obj.minibatch_preprocessor()
PL/Python function "minibatch_preprocessor", line 45, in wrapper
PL/Python function "minibatch_preprocessor", line 104, in minibatch_preprocessor
PL/Python function "minibatch_preprocessor", line 236, in _get_skipped_rows_processed_count
PL/Python function "minibatch_preprocessor"
{code}
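The failing query embeds the class level 'Don't Play' directly between single quotes, so the apostrophe ends the string literal early. Below is a minimal sketch of one possible remedy: double any embedded single quote before placing the value into the SQL text. The helper names quote_literal and one_hot_expression are hypothetical and are not taken from the MADlib source. Inside PL/Python, plpy.quote_literal could probably serve the same purpose where it is available.

```python
def quote_literal(value):
    """Return value as a single-quoted SQL string literal.

    Embedded single quotes are doubled, which is the standard SQL escape.
    Assumes standard_conforming_strings is on, so backslashes need no
    special handling. Hypothetical helper, for illustration only.
    """
    return "'" + str(value).replace("'", "''") + "'"


def one_hot_expression(column, levels):
    """Build ARRAY[(col) = 'a', (col) = 'b']::INTEGER[] with quoted levels.

    Mirrors the shape of the expression in the failing query; the column
    name is inserted as given and is not validated here.
    """
    comparisons = ", ".join(
        "({0}) = {1}".format(column, quote_literal(level)) for level in levels
    )
    return "ARRAY[{0}]::INTEGER[]".format(comparisons)


# The two class levels from the dt_golf example.
expr = one_hot_expression("class", ["Don't Play", "Play"])
print(expr)
# ARRAY[(class) = 'Don''t Play', (class) = 'Play']::INTEGER[]
```

With the quote doubled, the expression parses as two comparisons instead of ending the first literal at "Don".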