[
https://issues.apache.org/jira/browse/MADLIB-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365814#comment-16365814
]
ASF GitHub Bot commented on MADLIB-1202:
----------------------------------------
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/234#discussion_r168525141
--- Diff: src/ports/postgres/modules/utilities/encode_categorical.py_in ---
@@ -317,7 +317,19 @@ class CategoricalEncoder(object):
if self.output_type not in ('array', 'svec'):
if not self._output_dictionary:
- value_names = {None: 'NULL',
+ ## MADLIB-1202
+ ## In postgres, boolean variables are always saved
+ ## as 'True', 'False' with the first letter as capital,
+ ## which will cause the generated column name as
+ ## <boolean column name>_True/False that needs double
+ ## quoting to query. To make it more convenient for the user,
+ ## we cast them to lower case true/false so that the
+ ## generated column name is <boolean column name>_true/false
+ ## The same logic applied to _null and _misc strs
+ if v in ('True', 'False'):
--- End diff --
Wouldn't this be better off as `if isinstance(v, bool)`?
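
To illustrate the difference between the two checks, here is a minimal sketch. It assumes the distinct values may arrive as Python `bool` objects rather than as the strings 'True'/'False'; whether that holds in this code path depends on how the values are fetched. `column_suffix` is a hypothetical helper for illustration, not a function in encode_categorical.py_in.

```python
def column_suffix(v):
    """Hypothetical helper: map a distinct value to a column-name suffix."""
    if v is None:
        return 'null'
    # A type check catches real booleans; the string comparison
    # `v in ('True', 'False')` would not, because True != 'True'.
    if isinstance(v, bool):
        return str(v).lower()
    # Everything else, including the literal text 'True', is left as-is.
    return str(v)


print(True in ('True', 'False'))   # False: a bool never equals a str
print(column_suffix(True))         # true
print(column_suffix('True'))       # True
```

The type check also leaves alone a text column that happens to contain the literal string 'True', which the string comparison would lower-case as well.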
> encode_categorical_variables() creates all lower case column names for
> boolean columns
> --------------------------------------------------------------------------------------
>
> Key: MADLIB-1202
> URL: https://issues.apache.org/jira/browse/MADLIB-1202
> Project: Apache MADlib
> Issue Type: Improvement
> Reporter: Jarrod Vawdrey
> Assignee: Jingyi Mei
> Priority: Minor
> Fix For: v1.14
>
>
>
> It would be handy if encode_categorical_variables() created lower-case column
> names for boolean columns, rather than capitalized names that require double
> quoting to query.
> The current implementation generates "<boolean column name>_True" and
> "<boolean column name>_False".
> The improvement would generate <boolean column name>_true and
> <boolean column name>_false instead.
>