GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/268
DT: Don't use NULL value to get dep_var type
JIRA: MADLIB-1233
Function `_is_dep_categorical` is used to obtain the type of the
dependent variable expression. It fetches an arbitrary value using
`LIMIT 1` and checks the type of that value in Python, without
filtering out NULL values. The `LIMIT 1` can therefore return a NULL,
which appears as `None` in Python and leads to incorrect results.
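
The failure mode can be reproduced outside MADlib. The sketch below is
not the MADlib code: it assumes an in-memory SQLite database as a
stand-in for the real backend, and the helper `sample_python_type`, the
table `t`, and the column `y` are hypothetical names used only for
illustration.

```python
import sqlite3


def sample_python_type(conn, table, dep_var):
    """Hypothetical stand-in for the old approach: fetch one arbitrary
    value with LIMIT 1 (no NULL filter) and inspect its Python type."""
    query = "SELECT {0} FROM {1} LIMIT 1".format(dep_var, table)
    row = conn.execute(query).fetchone()
    return type(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (y INTEGER)")
# The first stored row holds a NULL dependent variable.
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (3,), (5,)])

sampled = sample_python_type(conn, "t", "y")
# When the sampled row holds NULL, the Python-side type is NoneType,
# which says nothing about whether the column is integer, text, etc.
print(sampled)
```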
This commit updates the type extraction to check the type in the
database instead of in Python and to filter out NULL values. It also
checks that at least one non-NULL value is obtained and otherwise
throws an appropriate error.
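
A minimal sketch of this approach, under stated assumptions: it uses
SQLite's `typeof()` on an in-memory database so that it runs
standalone (on PostgreSQL the analogous function would be
`pg_typeof()`), and the helper `dep_var_db_type`, the tables, and the
column `y` are hypothetical names, not the code in this pull request.

```python
import sqlite3


def dep_var_db_type(conn, table, dep_var):
    """Hypothetical helper: ask the database for the type of the
    dependent variable expression, skipping NULL values."""
    query = (
        "SELECT typeof({0}) FROM {1} "
        "WHERE ({0}) IS NOT NULL LIMIT 1"
    ).format(dep_var, table)
    row = conn.execute(query).fetchone()
    if row is None:
        # No non-NULL value exists, so the type cannot be determined.
        raise ValueError(
            "Dependent variable '{0}' has no non-NULL values".format(dep_var))
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (3,), (5,)])
conn.execute("CREATE TABLE all_null (y INTEGER)")
conn.executemany("INSERT INTO all_null VALUES (?)", [(None,), (None,)])

db_type = dep_var_db_type(conn, "t", "y")  # NULL row is skipped

try:
    dep_var_db_type(conn, "all_null", "y")
    all_null_error = None
except ValueError as exc:
    all_null_error = str(exc)
```

Doing the check in the database avoids depending on how the driver
maps SQL values to Python objects, and the explicit empty-result check
turns an all-NULL column into a clear error instead of a wrong type.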
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib bugfix/dt_dep_var_type_null
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/268.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #268
----
commit 6570a27278aca301c0c8869899a8fad26c69bb7d
Author: Rahul Iyer <riyer@...>
Date: 2018-05-01T21:24:34Z
DT: Don't use NULL value to get dep_var type
JIRA: MADLIB-1233
----
---