GitHub user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/206#discussion_r154252893
--- Diff: src/ports/postgres/modules/stats/correlation.sql_in ---
@@ -207,8 +203,9 @@ Result:
</pre>
@par Notes
-Current implementation ignores a row that contains NULL entirely. This means
-any correlation in such a row (with NULLs) does not contribute to the final answer.
+
+WARNING: Rows with NULL values will not be ignored. Null values will be
+replaced by the mean of their respective columns.
--- End diff --
Here is a blog post on the dangers of mean imputation:
https://www.theanalysisfactor.com/mean-imputation/
Can we point to some literature that backs up our plan to use mean
imputation, and cite it in the user docs?
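
To make the concern concrete, here is a minimal sketch in plain Python. It is not MADlib's implementation; the function names and the toy data are made up for illustration. It compares the two behaviors in the diff: dropping rows that contain a NULL versus replacing each NULL with its column mean before computing the Pearson correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(vx * vy)

def corr_listwise(xs, ys):
    """Old behavior in the diff: ignore any row that contains a NULL (None)."""
    pairs = [(a, b) for a, b in zip(xs, ys) if a is not None and b is not None]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])

def impute_mean(col):
    """Replace each None with the mean of the observed values in the column."""
    observed = [v for v in col if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in col]

def corr_mean_imputed(xs, ys):
    """New behavior in the diff: mean-impute NULLs, then correlate all rows."""
    return pearson(impute_mean(xs), impute_mean(ys))

# Toy data (an assumption for illustration): y is exactly 2 * x where observed.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, None, 8, None, 12]

r_listwise = corr_listwise(x, y)
r_imputed = corr_mean_imputed(x, y)
print("listwise deletion:", r_listwise)
print("mean imputation:  ", r_imputed)
```

On this toy data the observed pairs are perfectly linear, so dropping the NULL rows gives a correlation of 1, while mean imputation pulls the estimate below 1 because the imputed values sit at the column mean regardless of x.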
---