Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/206#discussion_r154422562
--- Diff: src/ports/postgres/modules/stats/correlation.sql_in ---
@@ -207,8 +204,17 @@ Result:
</pre>
@par Notes
-Current implementation ignores a row that contains NULL entirely. This means
-any correlation in such a row (with NULLs) does not contribute to the final answer.
+
+Null values will be replaced by the mean of their respective columns (Mean
imputation/substitution). Mean imputation is a method in which the missing
value on a certain variable is replaced by the mean of the available cases.
This method maintains the sample size and is easy to use, but the variability
in the data is reduced, so the standard deviations and the variance estimates
tend to be underestimated. Please refer to [1] and [2] for details.
--- End diff --
This paragraph needs to be wrapped so that each line is within 80 characters.
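
For reference, the variance claim in the added note can be checked with a minimal sketch in plain Python. This is not the MADlib implementation; `impute_mean` and the sample column are hypothetical names and data chosen only for illustration.

```python
import statistics


def impute_mean(column):
    """Replace None entries with the mean of the non-None entries.

    Hypothetical helper for illustration; not part of MADlib.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


# Made-up sample column with two missing values.
column = [2.0, None, 4.0, 6.0, None, 8.0]
observed = [v for v in column if v is not None]
imputed = impute_mean(column)

# Sample standard deviation before and after imputation.
observed_sd = statistics.stdev(observed)
imputed_sd = statistics.stdev(imputed)

print(imputed)
print(observed_sd, imputed_sd)
```

The sample size stays at six rows, but the imputed values sit exactly at the mean and add no squared deviation, so the standard deviation computed on the imputed column is smaller than the one computed on the observed values alone.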
---