Github user fmcquillan99 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/206#discussion_r154252893
  
    --- Diff: src/ports/postgres/modules/stats/correlation.sql_in ---
    @@ -207,8 +203,9 @@ Result:
     </pre>
     
     @par Notes
    -Current implementation ignores a row that contains NULL entirely. This means
    -any correlation in such a row (with NULLs) does not contribute to the final answer.
    +
    +WARNING: Rows with NULL values will not be ignored. Null values will be
    +replaced by the mean of their respective columns.
    --- End diff --
    
    Here is a blog post on the dangers of mean imputation:
    https://www.theanalysisfactor.com/mean-imputation/
    
    Can we point to some literature (and put it in the user docs) that backs up our plan to use mean imputation?
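
    To make the concern concrete, here is a minimal sketch in plain Python (not MADlib code) that compares the old behavior described in the diff, dropping any row that contains a NULL, with the new behavior, replacing each NULL with its column mean. The helper names and the toy data are hypothetical and exist only for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_null_rows(xs, ys):
    """Old behavior in the diff: ignore any row that contains a NULL."""
    rows = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [r[0] for r in rows], [r[1] for r in rows]


def impute_mean(col):
    """New behavior in the diff: replace NULLs with the column mean."""
    m = mean(v for v in col if v is not None)
    return [m if v is None else v for v in col]


# Toy data (hypothetical): y is exactly 2 * x wherever it is observed.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, None, 8.0, None, 12.0]

r_drop = pearson(*drop_null_rows(x, y))
r_impute = pearson(impute_mean(x), impute_mean(y))

print(f"drop NULL rows: r = {r_drop:.3f}")
print(f"mean imputation: r = {r_impute:.3f}")
```

    On this toy data the observed rows are perfectly correlated, yet mean imputation pulls the coefficient down to roughly 0.92, because each imputed value sits at the column mean and contributes nothing to the covariance. This is a single constructed example, not evidence about real workloads, but it shows the kind of bias the docs should warn about.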


