+1

Thanks,
Glenn

From: Arvind Surve <[email protected]>
To: "[email protected]" <[email protected]>
Date: 02/18/2017 10:01 PM
Subject: Re: Weighted Statistical Estimates

+1

------------------
Arvind Surve
Spark Technology Center
http://www.spark.tc/

From: Felix Schüler <[email protected]>
To: [email protected]
Sent: Saturday, February 18, 2017 9:42 PM
Subject: Re: Weighted Statistical Estimates

Sounds good!

-Felix

On 18.02.2017 21:20, Matthias Boehm wrote:
> Going toward our 1.0 release, I'd like to create consistency across our
> weighted statistics. Conceptually, these weights represent frequency
> counts, i.e., multiplicities of input values.
>
> So far, our documentation does not state any restrictions on these weights,
> but some runtime operations require integer data (I), while others allow
> arbitrary floating point data, as indicated below:
>
> * moment
> * cov
> * aggregate
> * table
> * median (I)
> * quantile (I)
> * interQuartileMean (I)
>
> This can lead to unexpected errors, as shown by recent issues such as
> SYSTEMML-1265. Looking back to R and its packages like Hmisc or reldist, it
> turns out that they all allow arbitrary weights.
>
> So, relaxing any restrictions of integer weights seems like the right
> choice. As this changes the external behavior - albeit in a generalizing
> manner - we should make this change now. If you have any concerns, let me
> know.
>
> Regards,
> Matthias
>
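
To illustrate the point about integer versus arbitrary weights, here is a minimal sketch of a weighted quantile that works on cumulative weight rather than on replicated values, so it needs no integer restriction. It is written in Python rather than DML, the name `weighted_quantile` is hypothetical, and the inverse-CDF quantile definition it uses is an assumption; it is not SystemML's implementation.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(values, weights, p):
    """Return the p-quantile of `values` under non-negative `weights`.

    Assumption: inverse-CDF definition, i.e. the smallest value whose
    cumulative weight reaches p * (total weight). Weights may be any
    non-negative floats; integer weights behave like frequency counts.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    # Drop zero-weight entries and sort the rest by value.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one weight must be positive")

    # Running sum of weights in value order.
    cumulative = list(accumulate(w for _, w in pairs))
    target = p * cumulative[-1]

    # First position whose cumulative weight is >= target.
    idx = min(bisect_left(cumulative, target), len(pairs) - 1)
    return pairs[idx][0]


# Integer weights act as multiplicities: this is the data [1, 2, 2, 3].
median_counts = weighted_quantile([1, 2, 3], [1, 2, 1], 0.5)

# Scaling the weights to fractions leaves the result unchanged.
median_fractions = weighted_quantile([1, 2, 3], [0.5, 1.0, 0.5], 0.5)
```

Because only the ratio of cumulative weight to total weight matters here, rescaling integer counts to fractional weights gives the same answer, which is the sense in which allowing arbitrary weights generalizes the integer case.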
