+1

Thanks,
Glenn
From: Arvind Surve <ac...@yahoo.com.INVALID>
To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
Date: 02/18/2017 10:01 PM
Subject: Re: Weighted Statistical Estimates

+1

------------------
Arvind Surve
Spark Technology Center
http://www.spark.tc/

From: Felix Schüler <fschue...@posteo.de>
To: dev@systemml.incubator.apache.org
Sent: Saturday, February 18, 2017 9:42 PM
Subject: Re: Weighted Statistical Estimates

Sounds good!

-Felix

On 18.02.2017 21:20, Matthias Boehm wrote:
> Going toward our 1.0 release, I'd like to create consistency across our
> weighted statistics. Conceptually, these weights represent frequency
> counts, i.e., multiplicities of input values.
>
> So far, our documentation does not state any restrictions on these weights,
> but some runtime operations require integer data (I), while others allow
> arbitrary floating point data, as indicated below:
>
> * moment
> * cov
> * aggregate
> * table
> * median (I)
> * quantile (I)
> * interQuartileMean (I)
>
> This can lead to unexpected errors, as shown by recent issues such as
> SYSTEMML-1265. Looking at R and its packages like Hmisc or reldist, it
> turns out that they all allow arbitrary weights.
>
> So, relaxing any restrictions to integer weights seems like the right
> choice. As this changes the external behavior - albeit in a generalizing
> manner - we should make this change now. If you have any concerns, let me
> know.
>
> Regards,
> Matthias
>
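
To illustrate the frequency-count reading of weights discussed above, here is a minimal Python sketch of a weighted quantile that accepts arbitrary non-negative floating point weights. It is not SystemML's implementation: the function name `weighted_quantile` is hypothetical, and the quantile definition used (smallest value whose cumulative weight reaches p times the total weight) is an assumption chosen because, for integer weights, it matches replicating each value by its weight.

```python
def weighted_quantile(values, weights, p):
    """Return the p-quantile of `values` under non-negative `weights`.

    Assumed definition (illustrative, not SystemML's): the smallest value
    whose cumulative weight is >= p * total weight. Weights act as
    frequency counts but need not be integers.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equally long")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    threshold = p * total
    cumulative = 0.0
    # Walk the values in ascending order, accumulating their weights.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        # Skip zero-weight entries: they represent values that never occur.
        if weight > 0 and cumulative >= threshold:
            return value
    # Reached only if rounding keeps the cumulative sum below the threshold.
    return max(v for v, w in zip(values, weights) if w > 0)


def expand(values, counts):
    """Replicate each value by its integer count (the frequency-count view)."""
    return [v for v, c in zip(values, counts) for _ in range(c)]
```

With integer weights, `weighted_quantile([3, 1, 2], [2, 1, 3], 0.5)` agrees with taking the same quantile of the expanded data `expand([3, 1, 2], [2, 1, 3])` using unit weights. Fractional weights such as `[0.5, 0.25, 0.25]` go through the same code path without any integer check.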