Thanks for the response. Should I go ahead and create a JIRA ticket? I can then send a pull request with the changes.
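For illustration, here is a minimal sketch of the reverse computation a caller has to do today to get document frequencies back out of the idf values mentioned in the original proposal. It is plain Python with no Spark dependency, and it assumes the smoothed formula given in the MLlib documentation, idf = log((m + 1) / (df + 1)), where m is the number of documents. The helper names `idf_from_df` and `df_from_idf` are hypothetical and not part of any Spark API.

```python
import math


def idf_from_df(df: int, num_docs: int) -> float:
    """Smoothed IDF, assuming the documented formula log((m + 1) / (df + 1))."""
    return math.log((num_docs + 1.0) / (df + 1.0))


def df_from_idf(idf: float, num_docs: int) -> int:
    """Invert the formula above: df = (m + 1) / exp(idf) - 1.

    The caller must already know num_docs, and the result is rounded
    to absorb floating-point error from the log/exp round trip.
    """
    return round((num_docs + 1.0) / math.exp(idf) - 1.0)


# Hypothetical corpus: 1000 documents and four terms with known df.
num_docs = 1000
doc_freqs = [1, 10, 250, 1000]

idf_values = [idf_from_df(df, num_docs) for df in doc_freqs]
recovered = [df_from_idf(v, num_docs) for v in idf_values]
```

Note that this inversion only works when the idf value was actually computed from the formula. If `minDocFreq` is set, terms below the threshold get an idf of 0 and their df cannot be recovered this way, which is another reason to expose the df vector directly.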

On Mon, Jan 14, 2019 at 8:18 PM Sean Owen <sro...@gmail.com> wrote:

> I think that's reasonable. The caller probably has the number of docs
> already but sure, it's one long and is already computed. This would
> have to be added to PySpark too.
>
> On Mon, Jan 14, 2019 at 7:56 AM Jatin Puri <purija...@gmail.com> wrote:
> >
> > Hello.
> >
> > As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good
> > idea to also expose:
> >
> > 1. Document frequency vector
> > 2. Number of documents
> >
> > We get the above for free currently and they just need to be exposed as
> > public val.
> >
> > This avoids re-implementation for someone who needs to compute
> > DocumentFrequency of terms. Currently if someone needs df, then one would
> > need to reverse compute it based on the idf values obtained.
> >
> > Afaik, we don't explicitly provide such a functionality in mllib. And we
> > don't need to have a separate class, if we can expose it in `IDFModel`
> > itself.
> >
> > Does it sound alright?
> >
> > Regards,
> > Jatin

--
Jatin Puri
http://jatinpuri.com <http://www.jatinpuri.com>