Yes that seems OK to me.

On Mon, Jan 14, 2019 at 9:40 AM Jatin Puri <purija...@gmail.com> wrote:
>
> Thanks for the response. So do I go ahead and create a JIRA ticket?
> I can then send a pull request for the same with the changes.
>
> On Mon, Jan 14, 2019 at 8:18 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> I think that's reasonable. The caller probably has the number of docs
>> already but sure, it's one long and is already computed. This would
>> have to be added to PySpark too.
>>
>> On Mon, Jan 14, 2019 at 7:56 AM Jatin Puri <purija...@gmail.com> wrote:
>> >
>> > Hello.
>> >
>> > As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good
>> > idea to also expose:
>> >
>> > 1. Document frequency vector
>> > 2. Number of documents
>> >
>> > We get the above for free currently and they just need to be exposed as
>> > public val.
>> >
>> > This avoids re-implementation for someone who needs to compute
>> > DocumentFrequency of terms. Currently, if someone needs df, they
>> > need to reverse compute it from the idf values obtained.
>> >
>> > AFAIK, we don't explicitly provide such functionality in mllib. And we
>> > don't need to have a separate class, if we can expose it in `IDFModel`
>> > itself.
>> >
>> > Does it sound alright?
>> >
>> > Regards,
>> > Jatin
>
>
> --
> Jatin Puri
> http://jatinpuri.com
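
For context on the "reverse compute" workaround mentioned in the original message, here is a rough sketch in plain Python. It assumes the smoothed formula given in the Spark MLlib documentation, idf = log((m + 1) / (df + 1)), where m is the number of documents. The function names are hypothetical helpers for illustration only and are not part of any Spark API.

```python
import math


def idf_from_df(df, num_docs):
    # Smoothed IDF, assuming the formula from the Spark MLlib docs:
    # idf = log((m + 1) / (df + 1))
    return math.log((num_docs + 1.0) / (df + 1.0))


def df_from_idf(idf, num_docs):
    # Invert the formula above: df = (m + 1) / exp(idf) - 1.
    # Rounding absorbs floating-point error, since df is an integer count.
    return round((num_docs + 1.0) / math.exp(idf) - 1.0)


# Hypothetical corpus size and document frequencies, for illustration only.
num_docs = 1000
doc_freqs = [1, 10, 250, 1000]

idf_values = [idf_from_df(df, num_docs) for df in doc_freqs]
recovered = [df_from_idf(v, num_docs) for v in idf_values]
```

Note that the inversion needs the number of documents, which the caller has to supply separately. It is also likely to break down when `minDocFreq` filtering is in play: as far as I understand, terms below that threshold are assigned an idf of 0, and inverting 0 yields df = m rather than the true count. Exposing the document frequency vector directly would sidestep both issues.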