[mllib] Document frequency

Jatin Puri Mon, 14 Jan 2019 05:56:28 -0800

Hello.

As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good
idea to also expose:


1. Document frequency vector
2. Number of documents

Both are already computed as part of fitting the model, so they only need
to be exposed as public `val`s.

This avoids re-implementation for anyone who needs the document frequency
(df) of terms. Currently, anyone who needs df has to reverse-compute it
from the idf values in the fitted model.
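
To illustrate what that reverse computation looks like, here is a minimal
sketch in plain Python (no Spark needed). It assumes the smoothed formula
from the MLlib docs, idf = log((m + 1) / (df + 1)), where m is the number
of documents. The function names and sample numbers are made up for
illustration. Note that the caller has to know m separately, and terms
zeroed out by `minDocFreq` presumably cannot be recovered this way.

```python
import math


def idf_from_df(df, num_docs):
    """Smoothed idf, assuming idf = log((m + 1) / (df + 1))."""
    return math.log((num_docs + 1.0) / (df + 1.0))


def df_from_idf(idf, num_docs):
    """Invert the formula above: df = (m + 1) / exp(idf) - 1.

    The result is rounded because floating-point error can leave it
    slightly off an integer.
    """
    return round((num_docs + 1.0) / math.exp(idf) - 1.0)


# Hypothetical corpus of 4 documents and per-term document frequencies.
num_docs = 4
doc_freqs = [4, 2, 1, 0]

# What a fitted model would hand back (idf only), under the assumed formula.
idf_values = [idf_from_df(df, num_docs) for df in doc_freqs]

# What a user currently has to do to get df back.
recovered = [df_from_idf(v, num_docs) for v in idf_values]
```

With the df vector and m exposed directly, none of this would be needed.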

AFAIK, we don't explicitly provide this functionality in mllib, and we
don't need a separate class if we can expose it in `IDFModel` itself.

Does this sound alright?

Regards,
Jatin
