I think this would be an interesting and useful addition. The API needs some thought, though. One approach is to have an "inverseTransform" method, similar to sklearn's inverse_transform.
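To make the first option concrete for MinMaxScaler, here is a minimal pure-Python sketch (not Spark code) of a fitted min-max model that carries an sklearn-style inverse method. The class `MinMaxScalerModelSketch` and its fields are hypothetical. `original_min`/`original_max` stand in for the fitted per-feature minima and maxima, and `target_min`/`target_max` stand in for the output range (default [0, 1]). The forward map follows the rescaling formula given in the Spark MinMaxScaler docs, including sending a constant feature to the midpoint of the output range.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinMaxScalerModelSketch:
    """Hypothetical fitted min-max model with an sklearn-style inverse (not Spark API)."""
    original_min: List[float]  # per-feature minimum seen during fitting
    original_max: List[float]  # per-feature maximum seen during fitting
    target_min: float = 0.0    # lower bound of the output range
    target_max: float = 1.0    # upper bound of the output range

    def __post_init__(self) -> None:
        if not self.target_min < self.target_max:
            raise ValueError("target_min must be less than target_max")

    def transform(self, row: List[float]) -> List[float]:
        out = []
        for x, lo, hi in zip(row, self.original_min, self.original_max):
            if hi == lo:
                # Constant feature: map to the midpoint of the output range.
                out.append(0.5 * (self.target_max + self.target_min))
            else:
                out.append((x - lo) / (hi - lo)
                           * (self.target_max - self.target_min) + self.target_min)
        return out

    def inverse_transform(self, row: List[float]) -> List[float]:
        # Invert the affine map per feature; for a constant feature (hi == lo)
        # this returns lo, the single value that feature had during fitting.
        span = self.target_max - self.target_min
        return [(y - self.target_min) / span * (hi - lo) + lo
                for y, lo, hi in zip(row, self.original_min, self.original_max)]
```

For example, with `original_min=[0.0, 10.0]` and `original_max=[4.0, 30.0]`, the row `[1.0, 20.0]` scales to `[0.25, 0.5]`, and `inverse_transform` maps that back to `[1.0, 20.0]`. The inverse lives on the same object as `transform`, which keeps the API small. However, a stage-based pipeline that only ever calls `transform` has no way to reach it.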
The other approach is to "formalize" something like StringIndexerModel -> IndexToString, where the inverse is a standalone transformer. It could be returned from a "getInverseTransformer" method, for example. The former approach is simpler, but the inverse could not be used as a pipeline stage (pipelines work only through "fit" / "transform"). The latter approach is more cumbersome, but fits better into pipelines. So it depends on the use cases, i.e. how common it is to need the inverse transform inside a pipeline. For StringIndexer <-> IndexToString, getting the labels back is quite common; for other transformers it may or may not be.

On Mon, 8 Jan 2018 at 11:10 Tomasz Dudek <megatrontomaszdu...@gmail.com> wrote:

> Hello,
>
> Since a similar question on StackOverflow remains unanswered
> (https://stackoverflow.com/questions/46092114/is-there-no-inverse-transform-method-for-a-scaler-like-minmaxscaler-in-spark)
> and perhaps there is a solution that I am not aware of, I'll ask:
>
> After training MinMaxScaler (or a similar scaler), is there any built-in way to
> revert the process? What I mean is transforming the scaled data back to its
> original form. sklearn has a dedicated method, inverse_transform, that does
> exactly that.
>
> I can, of course, get the originalMin/originalMax Vectors from the
> MinMaxScalerModel and then map the values myself, but it would be nice to
> have it built-in.
>
> Yours,
> Tomasz
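For comparison, here is a rough sketch of the second option, which is also the manual mapping Tomasz describes. The fitted model hands out a separate inverse transformer built from its stored per-feature minima and maxima. In Spark these values would come from `MinMaxScalerModel.originalMin`/`originalMax` and the scaler's `min`/`max` params. Here, everything is a hypothetical pure-Python stand-in, not Spark API: `FittedMinMaxSketch`, `InverseMinMaxScaler`, `get_inverse_transformer` and `run_stages`. Because the inverse only needs a `transform` method, it can be chained like any other stage.

```python
from typing import List, Sequence


class InverseMinMaxScaler:
    """Hypothetical standalone inverse of a fitted min-max scaler (not Spark API)."""

    def __init__(self, original_min: Sequence[float], original_max: Sequence[float],
                 target_min: float = 0.0, target_max: float = 1.0) -> None:
        self.original_min = list(original_min)
        self.original_max = list(original_max)
        self.target_min = target_min
        self.target_max = target_max

    def transform(self, row: Sequence[float]) -> List[float]:
        # Undo the rescaling feature by feature:
        # x = (y - target_min) / (target_max - target_min) * (hi - lo) + lo
        span = self.target_max - self.target_min
        return [(y - self.target_min) / span * (hi - lo) + lo
                for y, lo, hi in zip(row, self.original_min, self.original_max)]


class FittedMinMaxSketch:
    """Hypothetical fitted model holding per-feature min/max from training."""

    def __init__(self, original_min: Sequence[float], original_max: Sequence[float],
                 target_min: float = 0.0, target_max: float = 1.0) -> None:
        if not target_min < target_max:
            raise ValueError("target_min must be less than target_max")
        self.original_min = list(original_min)
        self.original_max = list(original_max)
        self.target_min = target_min
        self.target_max = target_max

    def transform(self, row: Sequence[float]) -> List[float]:
        out = []
        for x, lo, hi in zip(row, self.original_min, self.original_max):
            if hi == lo:
                # Constant feature: use the midpoint of the output range.
                out.append(0.5 * (self.target_max + self.target_min))
            else:
                out.append((x - lo) / (hi - lo)
                           * (self.target_max - self.target_min) + self.target_min)
        return out

    def get_inverse_transformer(self) -> InverseMinMaxScaler:
        # Return a separate transformer object instead of an inverse method.
        return InverseMinMaxScaler(self.original_min, self.original_max,
                                   self.target_min, self.target_max)


def run_stages(stages: Sequence, row: Sequence[float]) -> List[float]:
    """Apply each stage's transform in order, the way a pipeline chains stages."""
    out = list(row)
    for stage in stages:
        out = stage.transform(out)
    return out
```

Usage: `run_stages([model, model.get_inverse_transformer()], [1.0, 20.0])` with `model = FittedMinMaxSketch([0.0, 10.0], [4.0, 30.0])` round-trips to `[1.0, 20.0]`. The trade-off from the reply shows up directly. The inverse needs its own class, but a runner that only knows about `transform` can use it unchanged.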