Yes, the workaround is to create multiple StringIndexers as you described, one per column, and set them all as stages of the pipeline. OneHotEncoderEstimator was only added in Spark 2.3.0, so on 2.2.1 you will have to use the plain OneHotEncoder instead.
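
As a rough sketch of that workaround in Java, assuming Spark 2.2.x with spark-mllib on the classpath: one StringIndexer and one OneHotEncoder per categorical column, all collected as pipeline stages. The class name, the method name, and the "_idx"/"_vec" output column suffixes are made up for illustration, so adjust them to your schema.

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.feature.OneHotEncoder;
import org.apache.spark.ml.feature.StringIndexer;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Spark.
public class MultiColumnIndexing {

    // Builds an unfitted Pipeline with two stages per input column:
    // a StringIndexer (col -> col_idx) followed by a OneHotEncoder (col_idx -> col_vec).
    public static Pipeline buildPipeline(String[] categoricalCols) {
        List<PipelineStage> stages = new ArrayList<>();
        for (String col : categoricalCols) {
            String indexed = col + "_idx"; // assumed naming convention
            String encoded = col + "_vec"; // assumed naming convention

            // StringIndexer takes a single input/output column in 2.2.x and 2.3.0.
            stages.add(new StringIndexer()
                    .setInputCol(col)
                    .setOutputCol(indexed));

            // In 2.2.x, OneHotEncoder is a single-column Transformer.
            stages.add(new OneHotEncoder()
                    .setInputCol(indexed)
                    .setOutputCol(encoded));
        }
        return new Pipeline().setStages(stages.toArray(new PipelineStage[0]));
    }
}
```

You would then call something like `MultiColumnIndexing.buildPipeline(new String[]{"color", "size"}).fit(df).transform(df)`, where `df` is your own Dataset<Row> and the column names are placeholders.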

On Tue, May 15, 2018, 8:40 AM Mina Aslani <aslanim...@gmail.com> wrote:

> Hi,
>
> So, what is the workaround? Should I create multiple indexer(one for each
> column), and then create pipeline and set stages to have all the
> StringIndexers?
> I am using 2.2.1 as I cannot move to 2.3.0. Looks like
> oneHotEncoderEstimator is broken, please see my email sent today with
> subject:
> OneHotEncoderEstimator - java.lang.NoSuchMethodError:
> org.apache.spark.sql.Dataset.withColumns
>
> Regards,
> Mina
>
> On Tue, May 15, 2018 at 2:37 AM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> Multi column support for StringIndexer didn’t make it into Spark 2.3.0
>>
>> The PR is still in progress I think - should be available in 2.4.0
>>
>> On Mon, 14 May 2018 at 22:32, Mina Aslani <aslanim...@gmail.com> wrote:
>>
>>> Please take a look at the api doc:
>>> https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html
>>>
>>> On Mon, May 14, 2018 at 4:30 PM, Mina Aslani <aslanim...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> There is no SetInputCols/SetOutputCols for StringIndexer in Spark java.
>>>> How multiple input/output columns can be specified then?
>>>>
>>>> Regards,
>>>> Mina