Hi,

Could anyone help with the use case in my mail below?

Regards,
Rajesh

On Tue, Sep 6, 2016 at 5:40 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:

> Hi,
>
> I am new to Spark ML and am trying to create LabeledPoints from a categorical
> dataset (adapted from the Spark example code). For this, I am using one-hot encoding
> <http://en.wikipedia.org/wiki/One-hot>. Below is my code:
>
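> import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
>
> // Toy dataset: an id column and a string category column
> // (assumes an existing SparkSession named sparkSession)
> val df = sparkSession.createDataFrame(Seq(
>   (0, "a"),
>   (1, "b"),
>   (2, "c"),
>   (3, "a"),
>   (4, "a"),
>   (5, "c"),
>   (6, "d"))).toDF("id", "category")
>
> // Map each category string to a numeric index (most frequent label -> 0)
> val indexer = new StringIndexer()
>   .setInputCol("category")
>   .setOutputCol("categoryIndex")
>   .fit(df)
>
> val indexed = indexer.transform(df)
>
> indexed.select("category", "categoryIndex").show()
>
> // Turn each index into a one-hot (sparse) vector. Spark 2.x API:
> // OneHotEncoder is a Transformer here, so no fit() is needed
> val encoder = new OneHotEncoder()
>   .setInputCol("categoryIndex")
>   .setOutputCol("categoryVec")
> val encoded = encoder.transform(indexed)
>
> encoded.select("id", "category", "categoryVec").show()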
>
> *Output:*
> +---+--------+-------------+
> | id|category|  categoryVec|
> +---+--------+-------------+
> |  0|       a|(3,[0],[1.0])|
> |  1|       b|    (3,[],[])|
> |  2|       c|(3,[1],[1.0])|
> |  3|       a|(3,[0],[1.0])|
> |  4|       a|(3,[0],[1.0])|
> |  5|       c|(3,[1],[1.0])|
> |  6|       d|(3,[2],[1.0])|
> +---+--------+-------------+
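Two things explain the table above. `StringIndexer` orders labels by descending frequency, so `a` (3 occurrences) gets index 0 and `c` (2 occurrences) gets index 1, while `b` and `d` (1 each) share the remaining indices. And `(3,[0],[1.0])` is Spark's sparse-vector notation, (size, [indices of non-zeros], [their values]), so `(3,[],[])` is simply a 3-element all-zero vector. The plain-Scala sketch below (no Spark needed; `OneHotNotation` and its methods are hypothetical helpers, not Spark API) mimics both steps. As an assumption it breaks frequency ties alphabetically; the Spark run above put `d` before `b`, so Spark's tie order can differ.

```scala
// Plain-Scala illustration (no Spark). Hypothetical helpers that mimic what the
// table above shows; they are not Spark's implementation.
object OneHotNotation {
  // Assign indices by descending frequency, like StringIndexer's default order.
  // Assumption: ties are broken alphabetically here, which Spark may not do.
  def indexByFrequency(values: Seq[String]): Map[String, Int] =
    values
      .groupBy(identity)
      .toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map(_._1)
      .zipWithIndex
      .toMap

  // Expand sparse notation (size, [indices], [values]) into a dense array.
  def sparseToDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must align")
    val dense = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense
  }
}
```

For example, `OneHotNotation.sparseToDense(3, Array(0), Array(1.0))` gives `[1.0, 0.0, 0.0]`, matching the row for id 0 in the LabeledPoint output further down.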
>
> *Creating LabeledPoints from the encoded DataFrame:*
>
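> import org.apache.spark.ml.feature.LabeledPoint
> import org.apache.spark.ml.linalg.{SparseVector, Vectors}
>
> val data = encoded.rdd.map { x =>
>   // Densify the one-hot vector and use the id column as the label
>   val featureVector = Vectors.dense(x.getAs[SparseVector]("categoryVec").toArray)
>   val label = x.getAs[Int]("id").toDouble
>   LabeledPoint(label, featureVector)
> }
>
> // collect() first so the rows print on the driver, not in executor logs
> data.collect().foreach(println)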
>
> *Output:*
>
> (0.0,[1.0,0.0,0.0])
> (1.0,[0.0,0.0,0.0])
> (2.0,[0.0,1.0,0.0])
> (3.0,[1.0,0.0,0.0])
> (4.0,[1.0,0.0,0.0])
> (5.0,[0.0,1.0,0.0])
> (6.0,[0.0,0.0,1.0])
>
> I have four categorical values (a, b, c, d), so I expected 4 features in
> the LabeledPoints above, but they have only 3.
>
> Please help me create LabeledPoints from categorical features.
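The 3-element vectors are expected behavior rather than a bug: Spark's `OneHotEncoder` has a `dropLast` parameter that defaults to `true`. With 4 categories it therefore emits vectors of size 3, and the category with the highest index (here `b`, whose vector above is `(3,[],[])`) is encoded as all zeros. Dropping one slot keeps the encoded entries from always summing to 1, which would make them linearly dependent. To get one slot per category, calling `.setDropLast(false)` on the encoder should give 4-element vectors, e.g. `new OneHotEncoder().setInputCol("categoryIndex").setOutputCol("categoryVec").setDropLast(false)`. This is written against the Spark 2.x API used in the code above; in Spark 3.x, `OneHotEncoder` is an Estimator and must be `fit` before `transform`. The plain-Scala sketch below (no Spark needed; `DropLastDemo.oneHot` is a hypothetical helper whose `dropLast` flag only mirrors the Spark parameter's name) shows the two settings side by side.

```scala
// Plain-Scala illustration (no Spark) of one-hot encoding with and without
// dropping the last category. Hypothetical helper, not Spark's API.
object DropLastDemo {
  def oneHot(index: Int, numCategories: Int, dropLast: Boolean): Array[Double] = {
    require(index >= 0 && index < numCategories, "index out of range")
    // With dropLast, the vector has one slot fewer, so the last category
    // has no slot of its own and comes out as all zeros.
    val size = if (dropLast) numCategories - 1 else numCategories
    Array.tabulate(size)(i => if (i == index) 1.0 else 0.0)
  }
}
```

With 4 categories, `DropLastDemo.oneHot(3, 4, dropLast = true)` returns three zeros, just like `b` above, whereas `DropLastDemo.oneHot(3, 4, dropLast = false)` returns `[0.0, 0.0, 0.0, 1.0]`.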
>
> Regards,
> Rajesh
