Hi All,

It is just as simple in Java: select the single string column and apply the STRING encoder to get a Dataset<String> directly.

Dataset<String> ds = df.select("Column")
    .where("<Any Condition you want>")
    .as(Encoders.STRING());

(Note that calling .toDF() on the result would convert it back to a Dataset<Row>.)
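Putting it together, here is a minimal self-contained sketch of the conversion. The SparkSession setup, the sample data, and the column name "line" are illustrative assumptions, not part of the original thread:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RowToStringExample {
    public static void main(String[] args) {
        // Local session for illustration only
        SparkSession spark = SparkSession.builder()
                .appName("RowToStringExample")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical single-column DataFrame of strings
        Dataset<Row> df = spark
                .createDataset(Arrays.asList("a", "b", "c"), Encoders.STRING())
                .toDF("line");

        // Selecting one string column and applying the STRING encoder
        // yields a Dataset<String>; no collect/re-create round trip needed
        Dataset<String> ds = df.select("line").as(Encoders.STRING());
        ds.show();

        spark.stop();
    }
}
```

This avoids the collectAsList() round trip entirely, so the data never leaves the executors.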

On Mon, 6 Jun 2022 at 15:26, Christophe Préaud <
christophe.pre...@kelkoogroup.com> wrote:

> Hi Marc,
>
> I'm not very familiar with Spark in Java, but according to the doc
> <https://spark.apache.org/docs/latest/sql-getting-started.html#creating-datasets>,
> it should be:
> Encoder<String> stringEncoder = Encoders.STRING();
> dataset.as(stringEncoder);
>
> For the record, it is much simpler in Scala:
> dataset.as[String]
>
> Of course, this will work if your DataFrame only contains one column of
> type String, e.g.:
> val df = spark.read.parquet("Cyrano_de_Bergerac_Acte_V.parquet")
> df.printSchema
>
> root
>  |-- line: string (nullable = true)
>
> df.as[String]
>
> Otherwise, you will have to somehow convert the Row to a String, e.g. in
> Scala:
> case class Data(f1: String, f2: Int, f3: Long)
> val df = Seq(Data("a", 1, 1L), Data("b", 2, 2L), Data("c", 3, 3L),
> Data("d", 4, 4L), Data("e", 5, 5L)).toDF
> val ds = df.map(_.mkString(",")).as[String]
> ds.show
>
> +-----+
> |value|
> +-----+
> |a,1,1|
> |b,2,2|
> |c,3,3|
> |d,4,4|
> |e,5,5|
> +-----+
>
> Regards,
> Christophe.
>
> On 6/4/22 14:38, marc nicole wrote:
>
> Hi,
> How to convert a Dataset<Row> to a Dataset<String>?
> What I have tried is:
>
> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
> // But this line raises an org.apache.spark.sql.AnalysisException: Try to
> // map struct... to Tuple1, but failed as the number of fields does not
> // line up
>
> Type of columns being String
> How to solve this?
>
>
>