Hi Marc,

I'm not very familiar with Spark in Java, but according to the docs 
<https://spark.apache.org/docs/latest/sql-getting-started.html#creating-datasets>,
 it should be:
Encoder<String> stringEncoder = Encoders.STRING();
dataset.as(stringEncoder);


For the record, it is much simpler in Scala:
dataset.as[String]


Of course, this only works if your DataFrame contains a single column, of type 
String, e.g.:
val df = spark.read.parquet("Cyrano_de_Bergerac_Acte_V.parquet")
df.printSchema

root
 |-- line: string (nullable = true)

df.as[String]


Otherwise, you will have to somehow convert each Row to a String, e.g. in Scala:
case class Data(f1: String, f2: Int, f3: Long)
val df = Seq(Data("a", 1, 1L), Data("b", 2, 2L), Data("c", 3, 3L),
  Data("d", 4, 4L), Data("e", 5, 5L)).toDF
val ds = df.map(_.mkString(",")).as[String]
ds.show

+-----+
|value|
+-----+
|a,1,1|
|b,2,2|
|c,3,3|
|d,4,4|
|e,5,5|
+-----+
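
In Java, a rough equivalent of that conversion could look like the sketch below 
(untested on my side; `df` stands for your existing Dataset<Row>):

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Concatenate each Row's fields into one comma-separated String,
// producing a Dataset<String> regardless of the number of columns.
Dataset<String> ds = df.map(
    (MapFunction<Row, String>) row -> row.mkString(","),
    Encoders.STRING());
ds.show();
```

Note the explicit cast to MapFunction: it is needed in Java so that the lambda 
resolves to the overload of map that takes an Encoder.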


Regards,
Christophe.

On 6/4/22 14:38, marc nicole wrote:
> Hi,
> How to convert a Dataset<Row> to a Dataset<String>?
> What I have tried is:
>
> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
> // But this line raises an org.apache.spark.sql.AnalysisException: Try to 
> map struct... to Tuple1, but failed as the number of fields does not line up
>
> Type of columns being String
> How to solve this?