Hi All,

It is simple in Java as well; you can get the Dataset<String> directly:
Dataset<String> encodedString = df.select("Column")
    .where("<Any Condition you want>")
    .as(Encoders.STRING());

On Mon, 6 Jun 2022 at 15:26, Christophe Préaud <christophe.pre...@kelkoogroup.com> wrote:

> Hi Marc,
>
> I'm not much familiar with Spark on Java, but according to the doc
> <https://spark.apache.org/docs/latest/sql-getting-started.html#creating-datasets>,
> it should be:
>
> Encoder<String> stringEncoder = Encoders.STRING();
> dataset.as(stringEncoder);
>
> For the record, it is much simpler in Scala:
>
> dataset.as[String]
>
> Of course, this will work if your DataFrame only contains one column of
> type String, e.g.:
>
> val df = spark.read.parquet("Cyrano_de_Bergerac_Acte_V.parquet")
> df.printSchema
>
> root
>  |-- line: string (nullable = true)
>
> df.as[String]
>
> Otherwise, you will have to somehow convert the Row to a String, e.g. in
> Scala:
>
> case class Data(f1: String, f2: Int, f3: Long)
> val df = Seq(Data("a", 1, 1L), Data("b", 2, 2L), Data("c", 3, 3L),
>   Data("d", 4, 4L), Data("e", 5, 5L)).toDF
> val ds = df.map(_.mkString(",")).as[String]
> ds.show
>
> +-----+
> |value|
> +-----+
> |a,1,1|
> |b,2,2|
> |c,3,3|
> |d,4,4|
> |e,5,5|
> +-----+
>
> Regards,
> Christophe.
>
> On 6/4/22 14:38, marc nicole wrote:
>
> > Hi,
> > How to convert a Dataset<Row> to a Dataset<String>?
> > What I have tried is:
> >
> > List<String> list = dataset.as(Encoders.STRING()).collectAsList();
> > Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
> > // But this line raises an org.apache.spark.sql.AnalysisException: Try to
> > map struct... to Tuple1, but failed as the number of fields does not line
> > up
> >
> > The columns are all of type String.
> > How to solve this?
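For the multi-column case, the `mkString(",")` trick from the Scala example above can be written in Java too. Below is a minimal sketch: the row-formatting logic is shown as a plain helper (the class name `RowFormatter` and the comma separator are my choices, not from the thread), and the comment at the end shows how the same logic would plug into `Dataset.map` with a `MapFunction<Row, String>` when a Spark session is available:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RowFormatter {
    // Join a row's field values with commas, mirroring Scala's row.mkString(",").
    public static String joinFields(List<Object> fields) {
        return fields.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Same shape as the rows in the Scala example: ("a", 1, 1L) -> "a,1,1"
        System.out.println(joinFields(Arrays.asList("a", 1, 1L)));
    }
}

// With Spark on the classpath, the same idea applied to a DataFrame `df`
// (a sketch, untested against a live session):
//
//   Dataset<String> ds = df.map(
//       (MapFunction<Row, String>) row -> {
//           StringBuilder sb = new StringBuilder();
//           for (int i = 0; i < row.size(); i++) {
//               if (i > 0) sb.append(",");
//               sb.append(row.get(i));
//           }
//           return sb.toString();
//       },
//       Encoders.STRING());
```

Unlike `as(Encoders.STRING())`, which only works on a single string column, this handles any number of columns of any type, at the cost of losing the original schema.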