Hi Sean,

Thanks. Actually, I have a dataset whose schema I want to infer (inferSchema) after discarding the specific String value "+". I do this because a column containing "+" is inferred as StringType, whereas without that value it would be inferred as DoubleType, for example, or some other type. Basically, I want to remove "+" from all dataset rows and then infer the schema.

My idea is to keep only the rows whose target columns (potentially all of them) are not equal to "+", and then use spark.read().csv() with the inferSchema option to read the filtered dataset, which should then yield the correct column types.

What do you think?
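For what it is worth, here is a rough sketch of that idea in Java, which is also why I was asking for a Dataset<String>: DataFrameReader.csv() has an overload that accepts one. The class name, the helper name inferWithoutPlus and the sample lines are made up for illustration, and the naive split on "," assumes that no field contains a quoted comma.

```java
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;

public class InferSchemaWithoutPlus {

    // Hypothetical helper: drops every raw CSV line in which any field is
    // exactly "+", then lets Spark infer the column types from what is left.
    static Dataset<Row> inferWithoutPlus(SparkSession spark, Dataset<String> lines) {
        // Assumption: fields never contain quoted commas, so a plain split is enough.
        Dataset<String> filtered = lines.filter(
                (FilterFunction<String>) line ->
                        !Arrays.asList(line.split(",", -1)).contains("+"));

        // csv(Dataset<String>) parses the lines as CSV records.
        return spark.read()
                .option("inferSchema", "true")
                .csv(filtered);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("infer-schema-without-plus")
                .getOrCreate();

        // Made-up sample data; in practice the lines could come from
        // spark.read().textFile(path).
        Dataset<String> lines = spark.createDataset(
                Arrays.asList("1.5,a", "+,b", "2.5,c"), Encoders.STRING());

        inferWithoutPlus(spark, lines).printSchema();
        spark.stop();
    }
}
```

If turning "+" into null is acceptable instead of dropping the whole row, I believe setting the CSV option nullValue to "+" on the original read might give the same inferred types in a single pass, but I have not verified that on my data.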
On Sat, Jun 4, 2022 at 15:56, Sean Owen <sro...@gmail.com> wrote:

> I don't think you want to do that. You get a string representation of
> structured data without the structure, at best. This is part of the reason
> it doesn't work directly this way.
> You can use a UDF to call .toString on the Row of course, but, again
> what are you really trying to do?
>
> On Sat, Jun 4, 2022 at 7:35 AM marc nicole <mk1853...@gmail.com> wrote:
>
>> Hi,
>> How to convert a Dataset<Row> to a Dataset<String>?
>> What i have tried is:
>>
>> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
>> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
>> // But this line raises a org.apache.spark.sql.AnalysisException: Try to
>> map struct... to Tuple1, but failed as the number of fields does not line
>> up
>>
>> Type of columns being String
>> How to solve this?