Hi Sean,

Thanks. Actually, I have a dataset whose schema I want to infer after
discarding one specific String value, "+". When "+" is present, a column is
inferred as StringType, whereas without it the same column would be inferred
as DoubleType, for example, or some other type. Basically, I want to remove
"+" from all rows of the dataset and then infer the schema.
My idea is to keep only the rows not equal to "+" in the target columns
(potentially all of them), and then use spark.read().csv() with the
inferSchema option to read the new, filtered dataset, which should then yield
the correct column types.
What do you think?
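A minimal sketch of that plan in the Java API, assuming a headered CSV file and that a "+" in any column should disqualify the whole row. The input path `data.csv`, the temporary directory `/tmp/data-no-plus`, and the class and method names are hypothetical. The first read skips inferSchema so every column arrives as a string. Rows containing "+" are then filtered out, the rest are written to a temporary directory, and that directory is read again with inferSchema enabled.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class DropPlusThenInfer {

    /** Keeps only rows in which no column holds exactly "+". */
    static Dataset<Row> dropPlusRows(Dataset<Row> raw) {
        Column keep = lit(true);
        for (String c : raw.columns()) {
            // Assumes plain column names (no dots or backticks) for col(c).
            // A null cell is not "+", so keep it explicitly; otherwise the
            // null comparison would make filter() drop the row.
            keep = keep.and(col(c).isNull().or(col(c).notEqual("+")));
        }
        return raw.filter(keep);
    }

    /** Reads the CSV as strings, drops "+" rows, then re-reads with inferSchema. */
    static Dataset<Row> readWithoutPlus(SparkSession spark, String inPath, String tmpPath) {
        // First pass: no inferSchema, so every column comes back as StringType.
        Dataset<Row> raw = spark.read().option("header", "true").csv(inPath);

        // Write the surviving rows back out as CSV, replacing any old output.
        dropPlusRows(raw).write().mode(SaveMode.Overwrite)
                .option("header", "true").csv(tmpPath);

        // Second pass: inference now only sees the remaining values.
        return spark.read().option("header", "true")
                .option("inferSchema", "true").csv(tmpPath);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("drop-plus").getOrCreate();
        // Hypothetical input file and scratch directory.
        Dataset<Row> typed = readWithoutPlus(spark, "data.csv", "/tmp/data-no-plus");
        typed.printSchema();
        spark.stop();
    }
}
```

The extra write and read costs one more pass over the data, but it reuses the normal CSV inference unchanged. If only the "+" cells should be discarded rather than whole rows, replacing each column with `when(col(c).equalTo("+"), null).otherwise(col(c))` before writing would turn them into empty fields instead, which CSV inference skips.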

On Sat, Jun 4, 2022 at 15:56, Sean Owen <sro...@gmail.com> wrote:

> I don't think you want to do that. You get a string representation of
> structured data without the structure, at best. This is part of the reason
> it doesn't work directly this way.
> You can use a UDF to call .toString on the Row, of course, but, again,
> what are you really trying to do?
>
> On Sat, Jun 4, 2022 at 7:35 AM marc nicole <mk1853...@gmail.com> wrote:
>
>> Hi,
>> How can I convert a Dataset<Row> to a Dataset<String>?
>> What I have tried is:
>>
>> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
>> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
>> // But the first line raises an org.apache.spark.sql.AnalysisException:
>> // Try to map struct... to Tuple1, but failed as the number of fields
>> // does not line up
>>
>> All columns are of type String.
>> How can I solve this?
>>
>
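For the original question, here is a sketch of two ways to get a Dataset<String> without collecting everything to the driver, assuming all columns are strings as stated. `.as(Encoders.STRING())` only lines up when the Dataset has exactly one column. The first method therefore collapses each row into a single column with `concat_ws` before calling it. The second maps over the rows with `Row.toString`, which is close to Sean's suggestion but uses a typed `map` rather than a UDF. The class and method names are illustrative.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat_ws;

import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

public class RowsToStrings {

    /** Joins all columns of each row with a separator, e.g. "1.5,x". */
    static Dataset<String> joined(Dataset<Row> dataset, String sep) {
        Column[] cols = Arrays.stream(dataset.columns())
                .map(c -> col(c))
                .toArray(Column[]::new);
        // Only one string column remains, so .as(Encoders.STRING()) now lines up.
        return dataset.select(concat_ws(sep, cols)).as(Encoders.STRING());
    }

    /** Uses Row.toString, which renders a row as "[1.5,x]". */
    static Dataset<String> viaToString(Dataset<Row> dataset) {
        return dataset.map((MapFunction<Row, String>) Row::toString, Encoders.STRING());
    }
}
```

`concat_ws` silently skips null values, which would shift the remaining fields, and it does no CSV quoting. When no value contains the separator, the joined lines can be passed to `spark.read().option("inferSchema", "true").csv(lines)`, the `DataFrameReader.csv(Dataset<String>)` overload available since Spark 2.2. This re-runs schema inference in memory, and because the lines carry no header, the columns come back named `_c0`, `_c1`, and so on.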
