Hi Manjunath,
Can you share an example of the data?
From the information shared above, it seems you will need to apply a map with
custom logic to the rows of your RDD, so that each row is consistent with the
target column order, before you apply the schema.
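For example, here is a rough sketch of what that map could look like (this
assumes the Spark 1.x Java API, since your snippet uses DataFrame and hc, and
that the target schema is the same JSON string you already parse into a
StructType):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.DataType;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    StructType targetSchema = (StructType) DataType.fromJson(schema);
    StructType sourceSchema = df.schema();
    StructField[] targetFields = targetSchema.fields();

    // Rearrange each row's values into the target column order by name,
    // so that the positional createDataFrame mapping lines up correctly.
    JavaRDD<Row> reordered = df.javaRDD().map(row -> {
        Object[] values = new Object[targetFields.length];
        for (int i = 0; i < targetFields.length; i++) {
            // fieldIndex() throws if the source is missing a target column
            values[i] = row.get(sourceSchema.fieldIndex(targetFields[i].name()));
        }
        return RowFactory.create(values);
    });

    DataFrame dfNew = hc.createDataFrame(reordered, targetSchema);

If the JDBC source can be missing some of the target columns (or carry extra
ones), that is where your custom logic goes inside the map, e.g. filling in
nulls or defaults for missing columns rather than relying on fieldIndex()
alone.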
I recommend reading about the mapping functionality here:
https://data-flair.training/blogs/apache-spark-map-vs-flatmap/
I hope it helps!
-Adi
On Sat, 16 May 2020 at 17:50, Manjunath Shetty H wrote:
> Hi,
>
> I have a dataframe with some columns and data fetched from JDBC. As I have
> to keep the schema consistent in the ORC file, I have to apply a different
> schema to that dataframe. The column names will be the same, but the data or
> the schema may contain some extra columns.
>
> Is there any way I can apply the schema on top of the existing dataframe?
> In most cases the schema change is just a reordering of the columns.
>
> I have tried this:
>
> DataFrame dfNew = hc.createDataFrame(df.rdd(), (StructType)
> DataType.fromJson(schema));
>
>
> But this maps the columns by index, so it fails when the columns are
> reordered.
>
> Any pointers will be helpful.
>
> Thanks and Regards
> Manjunath Shetty
>