Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread fansparker
Makes sense, Russell. I am trying to figure out if there is a way to force a
metadata reload in "createRelation" when the schema provided by the new
SparkSession differs from the existing metadata schema.
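
Something like the sketch below is what I have in mind. This is only a
sketch: MySchemaProvider, loadLatestSchema, and buildRelation are
hypothetical placeholders for our provider's internals, while
SchemaRelationProvider, createRelation, and catalog.refreshTable are real
Spark APIs.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, SchemaRelationProvider}
import org.apache.spark.sql.types.StructType

class MySchemaProvider extends SchemaRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation = {
    // Schema the source reports right now (hypothetical helper).
    val latest = loadLatestSchema(parameters)
    if (schema != null && schema != latest) {
      // Evict the stale cached relation so the next lookup re-resolves it.
      sqlContext.sparkSession.catalog.refreshTable(parameters("table"))
    }
    buildRelation(sqlContext, parameters, latest)
  }

  private def loadLatestSchema(parameters: Map[String, String]): StructType = ???

  private def buildRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation = ???
}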






Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Russell Spitzer
The code you linked to is very old, and I don't think that method works
anymore (HiveContext no longer exists). My latest attempt at this was on
Spark 2.2, and I ran into the issues I wrote about before.

In DSV2 it is done via a catalog implementation, so you can basically write a
new catalog that creates tables and such with whatever metadata you like. I'm
not sure a Hive Metastore catalog has been implemented for DSV2 yet; if it
has, I believe it is only in Spark 3.0.
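
Very roughly, a Spark 3.0 catalog plugin looks like the sketch below.
MyCatalog and whatever metadata store backs it are placeholders; TableCatalog
is the real org.apache.spark.sql.connector.catalog interface.

import java.util
import org.apache.spark.sql.connector.catalog._
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class MyCatalog extends TableCatalog {
  private var catalogName: String = _

  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name
  }

  override def name(): String = catalogName

  override def listTables(namespace: Array[String]): Array[Identifier] = ???

  // Read the *current* schema from your own metadata store on every load,
  // so schema changes become visible without a drop and recreate.
  override def loadTable(ident: Identifier): Table = ???

  override def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): Table = ???

  override def alterTable(ident: Identifier, changes: TableChange*): Table = ???

  override def dropTable(ident: Identifier): Boolean = ???

  override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = ???
}

You register it with spark.sql.catalog.mycat=com.example.MyCatalog (the name
"mycat" is arbitrary), and CREATE TABLE mycat.db.t ... then resolves through
your catalog instead of the built-in session catalog.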



Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread fansparker
Thanks Russell. This commit
<https://gite.lirmm.fr/yagoubi/spark/commit/6463e0b9e8067cce70602c5c9006a2546856a9d6#fecff1a3ad108a52192ba9cd6dd7b11a3d18871b_0_141>
shows that "refreshTable" and "invalidateTable" could be used to reload the
metadata, but they do not work in our case. I have tried invoking "schema()"
with the updated schema from "buildScan()" as well.

It would be helpful to have this capability in DataSourceV1 as the schema
evolves; I will check whether this is a change that can be made.

You mentioned that this works in DataSourceV2. Is there a sample
implementation of persistent tables with DataSourceV2 that works with Spark
2.4.4?
Thanks again.






Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Russell Spitzer
The last time I looked into this, the answer was no. Because there is an
internal relation cache in the Spark session, the only way to update a
session's information was a full drop and create. That was my experience with
a custom Hive metastore and the entries read from it: I could change the
entries in the metastore underneath the session, but since the session cached
the relation lookup, I couldn't get it to reload the metadata.
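
Concretely, the drop-and-recreate workaround is just the following; the
table name, provider class, and path are placeholders:

spark.sql("DROP TABLE IF EXISTS my_table")
spark.sql(
  """CREATE TABLE my_table
    |USING com.example.MySource
    |OPTIONS (path '/data/my_table')""".stripMargin)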

DataSourceV2 does make this easy, though.



Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Piyush Acharya
Do you want to merge schemas when the incoming data changes?

spark.conf.set("spark.sql.parquet.mergeSchema", "true")

https://kontext.tech/column/spark/381/schema-merging-evolution-with-parquet-in-spark-and-hive
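
For example, per read instead of the global conf ("/data/events" is just a
placeholder path):

val df = spark.read
  .option("mergeSchema", "true")
  .parquet("/data/events")
df.printSchema()  // union of the schemas found across the part files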




Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread fansparker
Does anybody know if there is a way to get the persisted table's schema
updated when the underlying custom data source schema is changed? Currently,
we have to drop and re-create the table. 


