[
https://issues.apache.org/jira/browse/SPARK-31001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598397#comment-17598397
]
Kevin Appel commented on SPARK-31001:
-------------------------------------
It is defined here:
https://github.com/apache/spark/blob/55ee406df9933ca522bc98c2d2ccc0245e97ff67/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
/**
* The key to use for storing partitionBy columns as options.
*/
val PARTITIONING_COLUMNS_KEY = "__partition_columns"
I started off with something like:
df.write.partitionBy("id").option("path", "/user/kevin/ktest1").saveAsTable("kevin.ktest1")
just to see whether it works, and it does. So the DataFrame writer somehow
carries the partitioning along with the schema into saveAsTable, which is then
able to create the external table correctly.
Then, in
[https://github.com/apache/spark/blob/36dd531a93af55ce5c2bfd8d275814ccb2846962/python/pyspark/sql/catalog.py#L705]
there is an extra parameter:
**options : dict, optional extra options to specify in the table.
I started to look for partition options, and I found this in the delta link:
.option("__partition_columns", """["join_dim_date_id"]""")
From there I built that into a dictionary and passed it to the function, and
it worked: the table is declared with the correct schema, including the
partitioning. The second command then scans all the partitions, and after that
it seems to be working.
Something like this is also working:
spark.catalog.createTable("kevin.ktest1", "/user/kevin/ktest1",
**{"__partition_columns": "['id']"})
spark.sql("alter table kevin.ktest1 recover partitions")
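For reference, the two calls above could be wrapped up roughly as follows. This is only a sketch of the workaround, not a supported API: the helper names partition_options and create_partitioned_table are made up for illustration, the option key is copied from DataSourceUtils, and encoding the column list as a JSON array follows the form shown in the delta example (that this is the preferred encoding is an assumption). The spark argument is expected to be an existing SparkSession.

```python
import json

# Internal option key, copied from DataSourceUtils.PARTITIONING_COLUMNS_KEY.
# The leading "__" suggests it is private, so it may change without notice.
PARTITIONING_COLUMNS_KEY = "__partition_columns"


def partition_options(columns):
    """Hypothetical helper: build the extra options dict for createTable.

    The partition columns are encoded as a JSON array string,
    e.g. ["id"] becomes '["id"]'.
    """
    return {PARTITIONING_COLUMNS_KEY: json.dumps(list(columns))}


def create_partitioned_table(spark, table_name, path, columns):
    """Hypothetical helper: register an external table over `path`,
    then ask the metastore to discover the existing partitions."""
    # The key is not a valid Python identifier prefix for a keyword literal
    # worth typing by hand, so it is passed via ** dictionary expansion.
    spark.catalog.createTable(table_name, path, **partition_options(columns))
    spark.sql(f"ALTER TABLE {table_name} RECOVER PARTITIONS")
```

Usage would then be create_partitioned_table(spark, "kevin.ktest1", "/user/kevin/ktest1", ["id"]).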
Whether or not this is the right solution going forward, hopefully some of the
Spark experts can chime in; the leading __ in the option name suggests it is
meant to be private.
> Add ability to create a partitioned table via catalog.createTable()
> -------------------------------------------------------------------
>
> Key: SPARK-31001
> URL: https://issues.apache.org/jira/browse/SPARK-31001
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Nicholas Chammas
> Priority: Minor
>
> There doesn't appear to be a way to create a partitioned table using the
> Catalog interface.
> In SQL, however, you can do this via {{{}CREATE TABLE ... PARTITIONED BY{}}}.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)