[ https://issues.apache.org/jira/browse/SPARK-31001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598397#comment-17598397 ]
Kevin Appel edited comment on SPARK-31001 at 8/31/22 2:14 PM:
--------------------------------------------------------------

It is defined here: [https://github.com/apache/spark/blob/55ee406df9933ca522bc98c2d2ccc0245e97ff67/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala]

/** The key to use for storing partitionBy columns as options. */
val PARTITIONING_COLUMNS_KEY = "__partition_columns"

I started off with something like this, just to see if it works:

df.write.partitionBy("id").option("path", "/user/kevin/ktest1").saveAsTable("kevin.ktest1")

It does: the DataFrame carries the partitionBy information along with its schema, passes it to saveAsTable, and the external table is created correctly.

Then inside [https://github.com/apache/spark/blob/36dd531a93af55ce5c2bfd8d275814ccb2846962/python/pyspark/sql/catalog.py#L705] there is an extra parameter: **options : dict, optional (extra options to specify in the table).

I started looking for partition options and found this in the Delta documentation:

.option("__partition_columns", """["join_dim_date_id"]""")

From there I built that into a dictionary and passed it into the function, and it declared the schema correctly with the partitioning. The second command then scans all the partitions, and after that it seems to be working. Something like this also works:

spark.catalog.createTable("kevin.ktest1", "/user/kevin/ktest1", __partition_columns="['id']")
spark.sql("alter table kevin.ktest1 recover partitions")

Whether or not this is the right go-forward solution, hopefully some of the Spark experts can chime in; the __ prefix in this option name is normally reserved for private variables.
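Since the value of "__partition_columns" is parsed as a JSON array of column names (the Delta example above, """["join_dim_date_id"]""", is valid JSON with double quotes), a safer way to build the options dictionary than hand-writing the string is json.dumps. A minimal sketch, reusing the table and path names from the examples above; the final createTable call is left as a comment because it needs a live SparkSession:

```python
import json

# Columns to partition by (names are illustrative).
partition_cols = ["id"]

# DataSourceUtils.PARTITIONING_COLUMNS_KEY expects a JSON-encoded array of
# column names, e.g. '["id"]'. json.dumps produces exactly that encoding,
# avoiding quoting mistakes in a hand-written string.
options = {"__partition_columns": json.dumps(partition_cols)}

# The dict would then be splatted into the catalog call, e.g.:
# spark.catalog.createTable("kevin.ktest1", "/user/kevin/ktest1", **options)
# spark.sql("alter table kevin.ktest1 recover partitions")
print(options["__partition_columns"])  # '["id"]'
```

This also scales to multiple partition columns (json.dumps(["year", "month"]) gives '["year", "month"]') without any manual quoting.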
> Add ability to create a partitioned table via catalog.createTable()
> -------------------------------------------------------------------
>
> Key: SPARK-31001
> URL: https://issues.apache.org/jira/browse/SPARK-31001
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Nicholas Chammas
> Priority: Minor
>
> There doesn't appear to be a way to create a partitioned table using the Catalog interface.
> In SQL, however, you can do this via {{CREATE TABLE ... PARTITIONED BY}}.
-- This message was sent by Atlassian Jira (v8.20.10#820010)