[ https://issues.apache.org/jira/browse/SPARK-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li updated SPARK-18915:
----------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: SPARK-17861

> Return Nothing when Querying a Partitioned Data Source Table without Repairing it
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-18915
>                 URL: https://issues.apache.org/jira/browse/SPARK-18915
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Xiao Li
>            Priority: Critical
>
> In Spark 2.1, if we create a partitioned data source table over a specified
> path, querying it returns nothing. To get the data, we have to manually issue
> a DDL to repair the table.
> In Spark 2.0, such a query returns the data stored in the specified path
> without repairing the table.
> Below is the output of Spark 2.1.
> {noformat}
> scala> spark.range(5).selectExpr("id as fieldOne", "id as partCol").write.partitionBy("partCol").mode("overwrite").saveAsTable("test")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>
> scala> spark.sql("select * from test").show()
> +--------+-------+
> |fieldOne|partCol|
> +--------+-------+
> |       0|      0|
> |       1|      1|
> |       2|      2|
> |       3|      3|
> |       4|      4|
> +--------+-------+
>
> scala> spark.sql("desc formatted test").show(50, false)
> +----------------------------+----------------------------------------------------------------------+-------+
> |col_name                    |data_type                                                             |comment|
> +----------------------------+----------------------------------------------------------------------+-------+
> |fieldOne                    |bigint                                                                |null   |
> |partCol                     |bigint                                                                |null   |
> |# Partition Information     |                                                                      |       |
> |# col_name                  |data_type                                                             |comment|
> |partCol                     |bigint                                                                |null   |
> |                            |                                                                      |       |
> |# Detailed Table Information|                                                                      |       |
> |Database:                   |default                                                               |       |
> |Owner:                      |xiaoli                                                                |       |
> |Create Time:                |Sat Dec 17 17:46:24 PST 2016                                          |       |
> |Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                                          |       |
> |Location:                   |file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test|       |
> |Table Type:                 |MANAGED                                                               |       |
> |Table Parameters:           |                                                                      |       |
> |  transient_lastDdlTime     |1482025584                                                            |       |
> |                            |                                                                      |       |
> |# Storage Information       |                                                                      |       |
> |SerDe Library:              |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe           |       |
> |InputFormat:                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat         |       |
> |OutputFormat:               |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat        |       |
> |Compressed:                 |No                                                                    |       |
> |Storage Desc Parameters:    |                                                                      |       |
> |  serialization.format      |1                                                                     |       |
> |Partition Provider:         |Catalog                                                               |       |
> +----------------------------+----------------------------------------------------------------------+-------+
>
> scala> spark.sql(s"create table newTab (fieldOne long, partCol int) using parquet options (path 'file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test') partitioned by (partCol)")
> res3: org.apache.spark.sql.DataFrame = []
>
> scala> spark.table("newTab").show()
> +--------+-------+
> |fieldOne|partCol|
> +--------+-------+
> +--------+-------+
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
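The "DDL to repair the table" mentioned in the description can be sketched as below. This is a hypothetical spark-shell continuation of the reproduction above (it assumes the `newTab` table and its warehouse path already exist); it uses Spark SQL's partition-recovery commands, which scan the table location for `partCol=...` directories and register them in the catalog.

```scala
// Sketch of the manual workaround, assuming the newTab table from the
// reproduction above. MSCK REPAIR TABLE scans the table's location for
// partition directories (partCol=0, partCol=1, ...) and adds the discovered
// partitions to the session catalog.
spark.sql("MSCK REPAIR TABLE newTab")

// Equivalent Spark SQL form:
// spark.sql("ALTER TABLE newTab RECOVER PARTITIONS")

// After the repair, the query returns the five rows written earlier.
spark.table("newTab").show()
```

The point of the issue is that Spark 2.0 did this partition discovery implicitly at query time, while Spark 2.1 (with the catalog as partition provider) requires the explicit repair step.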