[ https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bao Yunz updated SPARK-26407:
-----------------------------
    Description: 
Scenario 1

Create an external non-partitioned table with schema (id, name), for example, 
whose location directory contains a subdirectory named "part=1" holding some 
data. Running DESC on the table shows that "part" has been added to the table 
schema as a column. Inserting two-column data into the table then throws an 
exception stating that the target table has 3 columns but the inserted data 
has 2 columns.

Scenario 2

Create an external non-partitioned table with schema (id, name), for example, 
whose location path is empty. After several insert operations, add a directory 
named "part=1" containing some data to the table location directory. Subsequent 
insert and select operations show that the scan path has changed to 
"tablePath/part=1", so the query returns a wrong result.

The expected behavior is that, for a non-partitioned table, adding a 
partition-like folder under tablePath should change neither its schema nor its 
select result.
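The following minimal Python sketch illustrates why Scenario 1 surfaces an extra 
"part" column, assuming a simplified model of Hive-style key=value partition 
discovery. The helper names and the data file name 000000_0 are hypothetical, 
and this is not Spark's actual implementation. It only contrasts the reported 
behavior (every discovered directory key becomes a column) with the behavior 
asked for above (only partition columns declared in the catalog are honored, so 
a non-partitioned table keeps its declared schema).

```python
import re
from pathlib import PurePosixPath

# A directory segment such as "part=1": key before "=", value after it.
PARTITION_SEGMENT = re.compile(r"^([^=]+)=(.*)$")


def discovered_partition_columns(relative_paths):
    """Return column names implied by key=value directories, in first-seen order.

    Hypothetical helper: a simplified model of Hive-style partition discovery,
    not Spark's actual code.
    """
    columns = []
    for path in relative_paths:
        # Only directory segments count; the last part is the data file name.
        for segment in PurePosixPath(path).parts[:-1]:
            match = PARTITION_SEGMENT.match(segment)
            if match and match.group(1) not in columns:
                columns.append(match.group(1))
    return columns


def schema_with_discovery(data_columns, relative_paths):
    """Reported behavior: every discovered directory key is appended as a column."""
    discovered = discovered_partition_columns(relative_paths)
    extra = [c for c in discovered if c not in data_columns]
    return list(data_columns) + extra


def expected_schema(data_columns, declared_partition_columns, relative_paths):
    """Requested behavior: only partition columns declared in the catalog count.

    For a non-partitioned table, declared_partition_columns is empty, so a
    partition-like folder such as "part=1" never changes the schema.
    """
    discovered = discovered_partition_columns(relative_paths)
    honored = [c for c in discovered if c in declared_partition_columns]
    return list(data_columns) + honored


if __name__ == "__main__":
    files = ["part=1/000000_0"]  # Scenario 1: data only under "part=1"
    print(schema_with_discovery(["id", "name"], files))  # ['id', 'name', 'part']
    print(expected_schema(["id", "name"], [], files))    # ['id', 'name']
```

With the declared schema (id, name), the first call reproduces the 3-column 
schema that makes a 2-column insert fail, while the second keeps the table at 
2 columns.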



> For an external non-partitioned table, adding a directory named k=v to the 
> table path makes the select result wrong
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26407
>                 URL: https://issues.apache.org/jira/browse/SPARK-26407
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Bao Yunz
>            Priority: Major
>              Labels: usability



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
