[jira] [Commented] (SPARK-26407) For an external non-partitioned table, if add a directory named with k=v to the table path, select result will be wrong
[ https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728368#comment-16728368 ]

Bao Yunz commented on SPARK-26407:
--

We can't prevent users from placing partition-like folders under an external table's path, so Spark's processing logic needs to change to handle this kind of folder.

> For an external non-partitioned table, if add a directory named with k=v to
> the table path, select result will be wrong
> ---
>
> Key: SPARK-26407
> URL: https://issues.apache.org/jira/browse/SPARK-26407
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Bao Yunz
> Priority: Major
> Labels: usability
>
> Scenario 1
> Create an external non-partitioned table with schema (id, name), for example,
> whose location directory contains a subdirectory named "part=1" holding some
> data. If we then DESC the table, we find that "part" has been added to the
> table schema as a column. Inserting two-column data into the table then
> throws an exception saying that the target table has 3 columns but the
> inserted data has 2 columns.
>
> Scenario 2
> Create an external non-partitioned table with schema (id, name), for example,
> whose location path is empty. After several insert operations, we add a
> directory named "part=1" containing some data to the table location
> directory. If we then run insert and select operations, we find that the
> scan path has changed to "tablePath/part=1", so the select returns a wrong
> result.
>
> The correct behavior is that, for a non-partitioned table, adding a
> partition-like folder under tablePath should change neither its schema nor
> the select result.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
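As a rough illustration of why a "part=1" directory can surface as an extra column in Scenario 1 of the issue, here is a simplified Python model of Hive-style key=value directory parsing. It is not Spark's implementation: the function names infer_partition_columns and effective_schema, the ignore_discovered flag, and the /warehouse/t paths are all hypothetical, and real partition discovery also infers value types and validates the directory layout.

```python
from pathlib import PurePosixPath


def infer_partition_columns(table_path: str, file_paths: list[str]) -> dict[str, set[str]]:
    """Hypothetical, simplified model (NOT Spark's code): collect every
    key=value directory segment between table_path and each data file."""
    root = PurePosixPath(table_path)
    columns: dict[str, set[str]] = {}
    for f in file_paths:
        rel = PurePosixPath(f).relative_to(root)
        # Look only at directory components, never at the file name itself.
        for segment in rel.parts[:-1]:
            key, sep, value = segment.partition("=")
            if sep and key:
                columns.setdefault(key, set()).add(value)
    return columns


def effective_schema(declared: list[str], table_path: str, file_paths: list[str],
                     ignore_discovered: bool = False) -> list[str]:
    """Hypothetical helper. ignore_discovered=False mimics the behavior reported
    in the issue (discovered keys become extra columns); True mimics the behavior
    the reporter asks for on a non-partitioned table (declared schema is kept)."""
    if ignore_discovered:
        return list(declared)
    discovered = infer_partition_columns(table_path, file_paths)
    return declared + [k for k in discovered if k not in declared]


files = ["/warehouse/t/part-00000.parquet", "/warehouse/t/part=1/part-00000.parquet"]
print(effective_schema(["id", "name"], "/warehouse/t", files))
print(effective_schema(["id", "name"], "/warehouse/t", files, ignore_discovered=True))
```

In this model the first call yields three columns (id, name, part), matching the "3 columns but the inserted data has 2 columns" error in Scenario 1, while the second keeps the declared (id, name) schema.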
[ https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728142#comment-16728142 ]

Hyukjin Kwon commented on SPARK-26407:
--

Why don't you just avoid directory names like part=1, or empty strings? Allowing them doesn't look like good practice.