[jira] [Commented] (SPARK-18752) "isSrcLocal" parameter to Hive loadTable / loadPartition should come from user
[ https://issues.apache.org/jira/browse/SPARK-18752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743486#comment-15743486 ]

Apache Spark commented on SPARK-18752:
--------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16257

> "isSrcLocal" parameter to Hive loadTable / loadPartition should come from user
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-18752
>                 URL: https://issues.apache.org/jira/browse/SPARK-18752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> We ran into an issue with the HiveShim code that calls "loadTable" and
> "loadPartition" while testing with some recent changes in upstream Hive.
> The semantics in Hive changed slightly, and if you provide the wrong value
> for "isSrcLocal" you now can end up with an invalid table: the Hive code will
> move the temp directory to the final destination instead of moving its
> children.
> The problem in Spark is that HiveShim.scala tries to figure out the value of
> "isSrcLocal" based on where the source and target directories are; that's not
> correct. "isSrcLocal" should be set based on the user query (e.g. "LOAD DATA
> LOCAL" would set it to "true"). So we need to propagate that information from
> the user query down to HiveShim.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
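The change proposed in the issue description, taking "isSrcLocal" from the user's statement rather than guessing it from directory locations, can be sketched roughly as follows. This is an illustration only, not Spark's or Hive's code: the `LoadData` class, `parse_load_data`, `load_table_args`, and the simplified regular expression are all hypothetical names made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadData:
    """Hypothetical parsed form of a LOAD DATA statement."""
    path: str
    table: str
    is_local: bool  # True only when the user wrote LOAD DATA LOCAL


# Simplified pattern for illustration; real SQL parsing is more involved.
_LOAD_RE = re.compile(
    r"^\s*LOAD\s+DATA\s+(?P<local>LOCAL\s+)?INPATH\s+'(?P<path>[^']+)'\s+"
    r"(?:OVERWRITE\s+)?INTO\s+TABLE\s+(?P<table>\w+)\s*$",
    re.IGNORECASE,
)


def parse_load_data(sql: str) -> LoadData:
    """Extract path, table, and the LOCAL keyword from the statement text."""
    m = _LOAD_RE.match(sql)
    if m is None:
        raise ValueError(f"not a LOAD DATA statement: {sql!r}")
    return LoadData(m.group("path"), m.group("table"), m.group("local") is not None)


def load_table_args(stmt: LoadData) -> dict:
    """Build the arguments a shim layer would pass on (hypothetical shape)."""
    # is_src_local comes from the statement; it is never inferred from paths.
    return {"load_path": stmt.path, "table": stmt.table, "is_src_local": stmt.is_local}
```

Note that in this sketch a statement such as `LOAD DATA INPATH 'file:/tmp/x' INTO TABLE t` yields `is_src_local=False` even though the path has a `file:` scheme, because only the LOCAL keyword decides the flag.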
[jira] [Commented] (SPARK-18752) "isSrcLocal" parameter to Hive loadTable / loadPartition should come from user
[ https://issues.apache.org/jira/browse/SPARK-18752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733397#comment-15733397 ]

Marcelo Vanzin commented on SPARK-18752:
----------------------------------------

For posterity, here's the exception we hit in our tests:

{noformat}
java.io.IOException: Not a file: file:/tmp/warehouse--ccc2e52d-5760-4cfe-84d3-79ec2bb4b03c/hivetablewitharrayvalue/-ext-1
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
{noformat}

(This is trying to query a table that was populated using "LOAD DATA" in the unit tests.)
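The "Not a file" error in the trace above is consistent with a staging directory ending up inside the table directory instead of its contents. A minimal local-filesystem analogy, assuming nothing about Hive's actual implementation, is sketched below. The helper names and the `part-00000` file name are invented for this example; only the `-ext-1` and `hivetablewitharrayvalue` names are borrowed from the trace.

```python
import shutil
import tempfile
from pathlib import Path


def make_staging(root: Path) -> Path:
    """Create a staging directory holding one data file."""
    staging = root / "-ext-1"  # directory name borrowed from the trace
    staging.mkdir()
    (staging / "part-00000").write_text("1\n")  # hypothetical data file
    return staging


def move_directory(staging: Path, table_dir: Path) -> None:
    """Wrong outcome: the staging directory itself lands in the table dir."""
    shutil.move(str(staging), str(table_dir / staging.name))


def move_children(staging: Path, table_dir: Path) -> None:
    """Expected outcome: only the files inside staging are moved."""
    for child in list(staging.iterdir()):
        shutil.move(str(child), str(table_dir / child.name))


def non_file_entries(table_dir: Path) -> list:
    """Entries that a reader expecting plain files would reject."""
    return sorted(p.name for p in table_dir.iterdir() if not p.is_file())


def demo(move) -> list:
    """Run one move strategy in a scratch directory and report bad entries."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        table_dir = root / "hivetablewitharrayvalue"
        table_dir.mkdir()
        move(make_staging(root), table_dir)
        return non_file_entries(table_dir)
```

Calling `demo(move_directory)` leaves a nested `-ext-1` directory in the table location, which is the kind of entry the exception complains about, while `demo(move_children)` leaves only files.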
[jira] [Commented] (SPARK-18752) "isSrcLocal" parameter to Hive loadTable / loadPartition should come from user
[ https://issues.apache.org/jira/browse/SPARK-18752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727203#comment-15727203 ]

Apache Spark commented on SPARK-18752:
--------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16179