[ https://issues.apache.org/jira/browse/SPARK-18752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733397#comment-15733397 ]
Marcelo Vanzin commented on SPARK-18752:
----------------------------------------

For posterity, here's the exception we hit in our tests:

{noformat}
java.io.IOException: Not a file: file:/tmp/warehouse--ccc2e52d-5760-4cfe-84d3-79ec2bb4b03c/hivetablewitharrayvalue/-ext-10000
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
{noformat}

(This is trying to query a table that was populated using "LOAD DATA" in the unit tests.)

> "isSrcLocal" parameter to Hive loadTable / loadPartition should come from user
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-18752
>                 URL: https://issues.apache.org/jira/browse/SPARK-18752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>
> We ran into an issue with the HiveShim code that calls "loadTable" and
> "loadPartition" while testing with some recent changes in upstream Hive.
> The semantics in Hive changed slightly, and if you provide the wrong value
> for "isSrcLocal", you can now end up with an invalid table: the Hive code will
> move the temp directory to the final destination instead of moving its
> children.
> The problem in Spark is that HiveShim.scala tries to figure out the value of
> "isSrcLocal" based on where the source and target directories are; that's not
> correct. "isSrcLocal" should be set based on the user query (e.g. "LOAD DATA
> LOCAL" would set it to "true"). So we need to propagate that information from
> the user query down to HiveShim.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
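The fix the issue asks for can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's actual code: `LoadDataCommand`, `loadTable`, and `execute` are invented names standing in for the parsed query, the HiveShim entry point, and the execution path. The point it demonstrates is that the `isSrcLocal` flag is carried from the user's query ("LOAD DATA LOCAL" means true) down to the load call, rather than being guessed from where the source and target directories live.

```java
// Hypothetical sketch of the proposed fix; names are illustrative only.
// isSrcLocal travels from the parsed user query down to the Hive load call,
// instead of being inferred from the source/target filesystem locations.
public class IsSrcLocalSketch {

    // Minimal stand-in for the parsed LOAD DATA command.
    static final class LoadDataCommand {
        final String path;
        final boolean isLocal;  // true iff the query said "LOAD DATA LOCAL"

        LoadDataCommand(String path, boolean isLocal) {
            this.path = path;
            this.isLocal = isLocal;
        }
    }

    // Stand-in for the HiveShim entry point: the caller supplies isSrcLocal.
    // (Per the bug report, the wrong value makes Hive move the temp directory
    // itself instead of its children, leaving an invalid table.)
    static String loadTable(String path, boolean isSrcLocal) {
        return "loadTable(path=" + path + ", isSrcLocal=" + isSrcLocal + ")";
    }

    // The flag comes straight from the user's query, never from path heuristics.
    static String execute(LoadDataCommand cmd) {
        return loadTable(cmd.path, cmd.isLocal);
    }

    public static void main(String[] args) {
        // LOAD DATA LOCAL INPATH '/tmp/data' ...  -> isSrcLocal = true
        System.out.println(execute(new LoadDataCommand("/tmp/data", true)));
        // LOAD DATA INPATH 'hdfs:///data' ...     -> isSrcLocal = false
        System.out.println(execute(new LoadDataCommand("hdfs:///data", false)));
    }
}
```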