Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9490#discussion_r44538127

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -604,10 +609,33 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
           }
         }
    -    buildInternalScan(requiredColumns, filters, inputStatuses, broadcastedConf)
    +    if (!inputExists) {
    +      throw new IOException("Input paths do not exist, input paths=" +
    +        inputPaths.mkString("[", ",", "]"))
    +    } else {
    +      if (inputStatuses.isEmpty && readFromHDFS) {
    +        logWarning("Input paths are empty, input paths=" + inputPaths.mkString("[", ",", "]"))
    +        sqlContext.sparkContext.emptyRDD[InternalRow]
    +      } else {
    +        buildInternalScan(requiredColumns, filters, inputStatuses, broadcastedConf)
    +      }
    +    }
       }
       /**
    +   * Most of time, HadoopFsRelation should check the inputPaths, but for some cases it is not,
    +   * e.g. JsonRelation may read from RDD[String]
    +   */
    +  def inputExists: Boolean = fileStatusCache.inputExists
    +
    +  /**
    +   * Most of time, HadoopFsRelation should read from hdfs, but some cases it is not,
    +   * e.g. JsonRelation may read from RDD[String]
    +   * @return
    +   */
    +  def readFromHDFS: Boolean = true
    --- End diff --

    Yeah, I was thinking exactly the same thing. @yhuai What do you think? Basically, `JSONRelation` breaks a basic assumption of `HadoopFsRelation` here. However, I wonder whether it is worth making such a big change just to get a more informative error message.