Hi,

What's the rationale for checking the path option eagerly in
FileStreamSource? My thinking is that no processing happens until
start is called, and that processing is supposed to run on executors
(not the driver), where the path would be available.

I could (and perhaps should) use dfs, but IMHO that just hides the real
question of why the text source is eager.

Please help me understand the rationale of the choice. Thanks!

scala> spark.version
res0: String = 2.1.0-SNAPSHOT

scala> spark.readStream.format("text").load("/var/logs")
org.apache.spark.sql.AnalysisException: Path does not exist: /var/logs;
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:229)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:81)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:81)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:142)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:153)
  ... 48 elided
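
To illustrate what I mean: the trace goes through sourceInfo$lzycompute,
so sourceInfo looks like a lazy val that gets forced from
StreamingRelation.apply as part of load. Below is a minimal, Spark-free
sketch of that pattern, contrasted with a variant that defers the check
until a start-like call. SketchSource, loadEagerly and loadDeferred are
hypothetical names of mine, not Spark APIs, and this is only an
illustration of the eager-vs-deferred question, not how Spark is
implemented. It should paste into the Scala REPL as-is.

```scala
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical stand-in for a data source; NOT Spark's DataSource class.
final class SketchSource(path: String) {
  // Nothing is checked at construction time; the check runs on first access.
  lazy val sourceInfo: String = {
    if (!Files.exists(Paths.get(path)))
      throw new IllegalArgumentException(s"Path does not exist: $path")
    s"FileSource[$path]"
  }
}

// Eager variant: forces the lazy val right away, so a missing path fails here.
def loadEagerly(path: String): String = new SketchSource(path).sourceInfo

// Deferred variant: returns a thunk, so the check runs only when it is invoked.
def loadDeferred(path: String): () => String = {
  val source = new SketchSource(path)
  () => source.sourceInfo
}

// Assumed-missing path, used only for this sketch.
val missing = "/no/such/dir/for-this-sketch"

val eager = Try(loadEagerly(missing))   // the check runs at "load" time
val start = loadDeferred(missing)       // no check has run yet
val deferred = Try(start())             // the check runs at "start" time
```

With the deferred variant, a missing path would surface only at start,
which is the behaviour I was expecting from the text source.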

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
