I have a use case where I want to build a dataset from data that is only conditionally available. I thought I'd do something like this:

    case class SomeData( ... ) // parameters are basic encodable types like strings and BigDecimals

    var data = spark.emptyDataset[SomeData]

    // loop, determining what data to ingest and process into datasets
      data = data.union(someCode.thatReturnsADataset)
    // end loop

However, I get a runtime exception:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:361)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
        at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
        at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
        at org.apache.spark.sql.Dataset.union(Dataset.scala:1459)

Granted, I'm new to Spark, so this might be an anti-pattern, and I'm open to suggestions. However, it doesn't seem like I'm doing anything incorrect here; the types are correct. Searching for this error online returns results that seem to be about DataFrames with mismatched schemas or a different order of fields, and it seems bug fixes have already gone in for those cases.

Thanks in advance.

Efe
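
P.S. In case a self-contained version is useful, here is a minimal sketch of the same pattern. It is not a confirmed reproduction of the error. The field names on SomeData, the UnionLoopSketch object, and the unionBatches helper are placeholders made up for illustration (the batches stand in for someCode.thatReturnsADataset). Since the search results mostly point at schema mismatches, the sketch also prints both schemas whenever they differ before the union.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for the real case class; the field names are assumptions.
case class SomeData(id: String, amount: BigDecimal)

object UnionLoopSketch {

  // Unions each batch onto an initially empty Dataset.
  // Before each union, prints both schemas if they are not identical.
  def unionBatches(spark: SparkSession,
                   batches: Seq[Dataset[SomeData]]): Dataset[SomeData] = {
    import spark.implicits._
    var data = spark.emptyDataset[SomeData]
    for (next <- batches) {
      if (data.schema != next.schema) {
        println("Schemas differ before union:")
        data.printSchema()
        next.printSchema()
      }
      data = data.union(next)
    }
    data
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("union-loop-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical batches standing in for someCode.thatReturnsADataset.
    val batches = (1 to 3).map(n => Seq(SomeData(s"id-$n", BigDecimal(n))).toDS())

    val combined = unionBatches(spark, batches)
    combined.show()

    spark.stop()
  }
}
```

If the schema check never fires for the real data, the mismatch explanation from the search results presumably does not apply to my case.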