Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20726#discussion_r172324207

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -46,34 +48,46 @@ case class DataSourceV2ScanExec(
           new DataSourcePartitioning(
             s.outputPartitioning(), AttributeMap(output.map(a => a -> a.name)))

    +    case _ if readerFactories.size == 1 => SinglePartition
    +
    +    case _ => super.outputPartitioning
       }

    -  private lazy val readerFactories: java.util.List[DataReaderFactory[UnsafeRow]] = reader match {
    -    case r: SupportsScanUnsafeRow => r.createUnsafeRowReaderFactories()
    +  private lazy val readerFactories: Seq[DataReaderFactory[_]] = reader match {
    --- End diff --

    I think it is better to have a lazy val throw an exception if it is called at the wrong time (if it is private) than to add the cast, because the exception at least validates that the assumptions are met and can produce a reasonable error message. The cast might hide the problem, particularly as this code evolves over time.

    It would be reasonable to add another path that returns `Seq[DataReaderFactory[_]]` because that's the method contract, even though there is an assumption in the callers about how it will behave. As much of the contract between methods as possible should be expressed in types.
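To illustrate the trade-off the comment describes, here is a minimal, hypothetical Scala sketch (the `Reader`/`RowReader`/`Scan` names are invented for illustration and are not the Spark types in the diff). A private lazy val that pattern-matches fails fast with a descriptive message when its assumption is violated, whereas an unchecked cast defers the failure to a later, less informative `ClassCastException`:

```scala
object CastVsLazyVal {
  trait Reader
  trait RowReader extends Reader
  class UnsafeReader extends RowReader
  class ColumnarReader extends Reader

  class Scan(reader: Reader) {
    // Validating lazy val: evaluated only when the row-based path is used,
    // and throws a clear error if the reader does not support that path.
    private lazy val rowReader: RowReader = reader match {
      case r: RowReader => r
      case other => throw new IllegalStateException(
        s"Row-based read path used with ${other.getClass.getSimpleName}")
    }

    def useRowPath(): RowReader = rowReader
  }

  def main(args: Array[String]): Unit = {
    // Valid case: the assumption holds, so the lazy val initializes normally.
    assert(new Scan(new UnsafeReader).useRowPath() != null)

    // Invalid case: the lazy val surfaces the broken assumption with a
    // descriptive message instead of a bare ClassCastException elsewhere.
    val failedClearly =
      try { new Scan(new ColumnarReader).useRowPath(); false }
      catch { case e: IllegalStateException => e.getMessage.contains("ColumnarReader") }
    assert(failedClearly)
    println("ok")
  }
}
```

Because the val is private, the set of call sites is bounded, so the "called at the wrong time" failure mode stays auditable within the one file.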