I think this is not a problem in 3.0 anymore, see
https://issues.apache.org/jira/browse/SPARK-27638

On Wed, Aug 5, 2020 at 12:08 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I've just run into this issue again with another user and I feel like most
> folks here have seen some flavor of this at some point.
>
> The user registers a Datasource with a column of type Date (or some non
> string) then performs a query that looks like.
>
> *SELECT * from Source WHERE date_col > '2020-08-03'*
>
> Seeing that the predicate literal here is a String, Spark needs to make a
> change so that the DataSource column will be of the same type (Date),
> so it places a "Cast" on the Datasource column so our plan ends up looking
> like.
>
> Cast(date_col as String) > '2020-08-03'
>
> Since the Datasource Strategies can't handle a push down of the "Cast"
> function we lose the predicate pushdown we could
> have had. This can change a Job from a single partition lookup into a full
> scan leading to a very confusing situation for
> the end user. I also wonder about the relative cost here since we could be
> avoiding doing X casts and instead just do a single
> one on the predicate, in addition we could be doing the cast at the
> Analysis phase and cut the run short before any work even
> starts rather than doing a perhaps meaningless comparison between a date
> and a non-date string.
>
> I think we should seriously consider whether in cases like this we should
> attempt to cast the literal rather than casting the
> source column.
>
> Please let me know if anyone has thoughts on this, or has some previous
> Jiras I could dig into if it's been discussed before,
> Russ
>

Reply via email to