On Sun, 5 Mar 2023 at 18:27, zhangliyun <kelly...@126.com> wrote:
> Hi all > > > i have a spark sql , before in spark 2.4.2 it runs correctly, when i > upgrade to spark 3.1.3, it has some problem. > > the sql > > ``` > > select * from eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly > where dt >= date_sub('${today}',30); > > > ``` > > it will load the data of past 30 days of table > eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly, here > today='2023-03-01' > > > in spark2 i saw the physical plan the partition Filter is PartitionFilters: > [isnotnull(dt#1461), (dt#1461 >= 2023-01-31)] > > +- *(4) FileScan parquet > eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly[disputeid#1327,statuswork#1330,opTs#1457,trailSeqno#1459,trailRba#1460,dt#1461,hr#1462] > Batched: true, Format: Parquet, Location: > PrunedInMemoryFileIndex[gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_r..., > PartitionCount: 805, PartitionFilters: [isnotnull(dt#1461), (dt#1461 >= > 2023-01-31)], PushedFilters: [IsNotNull(disputeid)], ReadSchema: > struct<disputeid:string,statuswork:string,opTs:string,trailSeqno:string,trailRba:string> > > > > in spark3 , i saw the physical plan , the partitionFilter is > [isnotnull(dt#1602), > (cast(dt#1602 as date) >= 19387)] > ``` > > (8) Scan parquet eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly > Output [7]: [disputeid#1468, statuswork#1471, opTs#1598, trailSeqno#1600, > trailRba#1601, dt#1602, hr#1603] > Batched: true > Location: InMemoryFileIndex > [gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_rds/cdh/prpc63cgudba_pp_index_disputecasedetails/dt=2023-01-30/hr=00, > ... 784 entries] > PartitionFilters: [isnotnull(dt#1602), (cast(dt#1602 as date) >= 19387)] > PushedFilters: [IsNotNull(disputeid)] > ReadSchema: > struct<disputeid:string,statuswork:string,opTs:string,trailSeqno:string,trailRba:string> > > > ``` > > here i want to ask why there is big difference in partitionFitler in > spark2 and spark3, i guess most my spark configure is similar in spark2 > and spark3 to run the same sql >