[no subject]

2023-03-06 Thread ansel boero
unsubscribe

unsubscribe

2023-03-06 Thread Shiv Prashant Sood
On Sun, 5 Mar 2023 at 18:27, zhangliyun  wrote:

> Hi all
>
>
>   i have a spark sql  , before in  spark 2.4.2 it runs correctly, when i
> upgrade to spark 3.1.3, it has some problem.
>
>  the sql
>
>  ```
>
> select * from eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly
> where dt >= date_sub('${today}',30);
>
>
> ```
>
> it will load the data of past 30 days of table
> eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly, here
> today='2023-03-01'
>
>
> in spark2  i saw the physical plan  the partition Filter is PartitionFilters:
> [isnotnull(dt#1461), (dt#1461 >= 2023-01-31)]
>
>  +- *(4) FileScan parquet 
> eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly[disputeid#1327,statuswork#1330,opTs#1457,trailSeqno#1459,trailRba#1460,dt#1461,hr#1462]
>  Batched: true, Format: Parquet, Location: 
> PrunedInMemoryFileIndex[gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_r...,
>  PartitionCount: 805, PartitionFilters: [isnotnull(dt#1461), (dt#1461 >= 
> 2023-01-31)], PushedFilters: [IsNotNull(disputeid)], ReadSchema: 
> struct
>
>
>
> in spark3 , i saw the physical plan ,  the partitionFilter is 
> [isnotnull(dt#1602),
> (cast(dt#1602 as date) >= 19387)]
> ```
>
> (8) Scan parquet eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly
> Output [7]: [disputeid#1468, statuswork#1471, opTs#1598, trailSeqno#1600, 
> trailRba#1601, dt#1602, hr#1603]
> Batched: true
> Location: InMemoryFileIndex 
> [gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_rds/cdh/prpc63cgudba_pp_index_disputecasedetails/dt=2023-01-30/hr=00,
>  ... 784 entries]
> PartitionFilters: [isnotnull(dt#1602), (cast(dt#1602 as date) >= 19387)]
> PushedFilters: [IsNotNull(disputeid)]
> ReadSchema: 
> struct
>
>
> ```
>
> here i want to ask why there is big difference in partitionFitler in
> spark2 and spark3,  i guess most my spark configure is similar in spark2
> and spark3 to run the same sql
>


Re:Re: please help why there is big difference in partitionFilter in spark2.4.2 and spark3.1.3.

2023-03-06 Thread zhangliyun
Hi Yuming
  thanks for the suggestion , now the partitionFilter is normal. Appreciate for 
your help!


Best Regards
Kelly Zhang, Liyun Zhang
















在 2023-03-06 11:04:55,"Yuming Wang"  写道:

Hi Liyun,


This is because of this change: 
https://issues.apache.org/jira/browse/SPARK-27638.
You can set spark.sql.legacy.typeCoercion.datetimeToString.enabled to true to 
restore the old behavior.


On Mon, Mar 6, 2023 at 10:27 AM zhangliyun  wrote:


Hi all 




  i have a spark sql  , before in  spark 2.4.2 it runs correctly, when i 
upgrade to spark 3.1.3, it has some problem.

 the sql

 ```

select * from eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly where 
dt >= date_sub('${today}',30);




```

it will load the data of past 30 days of table  
eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly, here 
today='2023-03-01'




in spark2  i saw the physical plan  the partition Filter is PartitionFilters: 
[isnotnull(dt#1461), (dt#1461 >= 2023-01-31)]
 +- *(4) FileScan parquet 
eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly[disputeid#1327,statuswork#1330,opTs#1457,trailSeqno#1459,trailRba#1460,dt#1461,hr#1462]
 Batched: true, Format: Parquet, Location: 
PrunedInMemoryFileIndex[gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_r...,
 PartitionCount: 805, PartitionFilters: [isnotnull(dt#1461), (dt#1461 >= 
2023-01-31)], PushedFilters: [IsNotNull(disputeid)], ReadSchema: 
struct







in spark3 , i saw the physical plan ,  the partitionFilter is 
[isnotnull(dt#1602), (cast(dt#1602 as date) >= 19387)]

```
(8) Scan parquet eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly
Output [7]: [disputeid#1468, statuswork#1471, opTs#1598, trailSeqno#1600, 
trailRba#1601, dt#1602, hr#1603]
Batched: true
Location: InMemoryFileIndex 
[gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_rds/cdh/prpc63cgudba_pp_index_disputecasedetails/dt=2023-01-30/hr=00,
 ... 784 entries]
PartitionFilters: [isnotnull(dt#1602), (cast(dt#1602 as date) >= 19387)]
PushedFilters: [IsNotNull(disputeid)]
ReadSchema: 
struct


```


here i want to ask why there is big difference in partitionFitler in spark2 and 
spark3,  i guess most my spark configure is similar in spark2 and spark3 to run 
the same sql