[
https://issues.apache.org/jira/browse/SPARK-33559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gaetan updated SPARK-33559:
---
Description:
{code:java}
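# ss and sf are not defined in the report; assumed below.
from pyspark.sql import SparkSession
import pyspark.sql.functions as sf
ss = SparkSession.builder.getOrCreate()
df = ss.read.parquet("/path/to/parquet/dataset")
df.select("partnerid").withColumn("index", sf.monotonically_increasing_id()).explain(True){code}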
We would expect only the {{partnerid}} column to be read from the Parquet dataset, but the whole dataset is actually read:
{code:java}
...
== Physical Plan ==
Project [partnerid#6794, monotonically_increasing_id() AS index#24939L]
+- FileScan parquet [impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,... 566 more fields] ...{code}
> Column pruning with monotonically_increasing_id
> ---
>
> Key: SPARK-33559
> URL: https://issues.apache.org/jira/browse/SPARK-33559
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: Gaetan
> Priority: Minor
>
> {code:java}
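> # ss and sf are not defined in the report; assumed below.
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as sf
> ss = SparkSession.builder.getOrCreate()
> df = ss.read.parquet("/path/to/parquet/dataset")
> df.select("partnerid").withColumn("index", sf.monotonically_increasing_id()).explain(True){code}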
> We would expect only the {{partnerid}} column to be read from the Parquet
> dataset, but the whole dataset is actually read:
> {code:java}
> ...
> == Physical Plan ==
> Project [partnerid#6794, monotonically_increasing_id() AS index#24939L]
> +- FileScan parquet [impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,... 566 more fields] ...{code}
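
To check the scan without reading the plan by eye, a small helper along these lines could pull the column list out of the FileScan node. This is only a sketch under stated assumptions: scanned_columns and unexpected_columns are hypothetical names, not part of Spark; the regex assumes the "FileScan <format> [col#id,...]" layout shown in the report and may need adjusting for other plan layouts; and the sample plan is a shortened excerpt of the one quoted above.

```python
import re


def scanned_columns(plan_text):
    """Return the column names listed in the first FileScan node of a plan string.

    Hypothetical helper: assumes the 'FileScan <format> [col#id,...]' layout
    from the report. Returns an empty list when no such node is found.
    """
    match = re.search(r"FileScan\s+\w+\s*\[([^\]]*)\]", plan_text)
    if match is None:
        return []
    columns = []
    for item in match.group(1).split(","):
        item = item.strip()
        # Skip empty pieces and the truncation marker ("... N more fields").
        if not item or item.startswith("..."):
            continue
        # Drop the expression ID suffix: "display_timestamp#6552L" -> "display_timestamp".
        columns.append(item.split("#", 1)[0])
    return columns


def unexpected_columns(plan_text, wanted):
    """Return the scanned columns that are not in the set of wanted columns."""
    return [name for name in scanned_columns(plan_text) if name not in wanted]


# Shortened excerpt of the reported plan. The "566 more fields" marker is
# copied from the report and is not adjusted for the columns cut here.
plan = (
    "== Physical Plan == "
    "Project [partnerid#6794, monotonically_increasing_id() AS index#24939L] "
    "+- FileScan parquet [impression_id#6550,arbitrage_id#6551,"
    "display_timestamp#6552L,... 566 more fields]"
)

# A non-empty result means the scan reads more than the selected column.
print(unexpected_columns(plan, {"partnerid"}))
```

With pruning working as expected, the FileScan list would contain only partnerid and the helper would return an empty list. The text passed in would have to be captured from the explain output by some other means, which this sketch does not cover.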