Gaetan created SPARK-33559:
------------------------------

             Summary: Column pruning with monotonically_increasing_id
                 Key: SPARK-33559
                 URL: https://issues.apache.org/jira/browse/SPARK-33559
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.1
            Reporter: Gaetan


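{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as sf

ss = SparkSession.builder.getOrCreate()
df = ss.read.parquet("/path/to/parquet/dataset")
df.select("partnerid").withColumn("index", sf.monotonically_increasing_id()).explain(True)
{code}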
We would expect Spark to read only the {{partnerid}} column from the Parquet dataset, but the physical plan shows that it reads the whole dataset:
{code}
...
== Physical Plan ==
Project [partnerid#6794, monotonically_increasing_id() AS index#24939L]
+- FileScan parquet [impression_id#6550,arbitrage_id#6551,display_timestamp#6552L,requesttimestamputc#6553,affiliateid#6554,amp_adrequest_type#6555,app_id#6556,app_name#6557,appnexus_viewability#6558,apxpagevertical#6559,arbitrage_time#6560,banner_type#6561,display_type_int#6562,has_multiple_display_types#6563,bannerid#6564,bid_app_id_hash#6565,bid_url_domain_hash#6566,bidding_details#6567,bid_level_core#6568,bidrandomization_user_factor#6569,bidrandomization_user_mu#6570,bidrandomization_user_sigma#6571,big_lastrequesttimestampsession#6572,big_nbrequestaffiliatesession#6573,... 566 more fields] ...
{code}
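Reading a plan like this one by eye is a slow way to tell whether column pruning took effect. A possible alternative is to parse the {{FileScan}} node of the {{explain()}} output. The helper below is a minimal, hypothetical sketch: {{file_scan_columns}} and {{scan_is_pruned}} are illustrative names, not Spark APIs. It assumes the plan text follows the layout shown above, {{FileScan <format> [col#id,...]}}, where Spark may shorten a long list to {{... N more fields}} (the limit is controlled by {{spark.sql.debug.maxToStringFields}}).

```python
import re
from typing import Iterable, List, Tuple

# Matches the first "FileScan <format> [ ... ]" node; column names cannot contain "]".
_SCAN_RE = re.compile(r"FileScan \w+\s*\[(.*?)\]", re.DOTALL)
# Spark's truncation marker for long field lists, e.g. "... 566 more fields".
_MORE_RE = re.compile(r"\.\.\.\s*(\d+) more fields")


def file_scan_columns(plan: str) -> Tuple[List[str], int]:
    """Return (column names listed in the first FileScan, number of elided fields).

    Hypothetical helper: it only parses text and relies on the plan layout
    shown in this issue.
    """
    match = _SCAN_RE.search(plan)
    if match is None:
        raise ValueError("no FileScan node found in plan text")
    body = match.group(1)
    elided = 0
    more = _MORE_RE.search(body)
    if more is not None:
        elided = int(more.group(1))
        body = body[: more.start()]
    names = []
    for item in body.split(","):
        item = item.strip()
        if item:
            # Drop the expression id suffix: "partnerid#6794" -> "partnerid".
            names.append(item.split("#", 1)[0])
    return names, elided


def scan_is_pruned(plan: str, expected: Iterable[str]) -> bool:
    """True if the scan lists exactly the expected columns and none were elided."""
    names, elided = file_scan_columns(plan)
    return elided == 0 and set(names) == set(expected)


if __name__ == "__main__":
    # Shortened version of the plan reported in this issue.
    reported = (
        "Project [partnerid#6794, monotonically_increasing_id() AS index#24939L]\n"
        "+- FileScan parquet [impression_id#6550,arbitrage_id#6551,... 566 more fields]"
    )
    print(file_scan_columns(reported))  # (['impression_id', 'arbitrage_id'], 566)
    print(scan_is_pruned(reported, ["partnerid"]))  # False
```

To get the plan text in PySpark, one option is to wrap {{df.explain(True)}} in {{contextlib.redirect_stdout}}, assuming {{explain()}} prints through Python's {{sys.stdout}}. With {{explain(True)}} the logical plans are printed first, but they typically show the source as {{Relation[...] parquet}} rather than {{FileScan}}, so the first match should be the physical scan.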



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
