[ https://issues.apache.org/jira/browse/SPARK-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902704#comment-16902704 ]
Shivu Sondur commented on SPARK-28613:
--------------------------------------

I will check

> Spark SQL action collect judges only the size of the compressed RDD, which is not accurate enough
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-28613
>                 URL: https://issues.apache.org/jira/browse/SPARK-28613
>             Project: Spark
>          Issue Type: Wish
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> When we run the action DataFrame.collect(), the check against the configuration *spark.driver.maxResultSize* uses the size of the compressed byte array to decide whether the returned data exceeds the limit, which is not accurate.
> This matters when using the Spark Thrift Server without incremental collection: it collects all of the DataFrame's data from every partition at once.
> Returning the data goes through the following process (see the sketch after this list):
> # compress the data's byte array
> # package it as a ResultSet
> # return it to the driver, where it is judged against *spark.driver.maxResultSize*
> # *decode (uncompress) the data as Array[Row]*
> The compressed size can differ significantly from the uncompressed size; the difference can be more than tenfold.
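A minimal, Spark-free sketch of the reported gap. GZIP stands in for whichever block codec Spark is configured with, and the repetitive payload is made up for illustration; the exact ratio depends on the data, but a repetitive result can easily compress more than tenfold:

{code:scala}
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

object CompressedSizeDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical payload: repetitive serialized row data compresses very
    // well, so the compressed size badly understates the decoded size.
    val uncompressed: Array[Byte] =
      ("some_column_value," * 1000000).getBytes("UTF-8")

    // Compress, as a task would before shipping its result to the driver.
    val baos = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(baos)
    gzip.write(uncompressed)
    gzip.close()
    val compressed = baos.toByteArray

    println(s"uncompressed: ${uncompressed.length} bytes")
    println(s"compressed:   ${compressed.length} bytes")
    // A limit applied to compressed.length (step 3 above) can pass even
    // though the Array[Row] decoded on the driver (step 4) is an order of
    // magnitude larger.
  }
}
{code}

In other words, checking spark.driver.maxResultSize against the compressed bytes in step 3 says little about the driver memory actually needed for step 4.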