[ 
https://issues.apache.org/jira/browse/SPARK-25237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

du updated SPARK-25237:
-----------------------
    Description: 
In FileScanRDD, inputMetrics.bytesRead is updated via updateBytesRead every 
1000 rows, and again when the iterator is closed.

However, when the iterator is closed we also invoke updateBytesReadWithFileSize, 
which increases inputMetrics.bytesRead by the whole file's length.

As a result, inputMetrics.bytesRead is wrong for queries with a limit, such as 
select * from table limit 1.

Since Hadoop 2.5 and earlier are no longer supported, we should always take 
bytesRead from the Hadoop FileSystem statistics rather than the file's length.
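The interaction described above can be sketched in plain Python (a minimal illustration only: function and variable names are made up for this sketch and are not Spark's actual API; the byte counts are arbitrary):

```python
def scan_with_limit(total_rows, limit, bytes_per_row, file_size, buggy=True):
    """Simulate FileScanRDD's metric accounting for a limited scan.

    fs_stat_bytes stands in for the Hadoop FileSystem statistics, i.e. the
    bytes actually read. With buggy=True, the close path also adds the whole
    file length on top (the reported behavior); with buggy=False, only the
    statistics value is reported.
    """
    bytes_read = 0      # inputMetrics.bytesRead
    fs_stat_bytes = 0   # bytes actually read per FileSystem statistics
    rows = 0
    for _ in range(min(total_rows, limit)):
        rows += 1
        fs_stat_bytes += bytes_per_row
        if rows % 1000 == 0:
            bytes_read = fs_stat_bytes          # periodic updateBytesRead
    bytes_read = fs_stat_bytes                  # updateBytesRead at close
    if buggy:
        bytes_read += file_size                 # updateBytesReadWithFileSize
    return bytes_read

# "select * from table limit 1": one 100-byte row from a 100 MB file.
print(scan_with_limit(1_000_000, 1, 100, 100_000_000, buggy=True))   # 100000100
print(scan_with_limit(1_000_000, 1, 100, 100_000_000, buggy=False))  # 100
```

With the file-length add in place, a query that reads a single 100-byte row reports roughly the entire file size as bytesRead; dropping that add leaves only the FileSystem-statistics value, which is what the fix proposes.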

 

> FileScanRDD's inputMetrics is wrong when selecting from a datasource table 
> with limit
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-25237
>                 URL: https://issues.apache.org/jira/browse/SPARK-25237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.2, 2.3.1
>            Reporter: du
>            Priority: Major
>
> In FileScanRDD, inputMetrics.bytesRead is updated via updateBytesRead every 
> 1000 rows, and again when the iterator is closed.
> However, when the iterator is closed we also invoke updateBytesReadWithFileSize, 
> which increases inputMetrics.bytesRead by the whole file's length.
> As a result, inputMetrics.bytesRead is wrong for queries with a limit, such as 
> select * from table limit 1.
> Since Hadoop 2.5 and earlier are no longer supported, we should always take 
> bytesRead from the Hadoop FileSystem statistics rather than the file's length.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
