[ https://issues.apache.org/jira/browse/SPARK-28208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28208:
------------------------------------

    Assignee:     (was: Apache Spark)

> When upgrading to ORC 1.5.6, the reader needs to be closed.
> -----------------------------------------------------------
>
>                 Key: SPARK-28208
>                 URL: https://issues.apache.org/jira/browse/SPARK-28208
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Owen O'Malley
>            Priority: Major
>
> As part of the ORC 1.5.6 release, we optimized the common pattern of:
> {code:java}
> Reader reader = OrcFile.createReader(...);
> RecordReader rows = reader.rows(...);
> {code}
> which used to open one file handle in the Reader and a second one in the 
> RecordReader. Users saw this as a regression when moving from the old 
> Spark ORC reader via Hive to the new native reader, because it opened twice 
> as many files on the NameNode.
> In ORC 1.5.6, we changed the ORC library so that it keeps the file handle in 
> the Reader until it is either closed or a RecordReader is created from it. 
> This has cut the number of file open requests on the NameNode in half in 
> typical Spark applications. (Hive's ORC code avoided this problem by putting 
> the file footer into the input splits, but that has other problems.)
> To get the new optimization without leaking file handles, Spark needs to 
> close the readers that aren't used to create RecordReaders.
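> A minimal sketch of the kind of change needed, assuming the standard 
> org.apache.orc API (the class and method names here, other than the ORC 
> API itself, are illustrative): a schema-only probe never creates a 
> RecordReader, so the Reader itself must be closed to release the handle.
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.orc.OrcFile;
> import org.apache.orc.Reader;
> import org.apache.orc.TypeDescription;
>
> class OrcSchemaProbe {
>   // Reads only the footer/schema, so no RecordReader is ever created:
>   // since ORC 1.5.6 the Reader keeps the cached file handle, so it
>   // must be closed here.
>   static TypeDescription readSchema(Path path, Configuration conf)
>       throws IOException {
>     Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
>     try {
>       return reader.getSchema();
>     } finally {
>       reader.close(); // releases the file handle held since createReader()
>     }
>   }
> }
> {code}
> When a RecordReader is created instead, the handle transfers to it, and 
> closing the RecordReader is what releases it.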


