GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15568#discussion_r84421537
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala ---
    @@ -102,6 +95,13 @@ class TableFileCatalog(
       }
     
       override def inputFiles: Array[String] = allPartitions.inputFiles
    +
    +  override def equals(o: Any): Boolean = o match {
    --- End diff --
    
    Under the Hive context, we cache the `LogicalRelation` for every data 
source table (including those converted from Hive tables), which means every 
table always has the same `TableFileCatalog` instance.
    
    However, that is not the case in sql core. We reconstruct the 
`TableFileCatalog` and `LogicalRelation` every time we look up a table. Thus we 
may get a cache miss even if the table is cached, because different 
`TableFileCatalog` instances are never equal to each other.
    
    Although it's not a real problem now, I think it's reasonable to follow 
`ListingFileCatalog` and add `equals` and `hashCode`.
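    
    To illustrate the cache-miss concern, here is a minimal sketch. The classes 
`CatalogByIdentity` and `CatalogByValue`, the example paths, and the plain `Map` 
standing in for the cache are all hypothetical; none of this is Spark's actual 
code, and Spark's real cache lookup works differently. It only shows why two 
separately constructed instances fail to match unless `equals` and `hashCode` 
are defined.

```scala
// Hypothetical stand-ins, not Spark's actual classes.

// No equals/hashCode override: instances compare by reference identity.
class CatalogByIdentity(val paths: Seq[String])

// Value-based equality: two catalogs over the same set of paths are equal.
class CatalogByValue(val paths: Seq[String]) {
  override def equals(o: Any): Boolean = o match {
    case other: CatalogByValue => paths.toSet == other.paths.toSet
    case _ => false
  }

  // Must agree with equals: equal objects produce equal hash codes.
  override def hashCode(): Int = paths.toSet.hashCode()
}

object CacheLookupSketch {
  def main(args: Array[String]): Unit = {
    val paths = Seq("/warehouse/t/part=1", "/warehouse/t/part=2")

    // A plain Map stands in for a cache keyed by the catalog (an assumption).
    val identityCache = Map(new CatalogByIdentity(paths) -> "cached plan")
    val valueCache = Map(new CatalogByValue(paths) -> "cached plan")

    // Looking up with a freshly constructed instance, as happens when the
    // catalog is rebuilt on every table lookup.
    println(identityCache.get(new CatalogByIdentity(paths))) // None
    println(valueCache.get(new CatalogByValue(paths)))       // Some(cached plan)
  }
}
```

    Comparing the paths as a set makes the equality independent of ordering; 
whether that is the right notion of equality for `TableFileCatalog` is a design 
choice for the PR, not something this sketch settles.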



