Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22721#discussion_r231790964

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -183,13 +183,14 @@ case class InsertIntoHadoopFsRelationCommand(
               refreshUpdatedPartitions(updatedPartitionPaths)
             }

    -      // refresh cached files in FileIndex
    -      fileIndex.foreach(_.refresh())
    -      // refresh data cache if table is cached
    -      sparkSession.catalog.refreshByPath(outputPath.toString)
    -
           if (catalogTable.nonEmpty) {
    +        sparkSession.sessionState.catalog.refreshTable(catalogTable.get.identifier)
    --- End diff --

    This is why I asked why some flows initialize the stats and others do not: when the stats are not initialized they will be None, and refreshTable will never be called. In my PR I described the flow where I saw this: the insert flow does not initialize the stats, so the refreshTable() flow is never executed. However, if you execute a select statement before the insert command, the stats are initialized and the relation is cached. If you then execute the insert query, refreshTable() is called, because this time the stats are non-empty.