[ https://issues.apache.org/jira/browse/SPARK-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748253#comment-15748253 ]
Apache Spark commented on SPARK-18856:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16280

> Newly created catalog table assumed to have 0 rows and 0 bytes
> --------------------------------------------------------------
>
>                 Key: SPARK-18856
>                 URL: https://issues.apache.org/jira/browse/SPARK-18856
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Blocker
>
> {code}
> scala> spark.range(100).selectExpr("id % 10 p", "id").write.partitionBy("p").format("json").saveAsTable("testjson")
> scala> spark.table("testjson").queryExecution.optimizedPlan.statistics
> res6: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(sizeInBytes=0, isBroadcastable=false)
> {code}
> It shouldn't be 0. The issue is that in DataSource.scala, we do:
> {code}
>         val fileCatalog = if (sparkSession.sqlContext.conf.manageFilesourcePartitions &&
>             catalogTable.isDefined && catalogTable.get.tracksPartitionsInCatalog) {
>           new CatalogFileIndex(
>             sparkSession,
>             catalogTable.get,
>             catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(0L))
>         } else {
>           new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
>         }
> {code}
> We shouldn't use 0L as the fallback.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
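
For illustration only, the fallback problem described in the issue can be reproduced without Spark. The sketch below is not the code from the linked pull request: `TableStats`, `SizeFallback`, and `UnknownSizeInBytes` are hypothetical names, and using `Long.MaxValue` as the "size unknown" placeholder is an assumption made for this example. It contrasts the `getOrElse(0L)` pattern quoted above with a conservative fallback that does not make a table without statistics look empty.

```scala
// Hypothetical stand-in for catalog statistics; this is not a Spark class.
final case class TableStats(sizeInBytes: BigInt)

object SizeFallback {
  // Assumed placeholder for "size unknown": large enough that a planner
  // would not mistake the table for a tiny (or empty) one.
  val UnknownSizeInBytes: Long = Long.MaxValue

  // Mirrors the pattern quoted in the issue: missing stats become 0 bytes.
  def sizeWithZeroFallback(stats: Option[TableStats]): Long =
    stats.map(_.sizeInBytes.toLong).getOrElse(0L)

  // Conservative alternative: missing stats are treated as "very large".
  def sizeWithConservativeFallback(stats: Option[TableStats]): Long =
    stats.map(_.sizeInBytes.toLong).getOrElse(UnknownSizeInBytes)
}
```

A newly created table has no statistics yet, so it corresponds to `None` here: the first function reports 0 bytes, which matches the `sizeInBytes=0` output in the report, while the second reports the placeholder. When statistics are present, both functions return the recorded size.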