Reynold Xin created SPARK-18856:
-----------------------------------

             Summary: Newly created catalog table assumed to have 0 rows and 0 bytes
                 Key: SPARK-18856
                 URL: https://issues.apache.org/jira/browse/SPARK-18856
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Reynold Xin
            Priority: Blocker

{code}
scala> spark.range(100).selectExpr("id % 10 p", "id").write.partitionBy("p").format("json").saveAsTable("testjson")

scala> spark.table("testjson").queryExecution.optimizedPlan.statistics
res6: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(sizeInBytes=0, isBroadcastable=false)
{code}

It shouldn't be 0. The issue is that in DataSource.scala, we do:

{code}
    val fileCatalog = if (sparkSession.sqlContext.conf.manageFilesourcePartitions &&
        catalogTable.isDefined && catalogTable.get.tracksPartitionsInCatalog) {
      new CatalogFileIndex(
        sparkSession,
        catalogTable.get,
        catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(0L))
    } else {
      new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
    }
{code}

We shouldn't use 0L as the fallback.
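
To illustrate the fallback problem in isolation, here is a minimal sketch that does not depend on Spark. `TableStats`, `SizeFallback`, and `sizeOrDefault` are hypothetical names invented for this example, and the choice of `Long.MaxValue` as the default is an assumption modeled on the idea behind Spark's `spark.sql.defaultSizeInBytes` setting; the issue itself does not say which value the fix should use.

```scala
// Hypothetical stand-in for catalog table statistics; not Spark's actual class.
final case class TableStats(sizeInBytes: BigInt)

object SizeFallback {
  // Assumed conservative default: treat an unknown size as very large, so an
  // unanalyzed table is never mistaken for an empty one.
  val DefaultSizeInBytes: Long = Long.MaxValue

  // Use the recorded size when statistics exist; otherwise fall back to the
  // supplied default instead of 0L.
  def sizeOrDefault(
      stats: Option[TableStats],
      default: Long = DefaultSizeInBytes): Long =
    stats.map(_.sizeInBytes.toLong).getOrElse(default)
}
```

With a fallback of 0L, a table that simply has no statistics yet looks empty to the optimizer, which could plausibly lead it to pick plans (for example, broadcasting that side of a join) that only make sense for tiny inputs. A large default errs in the other direction and is usually the safer mistake.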