[ https://issues.apache.org/jira/browse/SPARK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-10287:
-----------------------------
    Labels: releasenotes  (was: )

> After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10287
>                 URL: https://issues.apache.org/jira/browse/SPARK-10287
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Critical
>              Labels: releasenotes
>             Fix For: 1.5.1
>
>
> I have a partitioned json table with 1824 partitions.
> {code}
> val df = sqlContext.read.format("json").load("aPartitionedJsonData")
> val columnStr = df.schema.map(_.name).mkString(",")
> println(s"columns: $columnStr")
> val hash = df
>   .selectExpr(s"hash($columnStr) as hashValue")
>   .groupBy()
>   .sum("hashValue")
>   .head()
>   .getLong(0)
> {code}
> Looks like for JSON, we refresh metadata when we call buildScan. For a
> partitioned table, we call buildScan for every partition. So, looks like we
> will refresh this table 1824 times.
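
The cost pattern described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyJsonRelation, buildScanRefreshing, buildScanOnly and RefreshDemo are hypothetical names invented for illustration, not Spark classes or methods, and the report does not say whether the actual fix took this form. The sketch only counts how often a stand-in refresh() runs when it is called inside every per-partition scan versus once before scanning.

```scala
// Toy model only: none of these names are Spark APIs.
final class ToyJsonRelation(val partitions: Seq[String]) {
  var refreshCount = 0

  // Stands in for an expensive metadata refresh (for example, re-listing files).
  def refresh(): Unit = refreshCount += 1

  // Pattern described in the report: metadata is refreshed inside every scan call.
  def buildScanRefreshing(partition: String): String = {
    refresh()
    s"scan($partition)"
  }

  // Alternative: scan a partition without touching metadata.
  def buildScanOnly(partition: String): String = s"scan($partition)"
}

object RefreshDemo {
  private def toyRelation(n: Int): ToyJsonRelation =
    new ToyJsonRelation((0 until n).map(i => s"part=$i"))

  // One refresh per partition scanned.
  def refreshPerPartition(n: Int): Int = {
    val rel = toyRelation(n)
    rel.partitions.foreach(p => rel.buildScanRefreshing(p))
    rel.refreshCount
  }

  // A single refresh hoisted out of the per-partition loop.
  def refreshOnce(n: Int): Int = {
    val rel = toyRelation(n)
    rel.refresh()
    rel.partitions.foreach(p => rel.buildScanOnly(p))
    rel.refreshCount
  }

  def main(args: Array[String]): Unit = {
    println(s"refresh inside each scan: ${refreshPerPartition(1824)}")
    println(s"refresh hoisted:          ${refreshOnce(1824)}")
  }
}
```

With 1824 partitions, the first variant performs one refresh per partition, which matches the count given in the report, while the hoisted variant performs a single refresh regardless of the number of partitions.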