Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15360#discussion_r82297698
  
    --- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -405,6 +405,78 @@ class StatisticsSuite extends QueryTest with 
TestHiveSingleton with SQLTestUtils
         }
       }
     
    +  test("check column statistics for case sensitive columns") {
    +    val tableName = "tbl"
    +    // scalastyle:off
    +    // non ascii characters are not allowed in the source code, so we 
disable the scalastyle.
    +    val columnGroups: Seq[(String, String)] = Seq(("c1", "C1"), ("列c", 
"列C"))
    +    // scalastyle:on
    +    columnGroups.foreach { case (column1, column2) =>
    +      withTable(tableName) {
    +        withSQLConf("spark.sql.caseSensitive" -> "true") {
    +          sql(s"CREATE TABLE $tableName (`$column1` int, `$column2` 
double) USING PARQUET")
    +          sql(s"INSERT INTO $tableName SELECT 1, 3.0")
    +          sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS 
`$column1`, `$column2`")
    +          val readback = spark.table(tableName)
    +          val relations = readback.queryExecution.analyzed.collect { case 
rel: LogicalRelation =>
    +            val columnStats = rel.catalogTable.get.stats.get.colStats
    +            assert(columnStats.size == 2)
    +            StatisticsTest.checkColStat(
    +              dataType = IntegerType,
    +              colStat = columnStats(column1),
    +              expectedColStat = ColumnStat(InternalRow(0L, 1, 1, 1L)),
    +              rsd = spark.sessionState.conf.ndvMaxError)
    +            StatisticsTest.checkColStat(
    +              dataType = DoubleType,
    +              colStat = columnStats(column2),
    +              expectedColStat = ColumnStat(InternalRow(0L, 3.0d, 3.0d, 
1L)),
    +              rsd = spark.sessionState.conf.ndvMaxError)
    +            rel
    +          }
    +          assert(relations.size == 1)
    +        }
    +      }
    +    }
    +  }
    +
    +  test("test refreshing statistics of cached data source table") {
    +    val tableName = "tbl"
    +    withTable(tableName) {
    +      val tableIndent = TableIdentifier(tableName, Some("default"))
    +      val catalog = 
spark.sessionState.catalog.asInstanceOf[HiveSessionCatalog]
    +      sql(s"CREATE TABLE $tableName (key int) USING PARQUET")
    +      sql(s"INSERT INTO $tableName SELECT 1")
    +      sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
    +      sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS key")
    +      // Table lookup will make the table cached.
    +      catalog.lookupRelation(tableIndent)
    +
    +      val cachedTable1 = catalog.getCachedDataSourceTable(tableIndent)
    +      assert(cachedTable1.statistics.sizeInBytes > 0)
    +      assert(cachedTable1.statistics.rowCount.contains(1))
    +      StatisticsTest.checkColStat(
    +        dataType = IntegerType,
    +        colStat = cachedTable1.statistics.colStats("key"),
    +        expectedColStat = ColumnStat(InternalRow(0L, 1, 1, 1L)),
    +        rsd = spark.sessionState.conf.ndvMaxError)
    +
    +      sql(s"INSERT INTO $tableName SELECT 2")
    +      sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
    +      sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS key")
    --- End diff --
    
    Both of the above DDL statements will call `refreshTable` with the same 
table name, right? So if any one of the `refreshTable` calls were removed from 
the source code, this test case would still pass, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Reply via email to