[GitHub] spark pull request #22759: [MINOR][SQL][DOC] Correct parquet nullability doc...

srowen Thu, 06 Dec 2018 06:36:56 -0800

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22759#discussion_r239475203
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -542,6 +551,35 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
         }
       }
     
    +  test("parquet - column nullability -- write only") {
    +    val schema = StructType(
    +      StructField("cl1", IntegerType, nullable = false) ::
    +        StructField("cl2", IntegerType, nullable = true) :: Nil)
    +    val row = Row(3, 4)
    +    val df = spark.createDataFrame(sparkContext.parallelize(row :: Nil), schema)
    +
    +    withTempPath { dir =>
    +      val path = dir.getAbsolutePath
    +      df.write.mode("overwrite").parquet(path)
    +      val file = SpecificParquetRecordReaderBase.listDirectory(dir).get(0)
    +
    +      val hadoopInputFile = HadoopInputFile.fromPath(new Path(file), new Configuration())
    +      val f = ParquetFileReader.open(hadoopInputFile)
    +      val parquetSchema = f.getFileMetaData.getSchema.getColumns.asScala
    +                          .map(_.getPrimitiveType)
    +      f.close
    +
    +      // the write keeps nullable info from the schema
    +      val expectedParquetSchema: Seq[PrimitiveType] = Seq(
    +        new PrimitiveType(Repetition.REQUIRED, PrimitiveTypeName.INT32, "cl1"),
    +        new PrimitiveType(Repetition.OPTIONAL, PrimitiveTypeName.INT32, "cl2")
    +      )
    +
    +      assert (expectedParquetSchema == parquetSchema)
    --- End diff --
    
    Nit: ideally you'd use the `===` test operator here, so that failures generate a better message.
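A minimal sketch of the suggested assertion style, assuming ScalaTest is on the classpath. Inside a Spark suite such as `DataFrameReaderWriterSuite`, `assert` and `===` are already in scope through the test base class; here they are imported from `org.scalatest.Assertions` so the snippet runs standalone. Plain strings stand in for the Parquet `PrimitiveType` values to avoid the parquet-mr dependency, and the object and method names are hypothetical.

```scala
import org.scalatest.Assertions._
import org.scalatest.exceptions.TestFailedException

// Hypothetical standalone object; in the real suite the final line of the
// test would simply read: assert(expectedParquetSchema === parquetSchema)
object TripleEqualsSketch {

  // Returns the message ScalaTest produces when `expected === actual` fails.
  def failureMessage(expected: Seq[String], actual: Seq[String]): String = {
    val e = intercept[TestFailedException] {
      assert(expected === actual)
    }
    e.getMessage
  }

  def main(args: Array[String]): Unit = {
    // Stand-ins for the expected and actual Parquet column types.
    val expected = Seq("required int32 cl1", "optional int32 cl2")
    val actual = Seq("required int32 cl1", "optional int32 cl2")

    // Passing comparison, written with === as the review suggests.
    assert(expected === actual)

    // A mismatch (cl1 written as optional) fails with a message that
    // includes both sequences, phrased as "... did not equal ...".
    println(failureMessage(expected, Seq("optional int32 cl1", "optional int32 cl2")))
  }
}
```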


