GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21056#discussion_r183227071

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2128,38 +2128,77 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     }
   }
 
-  test("SPARK-23849: schema inferring touches less data if samplingRation < 1.0") {
-    val predefinedSample = Set[Int](2, 8, 15, 27, 30, 34, 35, 37, 44, 46,
+  val sampledTestData = (value: java.lang.Long) => {
--- End diff --

@MaxGekk, can we move this data into `TestJsonData`, for example:

```scala
def sampledTestData: Dataset[String] = spark.createDataset(spark.sparkContext.parallelize(
  ...
))(Encoders.STRING)
```

and then use it in `JsonSuite`, for example as `sampledTestData.coalesce(1)`?