Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22275#discussion_r219404072
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
             self.assertPandasEqual(pdf, df_from_python.toPandas())
             self.assertPandasEqual(pdf, df_from_pandas.toPandas())
     
    +    def test_toPandas_batch_order(self):
    +        df = self.spark.range(64, numPartitions=8).toDF("a")
    +        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
    +            pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
    +            self.assertPandasEqual(pdf, pdf_arrow)
    --- End diff ---
    
    hm, is this test case "enough" to trigger any possible problem just by 
chance? would increasing the number of batches or the number of records per 
batch raise the odds of surfacing a streaming-order or concurrency issue?
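    
    For illustration, a minimal sketch of the kind of stress variant being 
suggested, reusing the `_toPandas_arrow_toggle` and `sql_conf` helpers from 
the diff above (the partition count, row count, and batch size here are 
illustrative, not taken from the PR):
    
        def test_toPandas_batch_order_stress(self):
            # More partitions and a smaller Arrow batch size mean more
            # batches arriving out of order, which should raise the odds
            # of exposing a batch-reordering or concurrency bug.
            df = self.spark.range(4096, numPartitions=32).toDF("a")
            with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 2}):
                pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
                self.assertPandasEqual(pdf, pdf_arrow)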


---
