Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22275#discussion_r219404072

    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
                 self.assertPandasEqual(pdf, df_from_python.toPandas())
                 self.assertPandasEqual(pdf, df_from_pandas.toPandas())

    +    def test_toPandas_batch_order(self):
    +        df = self.spark.range(64, numPartitions=8).toDF("a")
    +        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
    +            pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
    +            self.assertPandasEqual(pdf, pdf_arrow)
    --- End diff --

    hm, is this test case "enough" to reliably trigger a possible problem, given that any failure would be random? Would increasing the number of batches, or the number of records per batch, increase the chance of hitting a streaming-order or concurrency issue?
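The ordering concern behind this question can be sketched without Spark. The model below is a hypothetical illustration (not Spark's actual collect path): each partition produces one index-tagged batch, batches arrive in arbitrary order, and the driver reassembles them by index. With only a few batches, a random arrival order frequently happens to already be sorted, so a single test run can pass even if reassembly were broken; more batches make such an accidental pass far less likely (probability 1/n! for n batches).

```python
import random

def stream_batches_out_of_order(num_batches, records_per_batch, seed=None):
    # Model of a concurrent collect: each partition emits one batch
    # tagged with its partition index; arrival order is arbitrary.
    batches = [
        (i, list(range(i * records_per_batch, (i + 1) * records_per_batch)))
        for i in range(num_batches)
    ]
    random.Random(seed).shuffle(batches)
    return batches

def reassemble(batches):
    # Driver-side fix-up: sort received batches by their original index
    # before concatenating, so the result matches partition order.
    ordered = sorted(batches, key=lambda b: b[0])
    return [row for _, rows in ordered for row in rows]

# 8 batches of 4 records mirrors range(64, numPartitions=8) with
# maxRecordsPerBatch=4 in spirit; reassembly restores 0..31 here.
result = reassemble(stream_batches_out_of_order(8, 4, seed=0))
```

Under this model, bumping `num_batches` directly shrinks the odds that an unordered arrival masquerades as a pass, which is the intuition behind the reviewer's suggestion.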