pan3793 commented on code in PR #56092:
URL: https://github.com/apache/spark/pull/56092#discussion_r3299615400


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##########
@@ -789,6 +802,27 @@ private[v2] trait V2JDBCTest
         checkSamplePushed(df8, false)
         checkFilterPushed(df8)
         assert(df8.collect().length < 10)
+
+        // SYSTEM sampling pushdown
+        if (supportsTableSampleSystem) {
+          val df9 = sql(s"SELECT * FROM $catalogName.new_table $tableOptions " +
+            "TABLESAMPLE SYSTEM (50 PERCENT)")
+          checkSamplePushed(df9)
+          if (partitioningEnabled) {
+            multiplePartitionAdditionalCheck(df1, partitionInfo)
+          }
+          assert(df6.collect().length <= 10)

Review Comment:
   Fixed the copy-paste issue (wrong variable reference).
   
   > With PG TABLESAMPLE SYSTEM on a 10-row, single-block table at 50%, the result will commonly be all 10 rows or 0 rows, so consider asserting something stronger than `<= 10`
   
   That's true, but I think it is a PG implementation detail, not part of the contract: there is no guarantee that a few rows will be stored in a single physical block, so I am keeping the `<= 10` assertion.
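
   To illustrate the point about block-level sampling, here is a minimal sketch of a simplified model. It is not PostgreSQL's actual implementation, and `SystemSamplingSketch`, `systemSample`, and `bernoulliSample` are hypothetical names used only for this example. It assumes that SYSTEM sampling keeps or drops each physical block as a whole, while row-level sampling decides per row.

```scala
import scala.util.Random

// Simplified model of block-level vs. row-level sampling.
// All names here are hypothetical and exist only for illustration.
object SystemSamplingSketch {
  // Block-level sampling: each block is kept or dropped as a whole,
  // so every row in a selected block is returned.
  def systemSample[A](blocks: Seq[Seq[A]], fraction: Double, rng: Random): Seq[A] =
    blocks.filter(_ => rng.nextDouble() < fraction).flatten

  // Row-level sampling, shown for comparison: each row is decided independently.
  def bernoulliSample[A](rows: Seq[A], fraction: Double, rng: Random): Seq[A] =
    rows.filter(_ => rng.nextDouble() < fraction)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val rows: Seq[Int] = (1 to 10).toVector

    // Assumption: all 10 rows happen to live in one physical block.
    val singleBlock: Seq[Seq[Int]] = Seq(rows)
    val systemSizes = (1 to 1000).map(_ => systemSample(singleBlock, 0.5, rng).size).toSet
    println(s"Distinct result sizes, block-level, one block: $systemSizes")

    // The same rows spread over 5 blocks of 2 rows give more varied sizes.
    val fiveBlocks: Seq[Seq[Int]] = rows.grouped(2).toVector
    val spreadSizes = (1 to 1000).map(_ => systemSample(fiveBlocks, 0.5, rng).size).toSet
    println(s"Distinct result sizes, block-level, five blocks: $spreadSizes")

    val rowSizes = (1 to 1000).map(_ => bernoulliSample(rows, 0.5, rng).size).toSet
    println(s"Distinct result sizes, row-level: $rowSizes")
  }
}
```

   Under this model the only bound that holds for every block layout is that the sample never exceeds the table size, which is what the `<= 10` assertion checks.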


