[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132460230


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala:
##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {
+        if (offset > 0) {
+          s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +

Review Comment:
   Alternatively, we could use the row-limiting syntax introduced in Oracle 12c, which should cover the versions in wide use today.
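
For reference, the Oracle 12c syntax in question is the row-limiting clause `OFFSET n ROWS FETCH NEXT m ROWS ONLY`. The following is only a rough sketch of how a query builder might emit it. The object and method names (`OracleRowLimitingSketch`, `rowLimitingClause`, `paginate`) are hypothetical and not part of Spark's `JdbcDialect` API, and the sketch assumes that non-positive values mean "not set".

```scala
// Hypothetical sketch: not Spark's actual OracleDialect code.
object OracleRowLimitingSketch {
  // Builds the Oracle 12c+ row-limiting clause.
  // Assumption: values <= 0 mean "limit/offset not set".
  def rowLimitingClause(limit: Int, offset: Int): String = {
    val offsetPart = if (offset > 0) s"OFFSET $offset ROWS" else ""
    val fetchPart = if (limit > 0) s"FETCH NEXT $limit ROWS ONLY" else ""
    Seq(offsetPart, fetchPart).filter(_.nonEmpty).mkString(" ")
  }

  // Appends the clause to an already built SELECT statement.
  def paginate(selectStmt: String, limit: Int, offset: Int): String =
    Seq(selectStmt, rowLimitingClause(limit, offset)).filter(_.nonEmpty).mkString(" ")

  def main(args: Array[String]): Unit = {
    // Hypothetical table and column names.
    println(paginate("SELECT id, name FROM employees ORDER BY id", limit = 10, offset = 20))
  }
}
```

With this form no `rownum` alias or extra subquery is needed, at the cost of dropping support for Oracle versions older than 12c.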



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132457170


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala:
##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {
+        if (offset > 0) {
+          s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +

Review Comment:
   
   https://stackoverflow.com/questions/31186166/oracle-sql-filtering-by-rownum-not-returning-results-when-it-should
   
   Let's mention the reason as well: `rownum` is assigned as each row is returned, so a `rownum > n` filter at the same query level never matches any row.
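
The behavior described above can be illustrated with a small simulation. This is a simplified model of how Oracle assigns `rownum`, written for illustration only; `RownumSketch`, `filterByRownum`, and `withRn` are hypothetical names and nothing here talks to a real database.

```scala
// Simplified model of Oracle's ROWNUM, for illustration only.
object RownumSketch {
  // ROWNUM starts at 1 and advances only when a row passes the WHERE
  // predicate, so the predicate always sees the number the row *would* get.
  def filterByRownum[A](rows: Seq[A])(predicate: Int => Boolean): Seq[A] = {
    var rownum = 1
    val out = Seq.newBuilder[A]
    rows.foreach { row =>
      if (predicate(rownum)) {
        out += row
        rownum += 1
      }
    }
    out.result()
  }

  // Models the inner query `SELECT tab.*, rownum rn FROM (...) tab`:
  // the row number becomes an ordinary column that an outer query can filter.
  def withRn[A](rows: Seq[A]): Seq[(A, Int)] =
    rows.zipWithIndex.map { case (row, i) => (row, i + 1) }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "c", "d")
    // `WHERE rownum > 2` at the same level: the first candidate always gets 1.
    println(filterByRownum(rows)(_ > 2))
    // `WHERE rownum <= 2` works because the counter can advance.
    println(filterByRownum(rows)(_ <= 2))
    // Outer filter on the materialized alias: `WHERE rn > 2`.
    println(withRn(rows).filter(_._2 > 2).map(_._1))
  }
}
```

This is why the diff materializes `rownum` as `rn` in a subquery and filters on `rn` in the outer query.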






[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132448651


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala:
##
@@ -291,4 +291,26 @@ private case object MySQLDialect extends JdbcDialect with SQLConfHelper {
       throw QueryExecutionErrors.unsupportedDropNamespaceRestrictError()
     }
   }
+
+  class MySQLSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
+    extends JdbcSQLQueryBuilder(dialect, options) {
+
+    override def build(): String = {
+      if (limit < 1 && offset > 0) {

Review Comment:
   Is it possible to have `limit = 0`? It seems safer to use `limit < 0` to indicate no limit, as the default value is -1.
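
To make the difference concrete, here is a minimal sketch comparing the two branch conditions. It assumes, as the comment above states, that -1 is the "no limit" default; whether a limit of 0 can actually reach the builder is the open question. The object and method names are hypothetical.

```scala
// Hypothetical sketch comparing the two "offset only" checks.
object LimitSentinelSketch {
  // Condition as written in the diff: 0 is treated like "no limit".
  def offsetOnlyLoose(limit: Int, offset: Int): Boolean = limit < 1 && offset > 0

  // Condition suggested in the review: only negative values mean "no limit".
  def offsetOnlyStrict(limit: Int, offset: Int): Boolean = limit < 0 && offset > 0

  def main(args: Array[String]): Unit = {
    // Both agree for the default sentinel of -1.
    println(offsetOnlyLoose(-1, 5) == offsetOnlyStrict(-1, 5))
    // They disagree for limit = 0: the loose check takes the offset-only path,
    // which returns rows even though LIMIT 0 asks for none.
    println(offsetOnlyLoose(0, 5) != offsetOnlyStrict(0, 5))
  }
}
```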






[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132007081


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala:
##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {
+        if (offset > 0) {
+          s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +

Review Comment:
   How about:
   ```
   SELECT * FROM ($selectStmt) tab WHERE rownum > ...
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132005986


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala:
##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.

Review Comment:
   nvm, I see the implementation.






[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132005699


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala:
##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.

Review Comment:
   Does every table in Oracle have an `rn` column?






[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132005206


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala:
##
@@ -291,4 +291,22 @@ private case object MySQLDialect extends JdbcDialect with SQLConfHelper {
       throw QueryExecutionErrors.unsupportedDropNamespaceRestrictError()
     }
   }
+
+  class MySQLSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
+    extends JdbcSQLQueryBuilder(dialect, options) {
+
+    override def build(): String = {
+      if (limit < 1 && offset > 0) {
+        val offsetClause = dialect.getOffsetClause(offset)
+        options.prepareQuery +
+          s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
+          s" $whereClause $groupByClause $orderByClause LIMIT 18446744073709551610 $offsetClause"

Review Comment:
   What does this `LIMIT 18446744073709551610` mean?
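
For context: MySQL does not accept `OFFSET` without `LIMIT`, and the MySQL manual suggests passing a very large number as the row count to read everything after an offset, using 18446744073709551615 (the largest unsigned 64-bit value) in its example. The constant in the diff is 5 less than that value. The sketch below only checks that arithmetic and shows the resulting query shape; the object and method names are hypothetical, and the `OFFSET n` form is an assumption about what `getOffsetClause` produces.

```scala
// Hypothetical sketch: arithmetic behind the "huge LIMIT" workaround.
object MySqlOffsetOnlySketch {
  // Largest unsigned 64-bit value, as used in the MySQL manual's example.
  val MaxUnsignedBigint: BigInt = BigInt(2).pow(64) - 1

  // Value hard-coded in the diff above.
  val LimitInDiff: BigInt = BigInt("18446744073709551610")

  // Assumption: the dialect's offset clause renders as `OFFSET n`.
  def offsetOnlyQuery(selectStmt: String, offset: Int): String =
    s"$selectStmt LIMIT $LimitInDiff OFFSET $offset"

  def main(args: Array[String]): Unit = {
    println(MaxUnsignedBigint)
    println(MaxUnsignedBigint - LimitInDiff)
    println(offsetOnlyQuery("SELECT * FROM t", 3))
  }
}
```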






[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub


cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132004161


##
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##
@@ -410,6 +410,15 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
     assert(sorts.isEmpty)
   }
 
+  private def checkOffsetPushed(df: DataFrame, offset: Option[Int]): Unit = {

Review Comment:
   Can we rename `limitPushed` to `checkLimitPushed` and follow the implementation here?


