[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132460230

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {
+        if (offset > 0) {
+          s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +

Review Comment:
or we can use the new syntax in Oracle 12+, which should be the most widely used versions today.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
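Regarding the Oracle 12+ suggestion: Oracle 12c introduced the row-limiting clause `OFFSET n ROWS FETCH NEXT m ROWS ONLY`. It goes after `ORDER BY` and needs no `rownum` subquery. Below is a minimal Scala sketch of how a dialect might assemble that clause. `Oracle12cPagingSketch` and its methods are hypothetical and are not Spark's API. The sketch assumes that a negative `limit` means "no limit" and that a non-positive `offset` means "no offset".

```scala
// Hypothetical sketch, not Spark's OracleDialect: assembles the Oracle 12c+
// row-limiting clause. Assumes limit < 0 means "no limit" and offset <= 0
// means "no offset".
object Oracle12cPagingSketch {
  def rowLimitingClause(limit: Int, offset: Int): String = {
    val offsetPart = if (offset > 0) s"OFFSET $offset ROWS" else ""
    val fetchPart = if (limit >= 0) s"FETCH NEXT $limit ROWS ONLY" else ""
    // Keep only the parts that apply; OFFSET must come before FETCH.
    Seq(offsetPart, fetchPart).filter(_.nonEmpty).mkString(" ")
  }

  // The clause must follow ORDER BY, so it is appended to the end of a
  // statement that already contains WHERE / GROUP BY / ORDER BY.
  def build(selectStmt: String, limit: Int, offset: Int): String = {
    val clause = rowLimitingClause(limit, offset)
    if (clause.isEmpty) selectStmt else s"$selectStmt $clause"
  }
}
```

For example, `build("SELECT id FROM t ORDER BY id", 10, 5)` yields `SELECT id FROM t ORDER BY id OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY`. The trade-off is that this syntax requires Oracle 12c or later, whereas the `rownum` approach also works on older versions.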
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132457170

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {
+        if (offset > 0) {
+          s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +

Review Comment:
https://stackoverflow.com/questions/31186166/oracle-sql-filtering-by-rownum-not-returning-results-when-it-should

Let's mention the reason as well: the `rownum` is calculated when the value is returned.
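The Oracle documentation describes the reason mentioned in this comment. A ROWNUM value is assigned to a row only after the row passes the predicate, and the counter advances only when a value is assigned. As a result, a filter such as `WHERE rownum > 1` is always false. The toy Scala model below reproduces that behaviour. It is an illustration only, not Oracle's implementation, and `RownumModel` is a hypothetical name. It also shows why materializing `rownum` as an alias (`rn`) in an inner query makes an offset filter work.

```scala
// Toy model of Oracle's ROWNUM semantics (not Oracle's implementation):
// each fetched row is tentatively given the next ROWNUM value, and the
// counter only advances when the row passes the predicate.
object RownumModel {
  def whereRownum[A](rows: Seq[A])(pred: Int => Boolean): Seq[A] = {
    var candidate = 1
    val kept = Seq.newBuilder[A]
    for (row <- rows) {
      if (pred(candidate)) {
        kept += row
        candidate += 1 // only a returned row consumes a ROWNUM value
      }
    }
    kept.result()
  }

  // Two-level pattern: the inner query materializes ROWNUM as an ordinary
  // column (alias `rn`), so the outer query can filter on it like any value.
  def whereAliasedRn[A](rows: Seq[A])(pred: Int => Boolean): Seq[A] =
    rows.zipWithIndex.collect { case (row, i) if pred(i + 1) => row }
}
```

With `rows = Seq("a", "b", "c", "d")`, `whereRownum(rows)(_ > 2)` returns an empty sequence, because every row is tested as ROWNUM 1. In contrast, `whereAliasedRn(rows)(_ > 2)` returns `Seq("c", "d")`.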
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132448651

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ##
@@ -291,4 +291,26 @@ private case object MySQLDialect extends JdbcDialect with SQLConfHelper {
       throw QueryExecutionErrors.unsupportedDropNamespaceRestrictError()
     }
   }
+
+  class MySQLSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
+    extends JdbcSQLQueryBuilder(dialect, options) {
+
+    override def build(): String = {
+      if (limit < 1 && offset > 0) {

Review Comment:
is it possible to have limit = 0? seems safer to use `limit < 0` to indicate no limit, as the default value is -1
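Here is a concrete version of the concern. Suppose a pushed-down `LIMIT 0` ever reaches the builder. The guard `limit < 1` would then send it into the offset-only branch, which returns every row after the offset instead of no rows. Below is a minimal sketch of the two guards. It assumes, as the comment says, that the default limit is -1. `LimitSentinelSketch` and its methods are hypothetical names.

```scala
// Hypothetical sketch of the two guard conditions discussed above.
// Assumption (from the review comment): the builder's default limit is -1,
// meaning "no LIMIT was pushed down".
object LimitSentinelSketch {
  val NoLimit: Int = -1

  // Guard as written in the PR: LIMIT 0 is also treated as "no limit".
  def offsetOnlyAsWritten(limit: Int, offset: Int): Boolean =
    limit < 1 && offset > 0

  // Guard suggested in review: only the negative sentinel means "no limit",
  // so LIMIT 0 keeps its meaning of "return no rows".
  def offsetOnlySuggested(limit: Int, offset: Int): Boolean =
    limit < 0 && offset > 0
}
```

For `limit = 0, offset = 5`, the two guards disagree: the PR's check takes the offset-only branch, while the suggested check does not.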
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132007081

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {
+        if (offset > 0) {
+          s"SELECT $columnList FROM (SELECT tab.*, rownum rn FROM ($selectStmt) tab)" +

Review Comment:
how about
```
SELECT * FROM ($selectStmt) tab WHERE rownum > ...
```
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132005986

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.

Review Comment:
nvm, I see the implementation.
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132005699

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ##
@@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.

Review Comment:
every table in Oracle has the `rn` column?
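To answer the question: `rn` is not a column of the user's table. It is an alias that an inner subquery gives to `rownum`, and the outer query then filters on that alias. Below is a minimal sketch of a pre-12c paging query built this way. `OracleRownumPagingSketch` is a hypothetical name. The exact query shape is also an assumption, because the hunk above is cut off before the full statement the PR generates.

```scala
// Hypothetical sketch of pre-12c Oracle paging via an aliased ROWNUM.
// `rn` exists only as a column of the inner subquery; the exact query the PR
// generates is not shown in the hunk, so this shape is an assumption.
object OracleRownumPagingSketch {
  def build(columnList: String, selectStmt: String, limit: Int, offset: Int): String = {
    // Inner query: number every row of the (already ordered) statement.
    val inner = s"SELECT tab.*, rownum rn FROM ($selectStmt) tab"
    // Use Long so that offset + limit cannot overflow Int.
    val skip = offset.max(0).toLong
    val conditions = Seq(
      if (offset > 0) Some(s"rn > $offset") else None,
      if (limit >= 0) Some(s"rn <= ${skip + limit}") else None
    ).flatten
    val where =
      if (conditions.isEmpty) "" else conditions.mkString(" WHERE ", " AND ", "")
    // Outer query: filter on the materialized alias, not on rownum itself.
    s"SELECT $columnList FROM ($inner)$where"
  }
}
```

For example, `build("id", "SELECT id FROM t ORDER BY id", 10, 5)` produces `SELECT id FROM (SELECT tab.*, rownum rn FROM (SELECT id FROM t ORDER BY id) tab) WHERE rn > 5 AND rn <= 15`.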
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132005206

## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ##
@@ -291,4 +291,22 @@ private case object MySQLDialect extends JdbcDialect with SQLConfHelper {
       throw QueryExecutionErrors.unsupportedDropNamespaceRestrictError()
     }
   }
+
+  class MySQLSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
+    extends JdbcSQLQueryBuilder(dialect, options) {
+
+    override def build(): String = {
+      if (limit < 1 && offset > 0) {
+        val offsetClause = dialect.getOffsetClause(offset)
+        options.prepareQuery +
+          s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
+          s" $whereClause $groupByClause $orderByClause LIMIT 18446744073709551610 $offsetClause"

Review Comment:
what does this `LIMIT 18446744073709551610` mean?
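Some background on the number in question. MySQL has no standalone `OFFSET` clause; it accepts `OFFSET` only together with `LIMIT`. To fetch "all rows after an offset", the MySQL reference manual suggests a very large row count. Its example uses 18446744073709551615, which is 2^64 - 1, the maximum `BIGINT UNSIGNED` value. The PR's 18446744073709551610 is 5 less than that and presumably serves as another "effectively unbounded" count. Below is a small Scala sketch of the arithmetic and of an offset-only clause. `MySQLOffsetOnlySketch` is a hypothetical name.

```scala
// Hypothetical sketch: an "offset only" clause for MySQL, which accepts
// OFFSET only together with LIMIT.
object MySQLOffsetOnlySketch {
  // 2^64 - 1: the largest BIGINT UNSIGNED value, used in the MySQL manual
  // as a row count meaning "all remaining rows".
  val MaxUnsignedBigInt: BigInt = BigInt(2).pow(64) - 1

  def offsetOnlyClause(offset: Int): String =
    s"LIMIT $MaxUnsignedBigInt OFFSET $offset"
}
```

For example, `offsetOnlyClause(3)` yields `LIMIT 18446744073709551615 OFFSET 3`. A named constant like this would also answer the reviewer's question directly in the code.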
cloud-fan commented on code in PR #40359:
URL: https://github.com/apache/spark/pull/40359#discussion_r1132004161

## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ##
@@ -410,6 +410,15 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
     assert(sorts.isEmpty)
   }
 
+  private def checkOffsetPushed(df: DataFrame, offset: Option[Int]): Unit = {

Review Comment:
can we rename `limitPushed` to `checkLimitPushed` and follow the implementation here?