yyh2954360585 commented on issue #9471:
URL: https://github.com/apache/hudi/issues/9471#issuecomment-1687328500

> > @yyh2954360585 JDBC is slow and puts a lot of load on the source system, so a full query on a large table can cause high load or even downtime on the database server. You can set the value of source-limit according to your dataset and requirements. You can even set it to a very high value.
>
> If I set source-limit=1000, then I can only extract 1000 rows from the source table, which is not reasonable, because the query has no offset.
>
> https://github.com/apache/hudi/blob/ba5ab8ca46863a67023e7172fb16a9a36d3b5acb/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java#L239-L252

So there is another issue. If I have a table with 10 million rows and do not set source-limit, the job performs a full query on the source table. The ppdQuery is a subquery, and according to the SQL execution plan the subquery is executed first and then the outer query. If I use jdbc.fetchsize, the fetch size applies only at the outermost layer, not inside the subquery.
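
To make the concern concrete, here is a minimal Python sketch of the query shape described above. It is not the actual JdbcSource code: the helper names `build_ppd_query` and `jdbc_options`, the table name `orders`, and the alias `t` are all made up for illustration. The only point is that the LIMIT sits inside the subquery with no OFFSET, while the fetch size is a separate reader option (`dbtable` and `fetchsize` are Spark JDBC option names) that never changes the inner SQL.

```python
def build_ppd_query(table, source_limit=0):
    """Hypothetical helper: mimic the shape of a pushed-down JDBC subquery.

    Assumption for illustration only: a positive source_limit becomes a
    LIMIT clause inside the subquery, and no OFFSET is ever added.
    """
    inner = f"SELECT * FROM {table}"
    if source_limit > 0:
        inner += f" LIMIT {source_limit}"
    # The database evaluates this inner query before anything reads from it.
    return f"({inner}) t"


def jdbc_options(table, source_limit=0, fetch_size=None):
    """Hypothetical helper: reader options in the style of Spark's JDBC source."""
    options = {"dbtable": build_ppd_query(table, source_limit)}
    if fetch_size is not None:
        # fetchsize only controls how many rows the driver pulls per round
        # trip; it does not rewrite or restrict the subquery above.
        options["fetchsize"] = str(fetch_size)
    return options


limited = jdbc_options("orders", source_limit=1000)
unlimited = jdbc_options("orders", fetch_size=1000)
print(limited["dbtable"])
print(unlimited["dbtable"])
```

In the second call the subquery is still an unrestricted `SELECT *` over the whole table, which is the full-query case I am worried about for a table with 10 million rows.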