suryaprasanna commented on code in PR #17931:
URL: https://github.com/apache/hudi/pull/17931#discussion_r2730572924
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryEqualityPreCommitValidator.java:
##########
@@ -52,23 +52,34 @@ protected String getQueryConfigName() {
@Override
protected void validateUsingQuery(String query, String prevTableSnapshot,
String newTableSnapshot, SQLContext sqlContext) {
- Dataset<Row> prevRows = executeSqlQuery(
- sqlContext, query, prevTableSnapshot, "previous state").cache();
- log.info("Total rows in prevRows " + prevRows.count());
- Dataset<Row> newRows = executeSqlQuery(
- sqlContext, query, newTableSnapshot, "new state").cache();
- log.info("Total rows in newRows " + newRows.count());
- printAllRowsIfDebugEnabled(prevRows);
- printAllRowsIfDebugEnabled(newRows);
- boolean areDatasetsEqual = prevRows.intersect(newRows).count() ==
prevRows.count();
- log.info("Completed Equality Validation, datasets equal? " +
areDatasetsEqual);
- if (!areDatasetsEqual) {
- log.error("query validation failed. See stdout for sample query results.
Query: " + query);
- System.out.println("Expected result (sample records only):");
- prevRows.show();
- System.out.println("Actual result (sample records only):");
- newRows.show();
- throw new HoodieValidationException("Query validation failed for '" +
query + "'. See stdout for expected vs actual records");
+ Dataset<Row> prevRows = null;
+ Dataset<Row> newRows = null;
+ try {
+ prevRows = executeSqlQuery(
+ sqlContext, query, prevTableSnapshot, "previous state").cache();
+ log.info("Total rows in prevRows " + prevRows.count());
+ newRows = executeSqlQuery(
+ sqlContext, query, newTableSnapshot, "new state").cache();
+ log.info("Total rows in newRows " + newRows.count());
+ printAllRowsIfDebugEnabled(prevRows);
+ printAllRowsIfDebugEnabled(newRows);
+ boolean areDatasetsEqual = prevRows.intersect(newRows).count() ==
prevRows.count();
+ log.info("Completed Equality Validation, datasets equal? " +
areDatasetsEqual);
+ if (!areDatasetsEqual) {
+ log.error("query validation failed. See stdout for sample query
results. Query: " + query);
+ System.out.println("Expected result (sample records only):");
Review Comment:
The `System.out.println` statements were used deliberately: when we call
`show()` on a DataFrame, its output is printed to stdout rather than stderr.
These label statements are therefore also written to stdout, so that they
appear alongside the `show()` output they describe.
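
To illustrate the point, here is a minimal, Spark-free sketch. `showLike` is a hypothetical stand-in for `Dataset.show()`, and the `err` stream stands in for a logger whose console appender writes to stderr (an assumption about the logging configuration, not something this PR defines). It only shows that labels written to stdout land in the same stream as the table, while a log-style message on stderr does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class StdoutLabelSketch {

  // Hypothetical stand-in for Dataset.show(): prints rows to the given stdout-like stream.
  static void showLike(PrintStream out, String... rows) {
    for (String row : rows) {
      out.println("| " + row + " |");
    }
  }

  // Mimics the failure path: one log-style message on err, labels and tables on out.
  static void report(PrintStream out, PrintStream err) {
    err.println("ERROR query validation failed. See stdout for sample query results.");
    out.println("Expected result (sample records only):");
    showLike(out, "a", "b");
    out.println("Actual result (sample records only):");
    showLike(out, "a", "c");
  }

  // Runs report() against in-memory streams; returns {stdout text, stderr text}.
  static String[] captureReport() {
    ByteArrayOutputStream outBuf = new ByteArrayOutputStream();
    ByteArrayOutputStream errBuf = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(outBuf, true);
    PrintStream err = new PrintStream(errBuf, true);
    report(out, err);
    out.flush();
    err.flush();
    return new String[] {outBuf.toString(), errBuf.toString()};
  }

  public static void main(String[] args) {
    String[] captured = captureReport();
    System.out.println("--- stdout ---");
    System.out.print(captured[0]);
    System.out.println("--- stderr ---");
    System.out.print(captured[1]);
  }
}
```

If the labels went through the logger instead, anyone reading only the stdout capture would see two unlabeled tables.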