Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-10 Thread via GitHub


yihua merged PR #10435:
URL: https://github.com/apache/hudi/pull/10435


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-10 Thread via GitHub


the-other-tim-brown commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1447758324


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -775,9 +775,11 @@ private Pair, JavaRDD> 
writeToSinkAndDoMetaSync(Stri
   
Timer.Context overallTimerContext) {
 Option scheduledCompactionInstant = Option.empty();
 // write to hudi and fetch result
+long startWrite = System.currentTimeMillis();

Review Comment:
   Yes, cleaning up now
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-10 Thread via GitHub


yihua commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1447741360


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##
@@ -107,23 +107,19 @@ object HoodieSparkUtils extends SparkAdapterSupport with 
SparkVersionsSupport wi
 //   injecting [[SQLConf]], which by default isn't propagated by Spark 
to the executor(s).
 //   [[SQLConf]] is required by [[AvroSerializer]]
 injectSQLConf(df.queryExecution.toRdd.mapPartitions { rows =>
-  if (rows.isEmpty) {
-Iterator.empty

Review Comment:
   Got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-10 Thread via GitHub


yihua commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1447740407


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -775,9 +775,11 @@ private Pair, JavaRDD> 
writeToSinkAndDoMetaSync(Stri
   
Timer.Context overallTimerContext) {
 Option scheduledCompactionInstant = Option.empty();
 // write to hudi and fetch result
+long startWrite = System.currentTimeMillis();

Review Comment:
   This should be removed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-06 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1879831660

   
   ## CI report:
   
   * 9674bfb02ef5142e13d9f327ff40fa7b9eed4596 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21852)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-06 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1879762105

   
   ## CI report:
   
   * 2684527c2add9d1dc6f4f81bcf874f9b5ff86414 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21811)
 
   * 9674bfb02ef5142e13d9f327ff40fa7b9eed4596 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21852)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-06 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1879760094

   
   ## CI report:
   
   * 2684527c2add9d1dc6f4f81bcf874f9b5ff86414 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21811)
 
   * 9674bfb02ef5142e13d9f327ff40fa7b9eed4596 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-06 Thread via GitHub


the-other-tim-brown commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1443803996


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##
@@ -107,23 +107,19 @@ object HoodieSparkUtils extends SparkAdapterSupport with 
SparkVersionsSupport wi
 //   injecting [[SQLConf]], which by default isn't propagated by Spark 
to the executor(s).
 //   [[SQLConf]] is required by [[AvroSerializer]]
 injectSQLConf(df.queryExecution.toRdd.mapPartitions { rows =>
-  if (rows.isEmpty) {
-Iterator.empty

Review Comment:
   I was looking at the wrong place. This is the spot that needed to be 
updated: 
https://github.com/apache/hudi/pull/10435/files#diff-21ccee08cebace9de801e104f356bf4333c017d5d5c0cac4e3de63c7861f3c13R100



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-03 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1875758780

   
   ## CI report:
   
   * 2684527c2add9d1dc6f4f81bcf874f9b5ff86414 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21811)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-03 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1875439908

   
   ## CI report:
   
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21796)
 
   * 2684527c2add9d1dc6f4f81bcf874f9b5ff86414 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21811)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-03 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1875429126

   
   ## CI report:
   
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21796)
 
   * 2684527c2add9d1dc6f4f81bcf874f9b5ff86414 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-03 Thread via GitHub


the-other-tim-brown commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1440442408


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -402,7 +402,9 @@ public Pair, JavaRDD> 
syncOnce() throws IOException
 .build();
 String instantTime = metaClient.createNewInstantTime();
 
+long startInput = System.currentTimeMillis();
 InputBatch inputBatch = readFromSource(instantTime, metaClient);
+LOG.error("Time to read from source : " + (System.currentTimeMillis() - 
startInput));

Review Comment:
   This was from my own debugging of slow tests. I will revert



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-03 Thread via GitHub


the-other-tim-brown commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1440441996


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##
@@ -107,23 +107,19 @@ object HoodieSparkUtils extends SparkAdapterSupport with 
SparkVersionsSupport wi
 //   injecting [[SQLConf]], which by default isn't propagated by Spark 
to the executor(s).
 //   [[SQLConf]] is required by [[AvroSerializer]]
 injectSQLConf(df.queryExecution.toRdd.mapPartitions { rows =>
-  if (rows.isEmpty) {
-Iterator.empty

Review Comment:
   Yes, this will trigger the dag to see if it is in fact returning an empty 
set of rows. You can see this on the spark UI when running your jobs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


yihua commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1440140227


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##
@@ -107,23 +107,19 @@ object HoodieSparkUtils extends SparkAdapterSupport with 
SparkVersionsSupport wi
 //   injecting [[SQLConf]], which by default isn't propagated by Spark 
to the executor(s).
 //   [[SQLConf]] is required by [[AvroSerializer]]
 injectSQLConf(df.queryExecution.toRdd.mapPartitions { rows =>
-  if (rows.isEmpty) {
-Iterator.empty

Review Comment:
   Does removal of this provide any benefit?



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -448,7 +450,9 @@ public Pair, JavaRDD> 
syncOnce() throws IOException
 }
   }
 
+  long startWrite = System.currentTimeMillis();

Review Comment:
   Similar here on using `HoodieTimer` and below.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -1016,7 +1016,7 @@ private List 
getInstantsToRollbackForLazyCleanPolicy(HoodieTableMetaClie
   @Deprecated
   public boolean rollback(final String commitInstantTime, 
Option pendingRollbackInfo, boolean skipLocking) 
throws HoodieRollbackException {
 final String rollbackInstantTime = pendingRollbackInfo.map(entry -> 
entry.getRollbackInstant().getTimestamp())
-.orElse(createNewInstantTime(!skipLocking));
+.orElseGet(() -> createNewInstantTime(!skipLocking));
 return rollback(commitInstantTime, pendingRollbackInfo, 
rollbackInstantTime, skipLocking);

Review Comment:
   Good catch!



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -402,7 +402,9 @@ public Pair, JavaRDD> 
syncOnce() throws IOException
 .build();
 String instantTime = metaClient.createNewInstantTime();
 
+long startInput = System.currentTimeMillis();
 InputBatch inputBatch = readFromSource(instantTime, metaClient);
+LOG.error("Time to read from source : " + (System.currentTimeMillis() - 
startInput));

Review Comment:
   Use `HoodieTimer` to track execution time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1874763454

   
   ## CI report:
   
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1874674372

   
   ## CI report:
   
   * 072ac266eb2cf4b81d62cafda0c2579b88b43d5b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21778)
 
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1874669320

   
   ## CI report:
   
   * 072ac266eb2cf4b81d62cafda0c2579b88b43d5b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21778)
 
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-01 Thread via GitHub


the-other-tim-brown commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1439115690


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -1016,7 +1016,7 @@ private List 
getInstantsToRollbackForLazyCleanPolicy(HoodieTableMetaClie
   @Deprecated
   public boolean rollback(final String commitInstantTime, 
Option pendingRollbackInfo, boolean skipLocking) 
throws HoodieRollbackException {
 final String rollbackInstantTime = pendingRollbackInfo.map(entry -> 
entry.getRollbackInstant().getTimestamp())
-.orElse(createNewInstantTime(!skipLocking));
+.orElseGet(() -> createNewInstantTime(!skipLocking));
 return rollback(commitInstantTime, pendingRollbackInfo, 
rollbackInstantTime, skipLocking);

Review Comment:
   In general, you do not want to execute methods or create objects you will 
not use. Therefore you can use `orElse` when returning a constant but otherwise 
you should avoid it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-01 Thread via GitHub


danny0405 commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1439111567


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -1016,7 +1016,7 @@ private List 
getInstantsToRollbackForLazyCleanPolicy(HoodieTableMetaClie
   @Deprecated
   public boolean rollback(final String commitInstantTime, 
Option pendingRollbackInfo, boolean skipLocking) 
throws HoodieRollbackException {
 final String rollbackInstantTime = pendingRollbackInfo.map(entry -> 
entry.getRollbackInstant().getTimestamp())
-.orElse(createNewInstantTime(!skipLocking));
+.orElseGet(() -> createNewInstantTime(!skipLocking));
 return rollback(commitInstantTime, pendingRollbackInfo, 
rollbackInstantTime, skipLocking);

Review Comment:
   Okay, so `#orElseGet` is always preferrable than `#orElse`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-01 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1873476307

   
   ## CI report:
   
   * 072ac266eb2cf4b81d62cafda0c2579b88b43d5b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21778)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org