[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-25 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361278242
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash  Yes, that's almost what it means! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-23 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361078012
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash 
   It may be in 
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L165
   ```java
 HoodieWriteStat stat = new HoodieWriteStat();
 stat.setPartitionPath(writeStatus.getPartitionPath());
 stat.setNumWrites(recordsWritten);
 stat.setNumDeletes(recordsDeleted);
 stat.setNumInserts(insertRecordsWritten);
 stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
 stat.setFileId(writeStatus.getFileId());
 stat.setPath(new Path(config.getBasePath()), path);
 long fileSizeInBytes = FSUtils.getFileSize(fs, path);
 stat.setTotalWriteBytes(fileSizeInBytes);
 stat.setFileSizeInBytes(fileSizeInBytes);
 stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
 RuntimeStats runtimeStats = new RuntimeStats();
 runtimeStats.setTotalCreateTime(timer.endTimer());
 stat.setRuntimeStats(runtimeStats);
 writeStatus.setStat(stat);
   ```
   In `org.apache.hudi.common.model.HoodieWriteStat` it is a "null", in other 
cases prevCommit will be set a real commit time.
   ```java
   public static final String NULL_COMMIT = "null";
   ```
   So that `fetchTotalFilesInsert` can recognize first commit file and pass the 
condition, meanwhile `fetchTotalInsertRecordsWritten` can not.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-23 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361078012
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash 
   It may be in 
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L165
   ```java
 HoodieWriteStat stat = new HoodieWriteStat();
 stat.setPartitionPath(writeStatus.getPartitionPath());
 stat.setNumWrites(recordsWritten);
 stat.setNumDeletes(recordsDeleted);
 stat.setNumInserts(insertRecordsWritten);
 stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
 stat.setFileId(writeStatus.getFileId());
 stat.setPath(new Path(config.getBasePath()), path);
 long fileSizeInBytes = FSUtils.getFileSize(fs, path);
 stat.setTotalWriteBytes(fileSizeInBytes);
 stat.setFileSizeInBytes(fileSizeInBytes);
 stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
 RuntimeStats runtimeStats = new RuntimeStats();
 runtimeStats.setTotalCreateTime(timer.endTimer());
 stat.setRuntimeStats(runtimeStats);
 writeStatus.setStat(stat);
   ```
   In `org.apache.hudi.common.model.HoodieWriteStat` it is a "null", in other 
cases prevCommit will be set a real commit time.
   ```java
   public static final String NULL_COMMIT = "null";
   ```
   So that `fetchTotalFilesInsert` can recognize first commit file and pass the 
condition.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-23 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361074375
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash
   sorry I may misunderstand your mind, I'll take a look at what I've tested 
later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-23 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361044029
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash
   I think the processing logic of this method is the same as that of 
`fetchTotalUpdateRecordsWritten `
   ```
 public long fetchTotalUpdateRecordsWritten() {
   long totalUpdateRecordsWritten = 0;
   for (List stats : partitionToWriteStats.values()) {
 for (HoodieWriteStat stat : stats) {
   totalUpdateRecordsWritten += stat.getNumUpdateWrites();
 }
   }
   return totalUpdateRecordsWritten;
 }
   ```
   Let me give you an example in my test. I create a hudi data with 3 rows 
first, and then insert a row, at last insert a row and delete a row. I print 
all method return number and test the number to check.
   ```java
   for (HoodieInstant commit : commits) {
   HoodieCommitMetadata commitMetadata = 
HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(),
   HoodieCommitMetadata.class);
   
   JSONObject data = new JSONObject();
   data.put("timestamp", commit.getTimestamp());
   data.put("fetchTotalBytesWritten", 
commitMetadata.fetchTotalBytesWritten());
   data.put("fetchTotalFilesInsert", 
commitMetadata.fetchTotalFilesInsert());
   data.put("fetchTotalFilesUpdated", 
commitMetadata.fetchTotalFilesUpdated());
   data.put("fetchTotalPartitionsWritten", 
commitMetadata.fetchTotalPartitionsWritten());
   data.put("fetchTotalRecordsWritten", 
commitMetadata.fetchTotalRecordsWritten());
   data.put("fetchTotalUpdateRecordsWritten", 
commitMetadata.fetchTotalUpdateRecordsWritten());
   data.put("fetchTotalInsertRecordsWritten", 
commitMetadata.fetchTotalInsertRecordsWritten());
   data.put("fetchTotalWriteErrors", 
commitMetadata.fetchTotalWriteErrors());
   
   long totalDeleteRecordsWritten= 0;
   for (List stats : 
commitMetadata.getPartitionToWriteStats().values()) {
   for (HoodieWriteStat stat : stats) {
   if (stat.getPrevCommit() != null) {
   totalDeleteRecordsWritten += stat.getNumDeletes();
   }
   }
   }
   data.put("fetchTotalDeleteRecordsWritten", totalDeleteRecordsWritten);
   
   datas.add(data);
   }
   log.debug("datas = {}", datas);
   ```
   If I use old if `(stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null"))` it will print:
   ```json
   [
{
"fetchTotalBytesWritten": 435258,
"fetchTotalDeleteRecordsWritten": 1,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 0,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923172508"
},
{
"fetchTotalBytesWritten": 435227,
"fetchTotalDeleteRecordsWritten": 0,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 0,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923160439"
},
{
"fetchTotalBytesWritten": 435478,
"fetchTotalDeleteRecordsWritten": 0,
"fetchTotalFilesInsert": 1,
"fetchTotalFilesUpdated": 0,
"fetchTotalInsertRecordsWritten": 3,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 3,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923160340"
}
   ]
   ```
   If I use modified if `(stat.getPrevCommit() != null)` it will print:
   ```json
   [
{
"fetchTotalBytesWritten": 435258,
"fetchTotalDeleteRecordsWritten": 1,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 1,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
 

[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-23 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361044029
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash
   I think the processing logic of this method is the same as that of 
`fetchTotalFilesUpdated`
   ```
 public long fetchTotalUpdateRecordsWritten() {
   long totalUpdateRecordsWritten = 0;
   for (List stats : partitionToWriteStats.values()) {
 for (HoodieWriteStat stat : stats) {
   totalUpdateRecordsWritten += stat.getNumUpdateWrites();
 }
   }
   return totalUpdateRecordsWritten;
 }
   ```
   Let me give you an example in my test. I create a hudi data with 3 rows 
first, and then insert a row, at last insert a row and delete a row. I print 
all method return number and test the number to check.
   ```java
   for (HoodieInstant commit : commits) {
   HoodieCommitMetadata commitMetadata = 
HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(),
   HoodieCommitMetadata.class);
   
   JSONObject data = new JSONObject();
   data.put("timestamp", commit.getTimestamp());
   data.put("fetchTotalBytesWritten", 
commitMetadata.fetchTotalBytesWritten());
   data.put("fetchTotalFilesInsert", 
commitMetadata.fetchTotalFilesInsert());
   data.put("fetchTotalFilesUpdated", 
commitMetadata.fetchTotalFilesUpdated());
   data.put("fetchTotalPartitionsWritten", 
commitMetadata.fetchTotalPartitionsWritten());
   data.put("fetchTotalRecordsWritten", 
commitMetadata.fetchTotalRecordsWritten());
   data.put("fetchTotalUpdateRecordsWritten", 
commitMetadata.fetchTotalUpdateRecordsWritten());
   data.put("fetchTotalInsertRecordsWritten", 
commitMetadata.fetchTotalInsertRecordsWritten());
   data.put("fetchTotalWriteErrors", 
commitMetadata.fetchTotalWriteErrors());
   
   long totalDeleteRecordsWritten= 0;
   for (List stats : 
commitMetadata.getPartitionToWriteStats().values()) {
   for (HoodieWriteStat stat : stats) {
   if (stat.getPrevCommit() != null) {
   totalDeleteRecordsWritten += stat.getNumDeletes();
   }
   }
   }
   data.put("fetchTotalDeleteRecordsWritten", totalDeleteRecordsWritten);
   
   datas.add(data);
   }
   log.debug("datas = {}", datas);
   ```
   If I use old if `(stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null"))` it will print:
   ```json
   [
{
"fetchTotalBytesWritten": 435258,
"fetchTotalDeleteRecordsWritten": 1,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 0,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923172508"
},
{
"fetchTotalBytesWritten": 435227,
"fetchTotalDeleteRecordsWritten": 0,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 0,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923160439"
},
{
"fetchTotalBytesWritten": 435478,
"fetchTotalDeleteRecordsWritten": 0,
"fetchTotalFilesInsert": 1,
"fetchTotalFilesUpdated": 0,
"fetchTotalInsertRecordsWritten": 3,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 3,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923160340"
}
   ]
   ```
   If I use modified if `(stat.getPrevCommit() != null)` it will print:
   ```json
   [
{
"fetchTotalBytesWritten": 435258,
"fetchTotalDeleteRecordsWritten": 1,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 1,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
  

[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-23 Thread GitBox
cdmikechen commented on a change in pull request #1119: Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r361044029
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,9 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
 
 Review comment:
   @n3nash
   I think the processing logic of this method is the same as that of 
`fetchTotalFilesUpdated`
   ```
 public long fetchTotalUpdateRecordsWritten() {
   long totalUpdateRecordsWritten = 0;
   for (List stats : partitionToWriteStats.values()) {
 for (HoodieWriteStat stat : stats) {
   totalUpdateRecordsWritten += stat.getNumUpdateWrites();
 }
   }
   return totalUpdateRecordsWritten;
 }
   ```
   Let me give you an example in my test. I create a hudi data with 3 rows 
first, and then insert a row, at last insert a row and delete a row. I print 
all method return number and test the number to check.
   ```java
   for (HoodieInstant commit : commits) {
   HoodieCommitMetadata commitMetadata = 
HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(),
   HoodieCommitMetadata.class);
   
   JSONObject data = new JSONObject();
   data.put("timestamp", commit.getTimestamp());
   data.put("fetchTotalBytesWritten", 
commitMetadata.fetchTotalBytesWritten());
   data.put("fetchTotalFilesInsert", 
commitMetadata.fetchTotalFilesInsert());
   data.put("fetchTotalFilesUpdated", 
commitMetadata.fetchTotalFilesUpdated());
   data.put("fetchTotalPartitionsWritten", 
commitMetadata.fetchTotalPartitionsWritten());
   data.put("fetchTotalRecordsWritten", 
commitMetadata.fetchTotalRecordsWritten());
   data.put("fetchTotalUpdateRecordsWritten", 
commitMetadata.fetchTotalUpdateRecordsWritten());
   data.put("fetchTotalInsertRecordsWritten", 
commitMetadata.fetchTotalInsertRecordsWritten());
   data.put("fetchTotalWriteErrors", 
commitMetadata.fetchTotalWriteErrors());
   
   long totalDeleteRecordsWritten= 0;
   for (List stats : 
commitMetadata.getPartitionToWriteStats().values()) {
   for (HoodieWriteStat stat : stats) {
   if (stat.getPrevCommit() != null) {
   totalDeleteRecordsWritten += stat.getNumDeletes();
   }
   }
   }
   data.put("fetchTotalDeleteRecordsWritten", totalDeleteRecordsWritten);
   
   datas.add(data);
   }
   log.debug("datas = {}", datas);
   ```
   If I user old if `(stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null"))` it will print:
   ```json
   [
{
"fetchTotalBytesWritten": 435258,
"fetchTotalDeleteRecordsWritten": 1,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 0,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923172508"
},
{
"fetchTotalBytesWritten": 435227,
"fetchTotalDeleteRecordsWritten": 0,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 0,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923160439"
},
{
"fetchTotalBytesWritten": 435478,
"fetchTotalDeleteRecordsWritten": 0,
"fetchTotalFilesInsert": 1,
"fetchTotalFilesUpdated": 0,
"fetchTotalInsertRecordsWritten": 3,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 3,
"fetchTotalUpdateRecordsWritten": 0,
"fetchTotalWriteErrors": 0,
"timestamp": "20190923160340"
}
   ]
   ```
   If I user modified if `(stat.getPrevCommit() != null)` it will print:
   ```json
   [
{
"fetchTotalBytesWritten": 435258,
"fetchTotalDeleteRecordsWritten": 1,
"fetchTotalFilesInsert": 0,
"fetchTotalFilesUpdated": 1,
"fetchTotalInsertRecordsWritten": 1,
"fetchTotalPartitionsWritten": 1,
"fetchTotalRecordsWritten": 4,
"fetchTotalUpdateRecordsWritten": 0,