[ https://issues.apache.org/jira/browse/GOBBLIN-1413?focusedWorklogId=573831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-573831 ]

ASF GitHub Bot logged work on GOBBLIN-1413:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Mar/21 01:31
            Start Date: 30/Mar/21 01:31
    Worklog Time Spent: 10m 
      Work Description: hanghangliu commented on a change in pull request #3252:
URL: https://github.com/apache/gobblin/pull/3252#discussion_r603718830



##########
File path: 
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/publisher/GobblinMCEPublisher.java
##########
@@ -132,6 +136,36 @@ public void publishData(Collection<? extends WorkUnitState> states) throws IOExc
     return newFiles;
   }
 
+  /**
+   * Choose one file from the work unit state. There will be no modification to the file.
+   * It's used in GMCE writer {@link GobblinMCEWriter} merely for getting the DB and table name.
+   * @throws IOException
+   */
+  private Map<Path, Metrics> computeDummyFile (State state) throws IOException {

Review comment:
       Used a priority queue to choose the file with the maximum modification
time, so that the dummy file is the latest available file.
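For illustration, here is a minimal sketch of the selection described in the review comment. It uses a max-heap (java.util.PriorityQueue with a reversed comparator) keyed on modification time, so the head of the queue is the most recently modified file. FileEntry, pickLatest and the example paths are hypothetical stand-ins. The actual PR presumably works on Hadoop file listings, which are not shown in this hunk.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

public class LatestFileSelector {

    /** Hypothetical stand-in for a listed file: a path plus its modification time in millis. */
    static final class FileEntry {
        final String path;
        final long modificationTime;

        FileEntry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Returns the entry with the largest modification time, or empty if there are no candidates. */
    static Optional<FileEntry> pickLatest(List<FileEntry> candidates) {
        // Max-heap ordered by modification time: the head is the most recently modified file.
        PriorityQueue<FileEntry> heap = new PriorityQueue<>(
            Comparator.comparingLong((FileEntry f) -> f.modificationTime).reversed());
        heap.addAll(candidates);
        // peek() returns null on an empty queue, which maps to Optional.empty().
        return Optional.ofNullable(heap.peek());
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
            new FileEntry("/data/db/table/part-0.orc", 1000L),
            new FileEntry("/data/db/table/part-1.orc", 3000L),
            new FileEntry("/data/db/table/part-2.orc", 2000L));
        // Prints the path of part-1.orc, the entry with the largest modification time.
        System.out.println(pickLatest(files).map(f -> f.path).orElse("none"));
    }
}
```

A single pass with Collections.max(candidates, comparator) would return the same file in O(n) without building a heap. The priority queue only pays off when more than the single newest file is needed.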




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 573831)
    Time Spent: 0.5h  (was: 20m)

> Emit GMCE as long as watermark moved
> ------------------------------------
>
>                 Key: GOBBLIN-1413
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1413
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Hanghang Liu
>            Priority: Critical
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Emit a GMCE (Gobblin Metadata Change Event) whenever the watermark moves on a
> streaming pipeline.
> Currently, a GMCE is not triggered in a streaming pipeline if no new file is
> generated. This causes a problem when the watermark moves but no file is
> generated (for example, when the data is filtered out by the quality
> checker): the GMCE is missed.
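Below is a minimal sketch of the emit decision the issue asks for. It assumes the publisher can compare the previous and current watermark of a publish round. PublishRound, Action and decide are hypothetical names, not Gobblin APIs. The dummy-file branch follows the computeDummyFile Javadoc in the diff above, where the chosen file only supplies the DB and table name to the GMCE writer.

```java
import java.util.List;

public class GmceEmitDecision {

    /** Hypothetical summary of one publish round on a streaming pipeline. */
    static final class PublishRound {
        final List<String> newFiles;   // files written in this round
        final long previousWatermark;  // watermark committed by the previous round
        final long currentWatermark;   // watermark reached by this round

        PublishRound(List<String> newFiles, long previousWatermark, long currentWatermark) {
            this.newFiles = newFiles;
            this.previousWatermark = previousWatermark;
            this.currentWatermark = currentWatermark;
        }
    }

    enum Action { EMIT_WITH_NEW_FILES, EMIT_WITH_DUMMY_FILE, SKIP }

    static Action decide(PublishRound round) {
        if (!round.newFiles.isEmpty()) {
            // Old and new behavior agree: new files always produce a GMCE.
            return Action.EMIT_WITH_NEW_FILES;
        }
        if (round.currentWatermark > round.previousWatermark) {
            // Watermark advanced but nothing was written (e.g. every record was
            // filtered out by a quality checker): still emit, using an existing
            // file only to identify the DB and table.
            return Action.EMIT_WITH_DUMMY_FILE;
        }
        // Nothing written and the watermark did not move: nothing to report.
        return Action.SKIP;
    }

    public static void main(String[] args) {
        System.out.println(decide(new PublishRound(List.of("part-0.orc"), 10L, 20L)));
        System.out.println(decide(new PublishRound(List.of(), 10L, 20L)));
        System.out.println(decide(new PublishRound(List.of(), 20L, 20L)));
    }
}
```

Under the old behavior described in the issue, only the first branch would emit. The second branch is the case this bug is about.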



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
