[ 
https://issues.apache.org/jira/browse/GOBBLIN-1034?focusedWorklogId=379056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379056
 ]

ASF GitHub Bot logged work on GOBBLIN-1034:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Jan/20 22:26
            Start Date: 29/Jan/20 22:26
    Worklog Time Spent: 10m 
      Work Description: htran1 commented on pull request #2876: GOBBLIN-1034: Ensure underlying writers are expired from the Partitio…
URL: https://github.com/apache/incubator-gobblin/pull/2876#discussion_r372665607
 
 

 ##########
 File path: gobblin-core/src/main/java/org/apache/gobblin/writer/PartitionedDataWriter.java
 ##########
 @@ -99,13 +118,32 @@ public PartitionedDataWriter(DataWriterBuilder<S, D> builder, final State state)
     if(builder.schema != null) {
       this.state.setProp(WRITER_LATEST_SCHEMA, builder.getSchema());
     }
-    this.partitionWriters = CacheBuilder.newBuilder().build(new CacheLoader<GenericRecord, DataWriter<D>>() {
+    Long cacheExpiryInterval = this.state.getPropAsLong(PARTITIONED_WRITER_CACHE_TTL_SECONDS, DEFAULT_PARTITIONED_WRITER_CACHE_TTL_SECONDS);
+
+    this.partitionWriters = CacheBuilder.newBuilder()
+        .expireAfterAccess(cacheExpiryInterval, TimeUnit.SECONDS)
+        .removalListener(new RemovalListener<GenericRecord, DataWriter<D>>() {
+      @Override
+      public void onRemoval(RemovalNotification<GenericRecord, DataWriter<D>> notification) {
+        synchronized (PartitionedDataWriter.this) {
+          if (notification.getValue() != null) {
+            try {
+              DataWriter<D> writer = notification.getValue();
+              totalRecordsFromEvictedWriters += writer.recordsWritten();
+              totalBytesFromEvictedWriters += writer.bytesWritten();
+              writer.close();
+            } catch (IOException e) {
+              log.error("Exception {} encountered when closing data writer on cache eviction", e);
 
 Review comment:
   The existing close that happens from the closer propagates errors. Some writers, such as `HiveWritableHdfsDataWriter`, finalize their output files on close. If that close fails, an incomplete file may be left behind, and the task should fail instead of publishing.
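
One way to act on the review comment above is to defer a close failure from the eviction callback and rethrow it before publish. This is a minimal hedged sketch, not Gobblin's actual code: `EvictionCloseSketch`, `closeEvictedWriter`, and `commit` are illustrative names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the reviewer's suggestion: instead of only logging an
// IOException thrown while closing an evicted writer, remember it and rethrow
// it before publish, so an incomplete file fails the task.
class EvictionCloseSketch {
  // First close failure observed by the eviction callback, if any.
  private final AtomicReference<IOException> evictionFailure = new AtomicReference<>();

  // Stand-in for the cache removal listener's close path.
  void closeEvictedWriter(Closeable writer) {
    try {
      writer.close();
    } catch (IOException e) {
      evictionFailure.compareAndSet(null, e); // keep the first failure
    }
  }

  // Called before publishing; surfaces any deferred close failure.
  void commit() throws IOException {
    IOException failure = evictionFailure.get();
    if (failure != null) {
      throw new IOException("Writer failed to close on cache eviction", failure);
    }
  }

  public static void main(String[] args) {
    EvictionCloseSketch sketch = new EvictionCloseSketch();
    // A writer whose close() fails while finalizing its output file.
    sketch.closeEvictedWriter(() -> { throw new IOException("finalize failed"); });
    try {
      sketch.commit();
      System.out.println("published");
    } catch (IOException e) {
      // The task fails instead of publishing an incomplete file.
      System.out.println("task failed: " + e.getCause().getMessage()); // prints "task failed: finalize failed"
    }
  }
}
```

The `AtomicReference` keeps only the first failure, which is enough to fail the task; the real fix would need to coordinate with `PartitionedDataWriter`'s existing closer and commit lifecycle.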
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 379056)
    Time Spent: 50m  (was: 40m)

> Ensure underlying writers are expired from the PartitionedDataWriter cache to 
> avoid accumulation of writers for long running Gobblin jobs
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1034
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1034
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>    Affects Versions: 0.15.0
>            Reporter: Sudarshan Vasudevan
>            Assignee: Abhishek Tiwari
>            Priority: Major
>             Fix For: 0.15.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, the underlying writers are never evicted from the 
> PartitionedDataWriter cache. For long running Gobblin jobs (e.g. streaming), 
> this will cause a memory leak particularly if the underlying writers maintain 
> state. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
