htran1 commented on a change in pull request #2876: GOBBLIN-1034: Ensure 
underlying writers are expired from the Partitio…
URL: https://github.com/apache/incubator-gobblin/pull/2876#discussion_r372665607
 
 

 ##########
 File path: 
gobblin-core/src/main/java/org/apache/gobblin/writer/PartitionedDataWriter.java
 ##########
 @@ -99,13 +118,32 @@ public PartitionedDataWriter(DataWriterBuilder<S, D> 
builder, final State state)
     if(builder.schema != null) {
       this.state.setProp(WRITER_LATEST_SCHEMA, builder.getSchema());
     }
-    this.partitionWriters = CacheBuilder.newBuilder().build(new 
CacheLoader<GenericRecord, DataWriter<D>>() {
+    Long cacheExpiryInterval = 
this.state.getPropAsLong(PARTITIONED_WRITER_CACHE_TTL_SECONDS, 
DEFAULT_PARTITIONED_WRITER_CACHE_TTL_SECONDS);
+
+    this.partitionWriters = CacheBuilder.newBuilder()
+        .expireAfterAccess(cacheExpiryInterval, TimeUnit.SECONDS)
+        .removalListener(new RemovalListener<GenericRecord, DataWriter<D>>() {
+      @Override
+      public void onRemoval(RemovalNotification<GenericRecord, DataWriter<D>> 
notification) {
+        synchronized (PartitionedDataWriter.this) {
+          if (notification.getValue() != null) {
+            try {
+              DataWriter<D> writer = notification.getValue();
+              totalRecordsFromEvictedWriters += writer.recordsWritten();
+              totalBytesFromEvictedWriters += writer.bytesWritten();
+              writer.close();
+            } catch (IOException e) {
+              log.error("Exception {} encountered when closing data writer on 
cache eviction", e);
 
 Review comment:
   The existing close that happens from the closer propagates errors. Some 
writers, such as `HiveWritableHdfsDataWriter`, finalize files on close. If that 
close fails, the file may be incomplete, and the task should fail instead of 
publishing. Logging and swallowing the `IOException` here loses that behavior.
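
   One possible way to keep the failure visible is to record it in the removal 
listener and rethrow it from the writer's own methods. The following is only a 
minimal sketch under that assumption: `EvictionFailureTracker`, 
`closeOnEviction` and `rethrowIfFailed` are hypothetical names that do not exist 
in Gobblin, and the sketch uses plain `java.io.Closeable` in place of 
`DataWriter<D>` so that it is self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical helper (not part of Gobblin): remembers the first failure seen
 * while closing an evicted writer so the owning writer can rethrow it later.
 */
class EvictionFailureTracker {
  private final AtomicReference<IOException> firstFailure = new AtomicReference<>();

  /** Intended to be called from the cache removal listener instead of only logging. */
  void closeOnEviction(Closeable writer) {
    try {
      writer.close();
    } catch (IOException e) {
      // Keep the first failure; attach any later ones to it as suppressed.
      if (!firstFailure.compareAndSet(null, e)) {
        IOException first = firstFailure.get();
        if (first != e) {
          first.addSuppressed(e);
        }
      }
    }
  }

  /** Intended to be called from write/commit/close so the failure reaches the task. */
  void rethrowIfFailed() throws IOException {
    IOException e = firstFailure.get();
    if (e != null) {
      throw new IOException("A partition writer failed to close on cache eviction", e);
    }
  }
}
```

   With something like this, `onRemoval` would call `closeOnEviction(writer)`, 
and `PartitionedDataWriter` would call `rethrowIfFailed()` before committing, 
so an incomplete file causes the task to fail rather than publish. Whether the 
check belongs in `write`, `commit`, `close`, or all three is a design choice 
this sketch leaves open.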

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
