htran1 commented on a change in pull request #2876: GOBBLIN-1034: Ensure underlying writers are expired from the Partitio…
URL: https://github.com/apache/incubator-gobblin/pull/2876#discussion_r372665607

##########
File path: gobblin-core/src/main/java/org/apache/gobblin/writer/PartitionedDataWriter.java
##########
@@ -99,13 +118,32 @@ public PartitionedDataWriter(DataWriterBuilder<S, D> builder, final State state)
     if(builder.schema != null) {
       this.state.setProp(WRITER_LATEST_SCHEMA, builder.getSchema());
     }
-    this.partitionWriters = CacheBuilder.newBuilder().build(new CacheLoader<GenericRecord, DataWriter<D>>() {
+    Long cacheExpiryInterval = this.state.getPropAsLong(PARTITIONED_WRITER_CACHE_TTL_SECONDS, DEFAULT_PARTITIONED_WRITER_CACHE_TTL_SECONDS);
+
+    this.partitionWriters = CacheBuilder.newBuilder()
+        .expireAfterAccess(cacheExpiryInterval, TimeUnit.SECONDS)
+        .removalListener(new RemovalListener<GenericRecord, DataWriter<D>>() {
+          @Override
+          public void onRemoval(RemovalNotification<GenericRecord, DataWriter<D>> notification) {
+            synchronized (PartitionedDataWriter.this) {
+              if (notification.getValue() != null) {
+                try {
+                  DataWriter<D> writer = notification.getValue();
+                  totalRecordsFromEvictedWriters += writer.recordsWritten();
+                  totalBytesFromEvictedWriters += writer.bytesWritten();
+                  writer.close();
+                } catch (IOException e) {
+                  log.error("Exception {} encountered when closing data writer on cache eviction", e);

Review comment:
The existing close that happens from the closer propagates errors. Some writers, such as `HiveWritableHdfsDataWriter`, finalize files on close. If that close fails, there may be an incomplete file, and the task should fail instead of publishing.
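
One possible way to address the concern above, offered only as a sketch: record the close failure inside the eviction callback and rethrow it later from a method that is allowed to throw, so the task fails instead of publishing. `EvictionFailureTracker` and its method names are hypothetical and not part of Gobblin or this PR; plain `java.io.Closeable` stands in for `DataWriter<D>`, and the Guava cache wiring is omitted.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical helper (not a Gobblin class): remembers close failures from
 * evicted writers so they can be surfaced later instead of only being logged.
 */
final class EvictionFailureTracker {
  // First failure seen; later failures are attached as suppressed exceptions.
  private IOException firstFailure;

  /** Intended to be called from the cache removal listener. */
  public synchronized void closeOnEviction(Closeable writer) {
    try {
      writer.close();
    } catch (IOException e) {
      if (firstFailure == null) {
        firstFailure = e;
      } else if (e != firstFailure) {
        firstFailure.addSuppressed(e);
      }
    }
  }

  /**
   * Intended to be called from the outer writer's commit() or close(),
   * which can propagate the error the same way the closer does.
   */
  public synchronized void throwIfFailed() throws IOException {
    if (firstFailure != null) {
      throw firstFailure;
    }
  }
}
```

With something like this, `onRemoval` would call `closeOnEviction(writer)` in place of the try/catch that logs, and the enclosing writer would call `throwIfFailed()` before reporting success. Whether the failure should surface at commit or at close is a design choice left to the PR author.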