[ 
https://issues.apache.org/jira/browse/GOBBLIN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Varshney updated GOBBLIN-2026:
------------------------------------
    Description: 
Currently, while cleaning up log files, the Retention job runs out of memory 
(OOM) and fails silently when there are too many log files. The workflow 
execution still reports Success even after the failure.


{code:java}
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - java.lang.OutOfMemoryError: GC overhead limit exceeded
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.util.Arrays.copyOf(Arrays.java:3332)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.lang.StringBuffer.append(StringBuffer.java:270)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.net.URI.appendSchemeSpecificPart(URI.java:1911)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.net.URI.toString(URI.java:1941)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at java.net.URI.<init>(URI.java:742)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.Path.makeQualified(Path.java:562)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:271)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:997)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:121)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1050)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1047)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1057)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.InstrumentedFileSystem.lambda$listStatus$17(InstrumentedFileSystem.java:379)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.InstrumentedFileSystem$$Lambda$69/231154485.get(Unknown Source)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at com.linkedin.hadoop.metrics.fs.PerformanceTrackingFileSystem.process(PerformanceTrackingFileSystem.java:412)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.InstrumentedFileSystem.process(InstrumentedFileSystem.java:100)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.InstrumentedFileSystem.listStatus(InstrumentedFileSystem.java:379)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.PerformanceTrackingDistributedFileSystem.listStatus(PerformanceTrackingDistributedFileSystem.java:296)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:258)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.listStatus(ChRootedFileSystem.java:253)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.viewfs.ViewFileSystem.listStatus(ViewFileSystem.java:528)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.GridFilesystem.lambda$listStatus$4(GridFilesystem.java:491)
21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO -     at org.apache.hadoop.fs.GridFilesystem$$Lambda$68/2109027988.doCall(Unknown Source)
{code}

Because the job fails silently, the user is never explicitly told about the 
failure. Hence, when the retention job hits an OOM and cannot proceed any 
further, it should fail explicitly.

  was:
Currently, while cleaning the log files, the Retention job goes into OOM and 
silently fails when the no of log files is too many. Workflow execution even 
after failure says Success.
As the job silently fails, user doesn't get to know explicitly about it. Hence, 
when going into OOM, retention job should explicitly fail if it can't be 
proceeded further


> Retention Job should fail on OOM
> --------------------------------
>
>                 Key: GOBBLIN-2026
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2026
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: misc
>            Reporter: Arpit Varshney
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
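
Editorial illustration (not part of the ticket or of Gobblin's code): one way 
a launcher could turn an OutOfMemoryError into a visible failure is to catch 
Throwable, rather than only Exception, around the job and map it to a 
non-zero exit code. The class name RetentionExitCode, the method run, and the 
exit-code constants below are hypothetical names chosen for this sketch. It 
assumes the scheduler marks a workflow as failed when the process exits with 
a non-zero status.

```java
/** Hypothetical wrapper for illustration only; not a Gobblin class. */
final class RetentionExitCode {
    static final int SUCCESS = 0;
    static final int FAILURE = 1;

    private RetentionExitCode() {}

    /**
     * Runs the job and returns an exit code. Any Throwable, including an
     * Error such as OutOfMemoryError, is reported as FAILURE.
     */
    static int run(Runnable job) {
        try {
            job.run();
            return SUCCESS;
        } catch (Throwable t) {
            // Catching only Exception would let an Error slip past this handler.
            System.err.println("Retention job failed: " + t);
            return FAILURE;
        }
    }
}
```

A launcher could then end with System.exit(RetentionExitCode.run(job)). After 
a real OOM the JVM may be too unstable to run even this handler reliably, so 
starting the JVM with -XX:+ExitOnOutOfMemoryError is a possible alternative 
that makes the process terminate on the first OutOfMemoryError.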



--
This message was sent by Atlassian Jira
(v8.20.10#820010)