[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module
[ https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063064#comment-17063064 ]

Victor Wong commented on FLINK-15447:
-------------------------------------

Currently, we can solve this issue through "env.java.opts: -Djava.io.tmpdir=./tmp", so I am closing this issue now.

> To improve utilization of the `java.io.tmpdir` for YARN module
> --------------------------------------------------------------
>
>                 Key: FLINK-15447
>                 URL: https://issues.apache.org/jira/browse/FLINK-15447
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.9.1
>            Reporter: Victor Wong
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *#Background*
> Currently, when running Flink on Yarn, the "java.io.tmpdir" property is set
> to the default value, which is "/tmp".
> Sometimes we ran into exceptions caused by a full "/tmp" directory, which
> would not be cleaned automatically after applications finished.
>
> #*Goal*
> quoted from: [HADOOP-2735|https://issues.apache.org/jira/browse/HADOOP-2735]
> _1) Tasks can utilize all disks when using tmp_
> _2) Any undeleted tmp files will be deleted by the tasktracker when
> task(job?) is done._
>
> #*Suggestion*
> I think we can set "java.io.tmpdir" to "PWD/tmp" directory, or
> something similar. "PWD" will be replaced with the true working
> directory of JM/TM by Yarn, which will be cleaned automatically.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
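The workaround in the comment above relies on a relative `java.io.tmpdir` being resolved against the working directory of the JVM process. The following is a minimal sketch of that behaviour, not Flink code: `TmpDirSketch`, `resolve`, and `createTempFileIn` are hypothetical names introduced only for illustration. It creates the directory before use, because the JDK does not create a missing `java.io.tmpdir` on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Flink) illustrating a relative tmp dir setting. */
final class TmpDirSketch {

    /** Resolves a possibly relative setting such as "./tmp" against the working directory. */
    static Path resolve(Path workingDir, String configured) {
        Path p = Paths.get(configured);
        return (p.isAbsolute() ? p : workingDir.resolve(p)).normalize();
    }

    /** Creates the directory if it is missing, then creates a temp file inside it. */
    static Path createTempFileIn(Path tmpDir) {
        try {
            Files.createDirectories(tmpDir);
            return Files.createTempFile(tmpDir, "flink-sketch-", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Run with: java -Djava.io.tmpdir=./tmp TmpDirSketch
        Path workingDir = Paths.get("").toAbsolutePath();
        String configured = System.getProperty("java.io.tmpdir");
        Path tmpDir = resolve(workingDir, configured);
        Path file = createTempFileIn(tmpDir);
        System.out.println("java.io.tmpdir = " + configured);
        System.out.println("resolved to    = " + tmpDir);
        System.out.println("temp file      = " + file);
        Files.delete(file);
    }
}
```

When the process is started inside a YARN container, the resolved path lies under the container's working directory, which is the location the issue description expects YARN to clean up.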
[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module
[ https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053156#comment-17053156 ]

Victor Wong commented on FLINK-15447:
-------------------------------------

Hi [~trohrmann], since this issue is still valid against the current master branch, I have opened a PR to demonstrate my intended change, which is mainly based on the previous discussion. Please give me some advice if you have time.
[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module
[ https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028644#comment-17028644 ]

Victor Wong commented on FLINK-15447:
-------------------------------------

[~trohrmann], thanks for your attention.

_I am wondering whether you would like to configure the system property {{java.io.tmpdir}} to point towards {{./tmp}} or to only change Flink's temp directories._
---
The former: configure the system property {{java.io.tmpdir}}.

_If not, then we would need to adapt the java command which starts the Flink processes._
---
I think this is the best choice, since it has _the benefit that libraries relying on {{java.io.tmpdir}} will not write their temporary data to {{/tmp}} either._
[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module
[ https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023081#comment-17023081 ]

Till Rohrmann commented on FLINK-15447:
---------------------------------------

Thanks for starting this discussion [~victor-wong]. I am wondering whether you would like to configure the system property {{java.io.tmpdir}} to point towards {{./tmp}} or to only change Flink's temp directories.

At the moment, we configure Flink's tmp directories {{TMP_DIRS}} to point to {{ApplicationConstants.Environment.LOCAL_DIRS}} on the master but not on the {{TaskExecutor}}. See FLINK-8350 and FLINK-9762 for more information.

If the latter approach is good enough, then one could set up the {{TaskExecutor}} process with {{TMP_DIRS}} pointing towards {{ApplicationConstants.Environment.LOCAL_DIRS}} as well. I think it has been an oversight that this is not done symmetrically at the moment.

If not, then we would need to adapt the java command which starts the Flink processes. The former approach would also have the benefit that libraries relying on {{java.io.tmpdir}} will not write their temporary data to {{/tmp}} either.
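The second option in the comment above (pointing Flink's tmp directories at YARN's local directories) can be sketched as follows. This is an illustration under stated assumptions, not Flink's actual implementation: `LocalDirsSketch` and `tmpDirsFromEnv` are hypothetical names, and the sketch assumes that the NodeManager exports the directories in the `LOCAL_DIRS` environment variable as a comma-separated list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Flink) deriving tmp directories from the container environment. */
final class LocalDirsSketch {

    /** Assumed name of the variable behind ApplicationConstants.Environment.LOCAL_DIRS. */
    static final String LOCAL_DIRS_ENV = "LOCAL_DIRS";

    /** Returns the directories listed in LOCAL_DIRS, or the fallback if none are set. */
    static List<String> tmpDirsFromEnv(Map<String, String> env, String fallback) {
        List<String> dirs = new ArrayList<>();
        String raw = env.get(LOCAL_DIRS_ENV);
        if (raw != null) {
            // Assumption: entries are separated by commas.
            for (String entry : raw.split(",")) {
                String dir = entry.trim();
                if (!dir.isEmpty()) {
                    dirs.add(dir);
                }
            }
        }
        if (dirs.isEmpty()) {
            dirs.add(fallback);
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Outside a YARN container LOCAL_DIRS is usually unset, so the fallback is printed.
        System.out.println(tmpDirsFromEnv(System.getenv(), System.getProperty("java.io.tmpdir")));
    }
}
```

Applying the same derivation on both the master and the {{TaskExecutor}} would give the symmetric setup the comment asks for, but it would not redirect third-party libraries that read {{java.io.tmpdir}} directly.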
[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module
[ https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021862#comment-17021862 ]

Victor Wong commented on FLINK-15447:
-------------------------------------

[~fly_in_gis], it makes sense to make "java.io.tmpdir" configurable; we could add a new YarnOption configuration to achieve this. If this issue is assigned to me, you could help review my PR.
[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module
[ https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021858#comment-17021858 ]

Victor Wong commented on FLINK-15447:
-------------------------------------

[~rongr], I have updated this issue based on your suggestion. Could you assign it to me? Thanks!