[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module

2020-03-19 Thread Victor Wong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063064#comment-17063064
 ] 

Victor Wong commented on FLINK-15447:
-

Currently, this issue can be solved by setting "env.java.opts: 
-Djava.io.tmpdir=./tmp", so I am closing it now.
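
To check where a relative value such as "./tmp" ends up, a small Java sketch along the following lines may help. It is not part of Flink; the class name TmpDirCheck and its resolve helper are made up for illustration. It resolves the configured "java.io.tmpdir" value against a working directory, which under YARN would be the container's working directory. Note that the JVM does not create this directory itself, so it has to exist before anything writes temporary files into it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink: shows where java.io.tmpdir points.
public class TmpDirCheck {

    // Resolves a (possibly relative) java.io.tmpdir value against a working directory.
    // An absolute value such as "/tmp" is returned unchanged.
    static Path resolve(String tmpDirProperty, Path workingDir) {
        return workingDir.resolve(tmpDirProperty).normalize();
    }

    public static void main(String[] args) {
        String configured = System.getProperty("java.io.tmpdir");
        Path workingDir = Paths.get("").toAbsolutePath();
        System.out.println("java.io.tmpdir = " + configured);
        System.out.println("resolved to    = " + resolve(configured, workingDir));
    }
}
```

Running it with "java -Djava.io.tmpdir=./tmp TmpDirCheck" prints the directory that temporary files would be written to for that process.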

> To improve utilization of the `java.io.tmpdir` for YARN module
> --
>
> Key: FLINK-15447
> URL: https://issues.apache.org/jira/browse/FLINK-15447
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.9.1
>Reporter: Victor Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> #*Background*
> Currently, when running Flink on Yarn, the "java.io.tmpdir" property is set 
> to the default value, which is "/tmp".  
> Sometimes we run into exceptions caused by a full "/tmp" directory, which 
> is not cleaned automatically after applications finish.
>  
> #*Goal*
> quoted from: [HADOOP-2735|https://issues.apache.org/jira/browse/HADOOP-2735]
> _1) Tasks can utilize all disks when using tmp_
>  _2) Any undeleted tmp files will be deleted by the tasktracker when 
> task(job?) is done._
>  
> #*Suggestion*
> I think we can set "java.io.tmpdir" to the "PWD/tmp" directory, or 
> something similar. Yarn will replace "PWD" with the actual working 
> directory of the JM/TM, which will be cleaned automatically.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module

2020-03-06 Thread Victor Wong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053156#comment-17053156
 ] 

Victor Wong commented on FLINK-15447:
-

Hi [~trohrmann], since this issue is still valid against the current master 
branch, I have put together a PR to demonstrate my intended change, which is 
based mainly on the previous discussions. I would appreciate any advice you 
can offer.



[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module

2020-02-02 Thread Victor Wong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028644#comment-17028644
 ] 

Victor Wong commented on FLINK-15447:
-

[~trohrmann], thanks for your attention. 

_I am wondering whether you would like to configure the system property 
{{java.io.tmpdir}} to point towards {{./tmp}} or to only change Flink's temp 
directories._

_---_

The former: configure the system property {{java.io.tmpdir}}.

 

_If not, then we would need to adapt the java command which starts the Flink 
processes._

_---_

I think this is the best choice, which has _the benefit that libraries, relying 
on {{java.io.tmpdir}}, will not write their temporary data to {{/tmp}}, too._
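
As a rough illustration of what adapting the java command could look like, the sketch below appends a "-Djava.io.tmpdir" flag to a string of JVM options unless the user has already set one. This is only a sketch under assumptions: the class JavaOptsSketch and the method withTmpDir are hypothetical and do not reflect how Flink's YARN module actually assembles its start command.

```java
// Hypothetical sketch, not Flink code: adds a tmpdir flag to JVM options.
public class JavaOptsSketch {

    static final String TMPDIR_FLAG = "-Djava.io.tmpdir=";

    // Appends -Djava.io.tmpdir=<tmpDir> unless the given options already set it.
    static String withTmpDir(String javaOpts, String tmpDir) {
        String opts = javaOpts == null ? "" : javaOpts.trim();
        if (opts.contains(TMPDIR_FLAG)) {
            return opts; // respect an explicit user setting
        }
        String flag = TMPDIR_FLAG + tmpDir;
        return opts.isEmpty() ? flag : opts + " " + flag;
    }

    public static void main(String[] args) {
        // "<main class>" is a placeholder for whatever entry point gets started.
        System.out.println("java " + withTmpDir("-Xmx1g", "./tmp") + " <main class>");
    }
}
```

Leaving an explicit user setting untouched keeps the behaviour overridable, which matches the idea of making the property configurable rather than hard-coded.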



[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module

2020-01-24 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023081#comment-17023081
 ] 

Till Rohrmann commented on FLINK-15447:
---

Thanks for starting this discussion [~victor-wong]. I am wondering whether you 
would like to configure the system property {{java.io.tmpdir}} to point towards 
{{./tmp}} or to only change Flink's temp directories. At the moment, we 
configure Flink's tmp directories {{TMP_DIRS}} to point to 
{{ApplicationConstants.Environment.LOCAL_DIRS}} on the master but not on the 
{{TaskExecutor}}. See FLINK-8350 and FLINK-9762 for more information.

If the latter approach is good enough, then one could set up the 
{{TaskExecutor}} process with {{TMP_DIRS}} pointing towards 
{{ApplicationConstants.Environment.LOCAL_DIRS}} as well. I think it has been an 
oversight that this is not done symmetrically at the moment. If not, then we 
would need to adapt the java command which starts the Flink processes. The 
former approach would also have the benefit that libraries relying on 
{{java.io.tmpdir}} would not write their temporary data to {{/tmp}} either.
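
The idea of deriving temp directories from the container's local directories can be sketched roughly as follows. This is not Flink's implementation. It assumes the environment variable behind {{ApplicationConstants.Environment.LOCAL_DIRS}} is named "LOCAL_DIRS" and holds a comma-separated list of directories; the class LocalDirsSketch and the method tmpDirsFrom are hypothetical names, and the fallback to {{java.io.tmpdir}} is a choice made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not Flink code: picks temp directories from the environment.
public class LocalDirsSketch {

    // Returns the directories listed in LOCAL_DIRS (assumed comma-separated),
    // or the JVM's java.io.tmpdir when the variable is missing or empty.
    static List<String> tmpDirsFrom(Map<String, String> env) {
        String localDirs = env.get("LOCAL_DIRS");
        if (localDirs == null || localDirs.trim().isEmpty()) {
            return Collections.singletonList(System.getProperty("java.io.tmpdir"));
        }
        return Arrays.stream(localDirs.split(","))
                .map(String::trim)
                .filter(dir -> !dir.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tmpDirsFrom(System.getenv()));
    }
}
```

Applying the same lookup on both the master and the {{TaskExecutor}} would give the symmetric setup described above.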



[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module

2020-01-23 Thread Victor Wong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021862#comment-17021862
 ] 

Victor Wong commented on FLINK-15447:
-

[~fly_in_gis], it makes sense to make "java.io.tmpdir" configurable; we could 
add a new YarnOption configuration to achieve this. If this issue is 
assigned to me, could you help review my PR?


[jira] [Commented] (FLINK-15447) To improve utilization of the `java.io.tmpdir` for YARN module

2020-01-23 Thread Victor Wong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021858#comment-17021858
 ] 

Victor Wong commented on FLINK-15447:
-

[~rongr], I updated this issue based on your suggestion. Could you assign it to 
me? Thanks!
