[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=418973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418973 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 08/Apr/20 23:27 Start Date: 08/Apr/20 23:27 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-611244971 This PR is obsolete since #11189 is merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 418973) Time Spent: 5h 50m (was: 5h 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=418974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418974 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 08/Apr/20 23:27 Start Date: 08/Apr/20 23:27 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 418974) Time Spent: 6h (was: 5h 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 6h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=412555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412555 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 30/Mar/20 20:33 Start Date: 30/Mar/20 20:33 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r400476144 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: I was only referring to options values containing `=` which can be the case independently of append options. That is fixed now. So all good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 412555) Time Spent: 5h 40m (was: 5.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=412528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412528 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 30/Mar/20 19:43 Start Date: 30/Mar/20 19:43 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 412528) Time Spent: 5.5h (was: 5h 20m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=412527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412527 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 30/Mar/20 19:43 Start Date: 30/Mar/20 19:43 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r400448482 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: I will merge this now. If we need to support `append` arguments for some reason, we should be able to add that later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 412527) Time Spent: 5h 20m (was: 5h 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=411589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411589 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 28/Mar/20 01:54 Start Date: 28/Mar/20 01:54 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r399604848 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: Oh, I see what you mean now. I will commit your suggestion to avoid confusion. However, I don't think we should support unrecognized `append` arguments. I don't want to have to infer complex argument types. As for the specific case of `experiments`, that will not be a problem because `experiments` is already recognized by the parser. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 411589) Time Spent: 5h 10m (was: 5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=411380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411380 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 27/Mar/20 21:07 Start Date: 27/Mar/20 21:07 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r399538469 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: Comment still applies. Splitting should be performed from the left, not the right. At most one split has to be performed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 411380) Time Spent: 5h (was: 4h 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=411182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411182 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 27/Mar/20 16:52 Start Date: 27/Mar/20 16:52 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r399405064 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: The split is only used to get the argument name, and the rest of the split isn't used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 411182) Time Spent: 4h 50m (was: 4h 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=410918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410918 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 27/Mar/20 08:43 Start Date: 27/Mar/20 08:43 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r399107521 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: Why `rsplit`? Shouldn't this be: ```suggestion split = unknown_args[i].split('=', 1) ``` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410918) Time Spent: 4h 40m (was: 4.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=410917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410917 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 27/Mar/20 08:43 Start Date: 27/Mar/20 08:43 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r399108204 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,29 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'): + split = unknown_args[i].rsplit('=') Review comment: Otherwise this will break options like `experiments=state_cache_size=1`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410917) Time Spent: 4h 40m (was: 4.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408925 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 24/Mar/20 16:59 Start Date: 24/Mar/20 16:59 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r397313859 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,25 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'): Review comment: That sounds good :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408925) Time Spent: 4.5h (was: 4h 20m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408923 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 24/Mar/20 16:57 Start Date: 24/Mar/20 16:57 Worklog Time Spent: 10m Work Description: mxm commented on issue #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#issuecomment-603368311 Thanks for filing the bug. Unfortunately we have some post commits failing after adding Flink 1.10. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408923) Time Spent: 4h 20m (was: 4h 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408193 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 23/Mar/20 19:27 Start Date: 23/Mar/20 19:27 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#issuecomment-602808467 @mxm thanks for the feedback, I addressed your comments. Unrelated: I wasn't aware that the Flink portable jar postcommit has been failing. I filed and self-assigned BEAM-9575. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408193) Time Spent: 4h 10m (was: 4h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408153 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 23/Mar/20 18:56 Start Date: 23/Mar/20 18:56 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r396684486 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,25 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'): Review comment: I guess I should change it to `-`, not `--`. `-` is the default prefix for argparse, and AFAICT that is true for Beam as well, even though most (all?) pipeline options use `--`. https://docs.python.org/3/library/argparse.html#prefix-chars So then the question becomes "Should we always check a parameter name starts with `-`?" And the answer is we don't have to, because the call to parse_args will do that for us. `test_retain_unknown_options_unary_missing_prefix` tests that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408153) Time Spent: 4h (was: 3h 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 4h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408139 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 23/Mar/20 18:44 Start Date: 23/Mar/20 18:44 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r396677870 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,25 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'): + parser.add_argument(unknown_args[i], action='store_true') + i += 1 +else: + parser.add_argument(unknown_args[i], type=str) + i += 2 Review comment: Good catch. I will have to add additional logic to handle that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408139) Time Spent: 3h 50m (was: 3h 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407791 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 23/Mar/20 09:29 Start Date: 23/Mar/20 09:29 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r396309448 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,25 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'): + parser.add_argument(unknown_args[i], action='store_true') + i += 1 +else: + parser.add_argument(unknown_args[i], type=str) + i += 2 Review comment: What about the `--arg=value` format? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 407791) Time Spent: 3h 40m (was: 3.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407790 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 23/Mar/20 09:29 Start Date: 23/Mar/20 09:29 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#discussion_r396310204 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -285,10 +289,25 @@ def get_all_options( cls._add_argparse_args(parser) # pylint: disable=protected-access if add_extra_args_fn: add_extra_args_fn(parser) + known_args, unknown_args = parser.parse_known_args(self._flags) -if unknown_args: - _LOGGER.warning("Discarding unparseable args: %s", unknown_args) -result = vars(known_args) +if retain_unknown_options: + i = 0 + while i < len(unknown_args): +# Treat all unary flags as booleans, and all binary argument values as +# strings. +if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'): Review comment: Should we always check a parameter name starts with `--`, like done here for boolean flags? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 407790) Time Spent: 3.5h (was: 3h 20m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407335 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 21/Mar/20 00:08 Start Date: 21/Mar/20 00:08 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189#issuecomment-601960942 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 407335) Time Spent: 3h 20m (was: 3h 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407331 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 20/Mar/20 23:50 Start Date: 20/Mar/20 23:50 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11189: [BEAM-9446] Retain unknown arguments when using uber jar job server. URL: https://github.com/apache/beam/pull/11189 See #11052 for context. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStream
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407133 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 20/Mar/20 17:12 Start Date: 20/Mar/20 17:12 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-601814129 > @ibzib Could we try to keep the Runner options in case we are not reading them from the job server? We wouldn't have to manually add the options then. Yeah, that's the plan. Just haven't gotten around to it yet. Looking now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 407133) Time Spent: 3h (was: 2h 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 3h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407100 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 20/Mar/20 15:46 Start Date: 20/Mar/20 15:46 Worklog Time Spent: 10m Work Description: mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-601768565 @ibzib Could we try to keep the Runner options in case we are not reading them from the job server? We wouldn't have to manually add the options then. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 407100) Time Spent: 2h 50m (was: 2h 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.21.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=406452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-406452 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 19/Mar/20 18:04 Start Date: 19/Mar/20 18:04 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r395220392 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', +default='PIPELINED', +help='Flink mode for data exchange of batch pipelines. ' Review comment: While experiments are preferred, I think we can document options such as this as experimental as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 406452) Time Spent: 2h 40m (was: 2.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402995 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 13/Mar/20 17:14 Start Date: 13/Mar/20 17:14 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-598827334 > Integer options need to be converted to string and that is done in the portable runner: All the unknown options will be strings already, so that shouldn't be needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 402995) Time Spent: 2.5h (was: 2h 20m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2.5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402991 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 13/Mar/20 17:10 Start Date: 13/Mar/20 17:10 Worklog Time Spent: 10m Work Description: tweise commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-598825634 Integer options need to be converted to string and that is done in the portable runner: https://github.com/apache/beam/blob/6fce2528b0ff011ff3d416ac9e70c8d7264b1f3c/sdks/python/apache_beam/runners/portability/portable_runner.py#L162 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 402991) Time Spent: 2h 20m (was: 2h 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402963 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 13/Mar/20 16:20 Start Date: 13/Mar/20 16:20 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r392331379 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', Review comment: > I think we should continue to discard invalid options when the client is aware of the full set of options and can perform the validation Agreed, we would keep options only when we can't be sure (so just the uber jar job server for now). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 402963) Time Spent: 2h 10m (was: 2h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h 10m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402957 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 13/Mar/20 16:12 Start Date: 13/Mar/20 16:12 Worklog Time Spent: 10m Work Description: tweise commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r392326569 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', Review comment: I think we should continue to discard invalid options when the client is aware of the full set of options and can perform the validation - as is the case with portable runner and job server. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 402957) Time Spent: 2h (was: 1h 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402514 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 12/Mar/20 21:53 Start Date: 12/Mar/20 21:53 Worklog Time Spent: 10m Work Description: mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-598434409 What I meant is keeping "unknown" pipeline options instead of discarding them. If we pass the unknown ones as strings I thought this could work. For example, if we pass `"123"` the parser on the Java side is smart enough to figure it is the integer `123`. I haven't tried this out but I'd suggest we do. Of course this is not very user-friendly. That's why I'd suggest to write pipeline options in a language-agnostic format which can be read by all SDKs. Considering the Flink pipeline options, what we usually want to set is what is called `ExecutionConfig` in Flink. With Flink they are configured directly in the API. For Beam if the parameters are just strings/ints we could come up with something reflection based to set arbitrary new settings. I agree this would be helpful. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 402514) Time Spent: 1h 50m (was: 1h 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=401057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-401057 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 10/Mar/20 23:55 Start Date: 10/Mar/20 23:55 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-597375999 > Have you tried working around this by not discarding these options? AFAIK the json parser is smart enough to read the stringified verison of all option values. I think this may be the best strategy for the uber jar job server, however I don't think we should change this behavior for other runners. (Not sure if that's what you were proposing, just organizing my thoughts here:) - In Dataflow, we seem to duplicate every runner option for each SDK, perhaps because there is no better choice due to the runner architecture. In that case, since all the args are presumably known by the SDK, it makes more sense to drop them (status quo) or maybe even error when arguments are unknown, because it usually means the user made a mistake. - With the "old" Flink job server, retrieving args from the job server is an adequate workaround, so again, there should be no need for unrecognized arguments. I discussed this with @angoenka today and he suggested that we consider the runner-Flink boundary as well -- i.e., if we should have some way of enabling _all_ Flink environment options to be set through Beam pipeline options instead of just adding the ones we need as we go. This would potentially save users from having to wait for a new release just for us to add a pipeline option that trivially maps 1:1 to Flink (of course, they can always change Flink's conf files, which was going to be my proposed workaround here, but AFAIK that requires a restart of the cluster and would affect all jobs run on the cluster). WDYT? +cc @tweise This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 401057) Time Spent: 1h 40m (was: 1.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=400073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400073 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 09/Mar/20 12:16 Start Date: 09/Mar/20 12:16 Worklog Time Spent: 10m Work Description: mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-596491709 Have you tried working around this by not discarding these options? AFAIK the json parser is smart enough to read the stringified verison of all option values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 400073) Time Spent: 1.5h (was: 1h 20m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1.5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=399400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399400 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 06/Mar/20 22:22 Start Date: 06/Mar/20 22:22 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-595990099 > I'm OK with this as a short-term fix, but we should still pursue fixing this properly as this is not scalable. @robertwb for a short term fix, I think we can get by with setting these options in the Flink cluster configuration file instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 399400) Time Spent: 1h 20m (was: 1h 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=399333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399333 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 06/Mar/20 19:51 Start Date: 06/Mar/20 19:51 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-595937673 I'm OK with this as a short-term fix, but we should still pursue fixing this properly as this is not scalable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 399333) Time Spent: 1h 10m (was: 1h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 10m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398731 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 22:24 Start Date: 05/Mar/20 22:24 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r388602678 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', Review comment: I agree, though as discussed earlier we might have difficulties parsing non-string options. I'll try it and see how it goes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398731) Time Spent: 1h (was: 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398730 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 22:22 Start Date: 05/Mar/20 22:22 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r388601783 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', +default='PIPELINED', +help='Flink mode for data exchange of batch pipelines. ' Review comment: I think that's what experiment(s) are for: https://github.com/apache/beam/blob/35beffc5775636eb96e33eb57c6e5f213cfe033a/sdks/python/apache_beam/options/pipeline_options.py#L803-L811 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398730) Time Spent: 50m (was: 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398401 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 14:28 Start Date: 05/Mar/20 14:28 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r388327586 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', Review comment: I'm not sure we should add these here because that's what we used to do and it was inconsistent and hard to maintain. What we want to do, is to not discard those options but still warn about them not being parsed in the Python SDK. This will allow us to still use them in the Runner code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398401) Time Spent: 40m (was: 0.5h) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 40m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398323 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 11:21 Start Date: 05/Mar/20 11:21 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r388232005 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', +default='PIPELINED', +help='Flink mode for data exchange of batch pipelines. ' Review comment: (slightly unrelated to the PR comment) Do we have a way to mark pipelineoptions as `@Experimental` in Python? PipelineOptions are critical from the point of view of backwards compatibility, so we should probably be marking non stable options (if we do not). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398323) Time Spent: 0.5h (was: 20m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 0.5h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398059 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 01:30 Start Date: 05/Mar/20 01:30 Worklog Time Spent: 10m Work Description: angoenka commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-594980292 If I understand correctly, this is needed because we don't go though job api for submission. In job api based submission, runner options are automatically pulled from the runner? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398059) Time Spent: 20m (was: 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398053 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 01:18 Start Date: 05/Mar/20 01:18 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052 I'm not thrilled about manually copying these over, later I might look into a long-term solution to this problem. But this works for now. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompleted