[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=418973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418973
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 08/Apr/20 23:27
Start Date: 08/Apr/20 23:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-611244971
 
 
   This PR is obsolete since #11189 is merged.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 418973)
Time Spent: 5h 50m  (was: 5h 40m)

> FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
> --
>
> Key: BEAM-9446
> URL: https://issues.apache.org/jira/browse/BEAM-9446
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.21.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> I need these options for TFX, but they're being discarded (I believe they are 
> normally supplied by the job server).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=418974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418974
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 08/Apr/20 23:27
Start Date: 08/Apr/20 23:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 418974)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=412555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412555
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 30/Mar/20 20:33
Start Date: 30/Mar/20 20:33
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r400476144
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   I was only referring to option values containing `=`, which can occur 
independently of `append` options. That is fixed now, so all good.
 



Issue Time Tracking
---

Worklog Id: (was: 412555)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=412528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412528
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 30/Mar/20 19:43
Start Date: 30/Mar/20 19:43
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 412528)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=412527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412527
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 30/Mar/20 19:43
Start Date: 30/Mar/20 19:43
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r400448482
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   I will merge this now. If we need to support `append` arguments for some 
reason, we should be able to add that later.
 



Issue Time Tracking
---

Worklog Id: (was: 412527)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=411589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411589
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 28/Mar/20 01:54
Start Date: 28/Mar/20 01:54
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r399604848
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   Oh, I see what you mean now. I will commit your suggestion to avoid 
confusion. However, I don't think we should support unrecognized `append` 
arguments. I don't want to have to infer complex argument types. As for the 
specific case of `experiments`, that will not be a problem because 
`experiments` is already recognized by the parser.
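
   A minimal sketch of why `experiments` is unaffected, assuming a bare 
`argparse` parser with a stand-in `--experiments` option declared with 
`action='append'` (the real Beam declaration is not shown in this thread, and 
`--some_unknown_flag` is a made-up name). Options the parser already knows are 
consumed with their declared action, so only the leftovers ever reach the 
unknown-argument handling:

```python
import argparse

parser = argparse.ArgumentParser()
# Stand-in for an option the parser already recognizes (assumed declaration).
parser.add_argument('--experiments', action='append', default=None)

known, unknown = parser.parse_known_args([
    '--experiments=state_cache_size=1',  # value contains '='
    '--experiments', 'beam_fn_api',      # repeated flag, appended to the list
    '--some_unknown_flag', 'value',      # hypothetical unrecognized option
])

print(known.experiments)
print(unknown)
```

   Here the recognized option collects both values into a list, while the 
unrecognized pair is returned untouched in `unknown`.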
 



Issue Time Tracking
---

Worklog Id: (was: 411589)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=411380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411380
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 27/Mar/20 21:07
Start Date: 27/Mar/20 21:07
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r399538469
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   Comment still applies. Splitting should be performed from the left, not the 
right. At most one split has to be performed.
 



Issue Time Tracking
---

Worklog Id: (was: 411380)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=411182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411182
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 27/Mar/20 16:52
Start Date: 27/Mar/20 16:52
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r399405064
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   The split is only used to get the argument name, and the rest of the split 
isn't used.
 



Issue Time Tracking
---

Worklog Id: (was: 411182)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=410918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410918
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 27/Mar/20 08:43
Start Date: 27/Mar/20 08:43
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r399107521
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   Why `rsplit`? Shouldn't this be: 
   ```suggestion
 split = unknown_args[i].split('=', 1)
   ```
   
   ?
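
   To make the difference concrete, here is a small plain-Python comparison 
(no Beam code involved; the variable names are illustrative) using a flag whose 
value itself contains `=`. It contrasts splitting on every `=`, splitting once 
from the left, and splitting once from the right:

```python
flag = '--experiments=state_cache_size=1'

# No maxsplit: every '=' is a split point, so the value is broken apart.
all_splits = flag.rsplit('=')

# One split from the left: the option name, then the complete value.
left_once = flag.split('=', 1)

# One split from the right: the "name" swallows part of the value.
right_once = flag.rsplit('=', 1)

print(all_splits)
print(left_once)
print(right_once)
```

   Without a `maxsplit` argument, `rsplit('=')` and `split('=')` return the 
same list, so the first element is still the option name; only 
`split('=', 1)` also keeps the value in one piece.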
 



Issue Time Tracking
---

Worklog Id: (was: 410918)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=410917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410917
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 27/Mar/20 08:43
Start Date: 27/Mar/20 08:43
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r399108204
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,29 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'):
+          split = unknown_args[i].rsplit('=')
 
 Review comment:
   Otherwise this will break options like `experiments=state_cache_size=1`.
 



Issue Time Tracking
---

Worklog Id: (was: 410917)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408925
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 24/Mar/20 16:59
Start Date: 24/Mar/20 16:59
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r397313859
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,25 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'):
 
 Review comment:
   That sounds good :)
 



Issue Time Tracking
---

Worklog Id: (was: 408925)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408923
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 24/Mar/20 16:57
Start Date: 24/Mar/20 16:57
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11189: [BEAM-9446] Retain 
unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#issuecomment-603368311
 
 
   Thanks for filing the bug. Unfortunately we have some post commits failing 
after adding Flink 1.10.
 



Issue Time Tracking
---

Worklog Id: (was: 408923)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408193
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 23/Mar/20 19:27
Start Date: 23/Mar/20 19:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11189: [BEAM-9446] Retain 
unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#issuecomment-602808467
 
 
   @mxm thanks for the feedback, I addressed your comments.
   
   Unrelated: I wasn't aware that the Flink portable jar postcommit has been 
failing. I filed and self-assigned BEAM-9575.
 



Issue Time Tracking
---

Worklog Id: (was: 408193)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408153
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 23/Mar/20 18:56
Start Date: 23/Mar/20 18:56
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r396684486
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,25 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'):
 
 Review comment:
   I guess I should change it to `-`, not `--`. `-` is the default prefix for 
argparse, and AFAICT that is true for Beam as well, even though most (all?) 
pipeline options use `--`.
   https://docs.python.org/3/library/argparse.html#prefix-chars
   
   So then the question becomes "Should we always check that a parameter name 
starts with `-`?" And the answer is that we don't have to, because the call to 
`parse_args` will do that for us. 
`test_retain_unknown_options_unary_missing_prefix` tests that.
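
   A small sketch of why the lookahead should test for `-` rather than `--`, 
using only the standard library. The helper name `next_is_flag` and the token 
list are hypothetical, chosen for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# argparse's prefix characters default to '-' unless overridden.
default_prefix = parser.prefix_chars


def next_is_flag(tokens, i, prefix):
    """True if tokens[i] is the last token or is followed by another flag."""
    return i + 1 >= len(tokens) or tokens[i + 1].startswith(prefix)


# Hypothetical unknown arguments: a switch followed by a single-dash option.
tokens = ['--some_switch', '-p', '4']

# Checking for '--' misses '-p', so '--some_switch' would wrongly be treated
# as taking '-p' as its value.
with_double_dash = next_is_flag(tokens, 0, '--')
# Checking for '-' recognizes both '-p' and '--long' style flags.
with_single_dash = next_is_flag(tokens, 0, '-')

print(default_prefix, with_double_dash, with_single_dash)
```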
 



Issue Time Tracking
---

Worklog Id: (was: 408153)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=408139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408139
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 23/Mar/20 18:44
Start Date: 23/Mar/20 18:44
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r396677870
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,25 @@ def get_all_options(
       cls._add_argparse_args(parser)  # pylint: disable=protected-access
     if add_extra_args_fn:
       add_extra_args_fn(parser)
+
     known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'):
+          parser.add_argument(unknown_args[i], action='store_true')
+          i += 1
+        else:
+          parser.add_argument(unknown_args[i], type=str)
+          i += 2
 
 Review comment:
   Good catch. I will have to add additional logic to handle that.
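
   For readers following the diff, the loop can be sketched as a standalone 
function. This is an illustration under stated assumptions, not the code that 
was merged: the function name `parse_retaining_unknown` is hypothetical, the 
lookahead uses `-` as discussed elsewhere in this review, and the handling of 
`--name=value` tokens is one possible reading of the extra logic mentioned 
above.

```python
import argparse


def parse_retaining_unknown(parser, flags):
    """Register unknown flags on the parser, then parse everything again."""
    _, unknown_args = parser.parse_known_args(flags)
    i = 0
    while i < len(unknown_args):
        token = unknown_args[i]
        last_or_followed_by_flag = (
            i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('-'))
        if last_or_followed_by_flag:
            if '=' in token:
                # Assumed handling: '--name=value' is a string option.
                parser.add_argument(token.split('=', 1)[0], type=str)
            else:
                # A flag with no value is treated as a boolean switch.
                parser.add_argument(token, action='store_true')
            i += 1
        else:
            # A flag followed by a separate value is treated as a string.
            parser.add_argument(token, type=str)
            i += 2
    return vars(parser.parse_args(flags))


result = parse_retaining_unknown(
    argparse.ArgumentParser(),
    ['--parallelism', '4',
     '--execution_mode_for_batch=BATCH_FORCED',
     '--some_switch'])  # '--some_switch' is a made-up flag
print(result)
```

   Note that every retained value comes back as a string (or `True` for a 
switch), since the types of unknown options cannot be inferred.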
 



Issue Time Tracking
---

Worklog Id: (was: 408139)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407791
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 23/Mar/20 09:29
Start Date: 23/Mar/20 09:29
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r396309448
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,25 @@ def get_all_options(
        cls._add_argparse_args(parser)  # pylint: disable=protected-access
      if add_extra_args_fn:
        add_extra_args_fn(parser)
+
      known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'):
+          parser.add_argument(unknown_args[i], action='store_true')
+          i += 1
+        else:
+          parser.add_argument(unknown_args[i], type=str)
+          i += 2
 
 Review comment:
   What about the `--arg=value` format?
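
One way the `--arg=value` form could be covered is to split each unknown token on its first `=` before registering it. The following is only a minimal sketch mirroring the loop in the hunk above, not the change that was eventually merged; the helper name `retain_unknown_args` is hypothetical and does not exist in Beam.

```python
import argparse


def retain_unknown_args(parser, flags):
  """Hypothetical helper: registers unknown flags, then re-parses `flags`.

  Unary flags become booleans. Flags with a value, written either as
  `--name value` or `--name=value`, are kept as strings.
  """
  _, unknown = parser.parse_known_args(flags)
  seen = set()
  i = 0
  while i < len(unknown):
    token = unknown[i]
    if not token.startswith('--'):
      i += 1  # Stray value that no flag consumed; skip it.
      continue
    # `--name=value` is split here; `has_value` is '' when there is no '='.
    name, has_value, _ = token.partition('=')
    takes_next = (
        not has_value and i + 1 < len(unknown) and
        not unknown[i + 1].startswith('--'))
    if name not in seen:  # Registering the same flag twice would raise.
      seen.add(name)
      if has_value or takes_next:
        parser.add_argument(name, type=str)
      else:
        parser.add_argument(name, action='store_true')
    i += 2 if takes_next else 1
  known, _ = parser.parse_known_args(flags)
  return vars(known)
```

With a parser that only knows `--runner`, passing `--parallelism=4`, `--execution_mode_for_batch BATCH_FORCED` and `--fail_fast` would yield two string values and one boolean. The values stay strings on purpose, matching the "all binary argument values as strings" comment in the hunk.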
 



Issue Time Tracking
---

Worklog Id: (was: 407791)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407790
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 23/Mar/20 09:29
Start Date: 23/Mar/20 09:29
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#discussion_r396310204
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -285,10 +289,25 @@ def get_all_options(
        cls._add_argparse_args(parser)  # pylint: disable=protected-access
      if add_extra_args_fn:
        add_extra_args_fn(parser)
+
      known_args, unknown_args = parser.parse_known_args(self._flags)
-    if unknown_args:
-      _LOGGER.warning("Discarding unparseable args: %s", unknown_args)
-    result = vars(known_args)
+    if retain_unknown_options:
+      i = 0
+      while i < len(unknown_args):
+        # Treat all unary flags as booleans, and all binary argument values as
+        # strings.
+        if i + 1 >= len(unknown_args) or unknown_args[i + 1].startswith('--'):
 
 Review comment:
   Should we always check that a parameter name starts with `--`, as is done 
here for boolean flags?
 



Issue Time Tracking
---

Worklog Id: (was: 407790)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407335
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 21/Mar/20 00:08
Start Date: 21/Mar/20 00:08
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11189: [BEAM-9446] Retain 
unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189#issuecomment-601960942
 
 
   Run PortableJar_Flink PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 407335)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407331
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 20/Mar/20 23:50
Start Date: 20/Mar/20 23:50
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11189: [BEAM-9446] 
Retain unknown arguments when using uber jar job server.
URL: https://github.com/apache/beam/pull/11189
 
 
   See #11052 for context.
   
   
   

[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407133
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 20/Mar/20 17:12
Start Date: 20/Mar/20 17:12
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-601814129
 
 
   > @ibzib Could we try to keep the Runner options in case we are not reading 
them from the job server? We wouldn't have to manually add the options then.
   
   Yeah, that's the plan. Just haven't gotten around to it yet. Looking now.
 



Issue Time Tracking
---

Worklog Id: (was: 407133)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=407100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407100
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 20/Mar/20 15:46
Start Date: 20/Mar/20 15:46
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11052: [BEAM-9446] Add missing 
parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-601768565
 
 
   @ibzib Could we try to keep the Runner options in case we are not reading 
them from the job server? We wouldn't have to manually add the options then.
 



Issue Time Tracking
---

Worklog Id: (was: 407100)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=406452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-406452
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 19/Mar/20 18:04
Start Date: 19/Mar/20 18:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11052: [BEAM-9446] 
Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r395220392
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
          ' directly, rather than starting up a job server.'
          ' Only applies when flink_master is set to a'
          ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   While experiments are preferred, I think we can document options such as 
this as experimental as well. 
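
For reference, the two arguments added in this hunk would behave as follows under plain `argparse`. This is a minimal sketch that uses a standalone parser rather than Beam's `PipelineOptions` machinery (an assumption made to keep it self-contained). The help text for `--execution_mode_for_batch` is cut off in the hunk, so only its first sentence is used, and `BATCH_FORCED` is just an example value.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--parallelism',
    default=-1,
    type=int,
    help='The degree of parallelism to be used when distributing '
    'operations onto workers. If the parallelism is not set, the '
    'configured Flink default is used, or 1 if none can be found.')
parser.add_argument(
    '--execution_mode_for_batch',
    default='PIPELINED',
    # Remainder of the help text is elided in the hunk above.
    help='Flink mode for data exchange of batch pipelines.')

# Nothing passed: both options fall back to their declared defaults.
defaults = vars(parser.parse_args([]))

# Both the space-separated and the `=` forms are accepted for known flags,
# and `type=int` converts the parallelism value.
explicit = vars(
    parser.parse_args(
        ['--parallelism', '4', '--execution_mode_for_batch=BATCH_FORCED']))
```

Declaring the options this way lets the SDK validate and type them, at the cost of duplicating every runner option in each SDK.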
 



Issue Time Tracking
---

Worklog Id: (was: 406452)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402995
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 13/Mar/20 17:14
Start Date: 13/Mar/20 17:14
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-598827334
 
 
   > Integer options need to be converted to string and that is done in the 
portable runner:
   
   All the unknown options will be strings already, so that shouldn't be needed.
 



Issue Time Tracking
---

Worklog Id: (was: 402995)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402991
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 13/Mar/20 17:10
Start Date: 13/Mar/20 17:10
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-598825634
 
 
   Integer options need to be converted to string and that is done in the 
portable runner: 
https://github.com/apache/beam/blob/6fce2528b0ff011ff3d416ac9e70c8d7264b1f3c/sdks/python/apache_beam/runners/portability/portable_runner.py#L162
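
The kind of conversion described here can be illustrated roughly as below. This is a sketch under assumptions, not the code at the linked line: the function name `stringify_int_options` is hypothetical, and only the int-to-string step mentioned in the comment is modeled.

```python
def stringify_int_options(options):
  """Hypothetical helper: returns a copy with int values as strings.

  Uses an exact type check so that bools, which subclass int in Python,
  are left untouched.
  """
  return {
      name: str(value) if type(value) is int else value
      for name, value in options.items()
  }


converted = stringify_int_options({
    'runner': 'FlinkRunner',
    'parallelism': 4,
    'streaming': True,
})
```

Options that were never declared to the parser arrive as strings in the first place, so a step like this only matters for options the SDK has typed as integers.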
 
 



Issue Time Tracking
---

Worklog Id: (was: 402991)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402963
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 13/Mar/20 16:20
Start Date: 13/Mar/20 16:20
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r392331379
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
          ' directly, rather than starting up a job server.'
          ' Only applies when flink_master is set to a'
          ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   > I think we should continue to discard invalid options when the client is 
aware of the full set of options and can perform the validation
   
   Agreed, we would keep options only when we can't be sure (so just the uber 
jar job server for now).
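
The policy agreed on here, discarding by default and keeping unknown options only when the client cannot validate them, might look like the sketch below. The flag name `retain_unknown_options` is taken from the later #11189 hunk; the function `parse_flags` and the `unparsed` key are hypothetical and only serve to make the two branches visible.

```python
import argparse
import logging

_LOGGER = logging.getLogger(__name__)


def parse_flags(parser, flags, retain_unknown_options=False):
  """Hypothetical sketch: drops unknown args unless asked to keep them."""
  known_args, unknown_args = parser.parse_known_args(flags)
  result = vars(known_args)
  if unknown_args:
    if retain_unknown_options:
      # Keep the raw tokens for a runner that can validate them itself.
      result['unparsed'] = list(unknown_args)
    else:
      _LOGGER.warning('Discarding unparseable args: %s', unknown_args)
  return result
```

A caller that knows the full option set would leave the flag at its default, so a mistyped option is still reported and dropped.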
 



Issue Time Tracking
---

Worklog Id: (was: 402963)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402957
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 13/Mar/20 16:12
Start Date: 13/Mar/20 16:12
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11052: [BEAM-9446] 
Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r392326569
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
          ' directly, rather than starting up a job server.'
          ' Only applies when flink_master is set to a'
          ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I think we should continue to discard invalid options when the client is 
aware of the full set of options and can perform the validation - as is the 
case with portable runner and job server.
 



Issue Time Tracking
---

Worklog Id: (was: 402957)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=402514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402514
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 12/Mar/20 21:53
Start Date: 12/Mar/20 21:53
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11052: [BEAM-9446] Add missing 
parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-598434409
 
 
   What I meant is keeping "unknown" pipeline options instead of discarding 
them. If we pass the unknown ones as strings, I thought this could work. For 
example, if we pass `"123"`, the parser on the Java side is smart enough to 
figure out that it is the integer `123`. I haven't tried this out, but I'd 
suggest we do.
   
   Of course, this is not very user-friendly. That's why I'd suggest writing 
pipeline options in a language-agnostic format that can be read by all SDKs.
   
   Considering the Flink pipeline options, what we usually want to set is what 
is called `ExecutionConfig` in Flink. With Flink, they are configured directly 
in the API. For Beam, if the parameters are just strings/ints, we could come up 
with something reflection-based to set arbitrary new settings. I agree this 
would be helpful.
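
The idea of recovering a typed value from its stringified form can be illustrated in Python. This is only an analogy for the behavior the comment attributes to the Java-side parser, not a description of how that parser is implemented; the function name `coerce_option_value` is hypothetical.

```python
import json


def coerce_option_value(raw):
  """Hypothetical helper: reads `raw` as JSON, else returns it unchanged."""
  try:
    return json.loads(raw)
  except ValueError:  # json.JSONDecodeError is a subclass of ValueError.
    return raw


as_int = coerce_option_value('123')
as_bool = coerce_option_value('true')
as_text = coerce_option_value('PIPELINED')
```

Under this scheme `'123'` comes back as the integer `123`, while a bare word such as `'PIPELINED'` is not valid JSON and is returned as the original string.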
 



Issue Time Tracking
---

Worklog Id: (was: 402514)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=401057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-401057
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 10/Mar/20 23:55
Start Date: 10/Mar/20 23:55
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-597375999
 
 
   > Have you tried working around this by not discarding these options? AFAIK 
the JSON parser is smart enough to read the stringified version of all option 
values.
   
   I think this may be the best strategy for the uber jar job server; however, I 
don't think we should change this behavior for other runners. (Not sure if 
that's what you were proposing, just organizing my thoughts here:)
   - In Dataflow, we seem to duplicate every runner option for each SDK, 
perhaps because there is no better choice due to the runner architecture. In 
that case, since all the args are presumably known by the SDK, it makes more 
sense to drop them (status quo) or maybe even error when arguments are unknown, 
because it usually means the user made a mistake.
   - With the "old" Flink job server, retrieving args from the job server is an 
adequate workaround, so again, there should be no need for unrecognized 
arguments.
   
   I discussed this with @angoenka today and he suggested that we consider the 
runner-Flink boundary as well -- i.e., if we should have some way of enabling 
_all_ Flink environment options to be set through Beam pipeline options instead 
of just adding the ones we need as we go. This would potentially save users 
from having to wait for a new release just for us to add a pipeline option that 
trivially maps 1:1 to Flink (of course, they can always change Flink's conf 
files, which was going to be my proposed workaround here, but AFAIK that 
requires a restart of the cluster and would affect all jobs run on the 
cluster). WDYT?
   
   +cc @tweise 
 



Issue Time Tracking
---

Worklog Id: (was: 401057)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=400073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400073
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 09/Mar/20 12:16
Start Date: 09/Mar/20 12:16
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11052: [BEAM-9446] Add missing 
parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-596491709
 
 
   Have you tried working around this by not discarding these options? AFAIK 
the JSON parser is smart enough to read the stringified version of all option 
values.
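
To illustrate the point about stringified values, here is a small Python sketch, assuming each unparsed option arrives as a string and is decoded as JSON where possible. This is not the runner's actual parser; `decode_option_value` is a made-up helper.

```python
import json


def decode_option_value(raw):
    """Decode a stringified option value, falling back to the raw string."""
    try:
        # '4' -> 4, 'true' -> True, '[1, 2]' -> [1, 2]
        return json.loads(raw)
    except ValueError:
        # Not valid JSON (for example PIPELINED), so keep it as a string.
        return raw


decoded = {
    name: decode_option_value(value)
    for name, value in {
        'parallelism': '4',
        'execution_mode_for_batch': 'PIPELINED',
    }.items()
}
```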
 



Issue Time Tracking
---

Worklog Id: (was: 400073)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=399400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399400
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 06/Mar/20 22:22
Start Date: 06/Mar/20 22:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-595990099
 
 
   > I'm OK with this as a short-term fix, but we should still pursue fixing 
this properly as this is not scalable.
   
   @robertwb for a short-term fix, I think we can get by with setting these 
options in the Flink cluster configuration file instead.
 



Issue Time Tracking
---

Worklog Id: (was: 399400)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=399333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399333
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 06/Mar/20 19:51
Start Date: 06/Mar/20 19:51
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-595937673
 
 
   I'm OK with this as a short-term fix, but we should still pursue fixing this 
properly as this is not scalable. 
 



Issue Time Tracking
---

Worklog Id: (was: 399333)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398731
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:24
Start Date: 05/Mar/20 22:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388602678
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+        'operations onto workers. If the parallelism is not set, the '
+        'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I agree, though as discussed earlier we might have difficulties parsing 
non-string options. I'll try it and see how it goes.
 



Issue Time Tracking
---

Worklog Id: (was: 398731)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398730
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:22
Start Date: 05/Mar/20 22:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388601783
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+        'operations onto workers. If the parallelism is not set, the '
+        'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   I think that's what experiment(s) are for: 
https://github.com/apache/beam/blob/35beffc5775636eb96e33eb57c6e5f213cfe033a/sdks/python/apache_beam/options/pipeline_options.py#L803-L811
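
For illustration, a sketch of how a repeatable experiments flag can gate unstable behavior, written with plain `argparse` rather than Beam's own option classes. The experiment name `use_new_flink_options` is made up for this example.

```python
import argparse


def parse_experiments(argv):
    """Return the list of experiment names passed on the command line."""
    parser = argparse.ArgumentParser()
    # Each occurrence of the flag appends one experiment name.
    parser.add_argument(
        '--experiments', dest='experiments', action='append', default=None)
    known, _ = parser.parse_known_args(argv)
    return known.experiments or []


def is_enabled(experiments, name):
    """Check whether a named experiment was requested."""
    return name in experiments


experiments = parse_experiments(
    ['--experiments=use_new_flink_options', '--experiments', 'other'])
if is_enabled(experiments, 'use_new_flink_options'):
    mode = 'experimental path'
else:
    mode = 'stable path'
```

Gating behind a named experiment keeps the unstable behavior out of the set of first-class options that users may come to rely on.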
 



Issue Time Tracking
---

Worklog Id: (was: 398730)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398401
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 14:28
Start Date: 05/Mar/20 14:28
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388327586
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+        'operations onto workers. If the parallelism is not set, the '
+        'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I'm not sure we should add these here because that's what we used to do and 
it was inconsistent and hard to maintain. What we want to do is not discard 
those options, but still warn that they are not parsed in the Python SDK. 
This will allow us to still use them in the Runner code.
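
A rough sketch of that behavior, assuming plain `argparse` stands in for the SDK's option parsing: known options are parsed as usual, while unknown ones are kept as strings and logged with a warning instead of being dropped. `parse_retaining_unknown` is a hypothetical helper, not Beam's API.

```python
import argparse
import logging


def parse_retaining_unknown(argv):
    """Parse known options and keep unknown ones instead of dropping them."""
    parser = argparse.ArgumentParser()
    # Stand-in for an option the SDK already knows about.
    parser.add_argument('--flink_master', default='[auto]')
    known, unknown = parser.parse_known_args(argv)

    options = vars(known)
    i = 0
    while i < len(unknown):
        arg = unknown[i]
        if arg.startswith('--'):
            if '=' in arg:
                # Form: --name=value
                name, value = arg[2:].split('=', 1)
            elif i + 1 < len(unknown) and not unknown[i + 1].startswith('--'):
                # Form: --name value
                name, value = arg[2:], unknown[i + 1]
                i += 1
            else:
                # Bare flag with no value.
                name, value = arg[2:], 'true'
            logging.warning(
                'Option %s is not parsed by the SDK; forwarding it as a '
                'string.', name)
            options[name] = value
        i += 1
    return options


retained = parse_retaining_unknown([
    '--flink_master=localhost:8081',
    '--parallelism=4',
    '--execution_mode_for_batch', 'BATCH_FORCED',
])
```

The runner would then be responsible for interpreting the forwarded strings, which is where the concern about non-string values comes in.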
 



Issue Time Tracking
---

Worklog Id: (was: 398401)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398323
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 11:21
Start Date: 05/Mar/20 11:21
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11052: [BEAM-9446] 
Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388232005
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+        'operations onto workers. If the parallelism is not set, the '
+        'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   (slightly unrelated to the PR comment) Do we have a way to mark 
pipeline options as `@Experimental` in Python? PipelineOptions are critical from 
the point of view of backwards compatibility, so we should probably be marking 
non-stable options as such (if we do not already).
   
 



Issue Time Tracking
---

Worklog Id: (was: 398323)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398059
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 01:30
Start Date: 05/Mar/20 01:30
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-594980292
 
 
   If I understand correctly, this is needed because we don't go through the 
Job API for submission. 
   In Job API based submission, are runner options automatically pulled from 
the runner?
 



Issue Time Tracking
---

Worklog Id: (was: 398059)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398053
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 01:18
Start Date: 05/Mar/20 01:18
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052
 
 
   I'm not thrilled about manually copying these over; later I might look into 
a long-term solution to this problem. But this works for now.
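
As a self-contained illustration of the two options this PR copies over, the sketch below registers them on a plain `argparse` parser with the defaults shown in the review diff. `build_flink_option_parser` is a made-up name, and the real change lives in Beam's `pipeline_options.py` rather than in a standalone parser.

```python
import argparse


def build_flink_option_parser():
    """Build a parser holding only the two options discussed in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--parallelism',
        default=-1,
        type=int,
        help='The degree of parallelism to be used when distributing '
        'operations onto workers.')
    parser.add_argument(
        '--execution_mode_for_batch',
        default='PIPELINED',
        help='Flink mode for data exchange of batch pipelines.')
    return parser


# Unrelated options are ignored so the sketch can run on any argument list.
args, _ = build_flink_option_parser().parse_known_args(
    ['--parallelism=8', '--runner=FlinkRunner'])
```

Here `args.parallelism` is parsed as an integer and `args.execution_mode_for_batch` falls back to its default because it was not supplied.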
   
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompleted