[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=85296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85296 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 28/Mar/18 15:38
Start Date: 28/Mar/18 15:38
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-376932424

@iemejia Thank you for the review. I agree that it's not just a tuning knob but rather a parameter that can be required to run a pipeline without overloading the Kinesis backend. As for the name of this method, I'd choose the name suggested by @pawel-kaczmarczyk; I think it explains well what it actually does.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 85296)
Time Spent: 2h 10m (was: 2h)

> Add withLimit() option to KinesisIO
> ---
>
> Key: BEAM-3819
> URL: https://issues.apache.org/jira/browse/BEAM-3819
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kinesis
> Reporter: Jean-Baptiste Onofré
> Assignee: Alexey Romanenko
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> In some cases, the user might need to set the {{limit}} on the
> {{SimplifiedKinesisClient}}, especially for performance reasons, depending on
> the number of records.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=85266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85266 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 28/Mar/18 13:33
Start Date: 28/Mar/18 13:33
Worklog Time Spent: 10m
Work Description: jbonofre commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-376888197

Fair enough, +1 with @iemejia. Just added a reminder about what we should "expose" or not for end users.

Issue Time Tracking
---
Worklog Id: (was: 85266)
Time Spent: 2h (was: 1h 50m)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=85265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85265 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 28/Mar/18 13:31
Start Date: 28/Mar/18 13:31
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-376887515

Hi, I forgot to take a look at this one. My apologies, Alexey.

I think that this 'knob' is quite useful (and that it can even be required) because Kinesis can be overwhelmed if the shards are not able to keep up with the readers, so rate limiting makes sense. Remember that the limits of Kinesis are, to put it mildly, 'restrictive': https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html

However, I am not sure the suggested approach will work, considering that we cannot know in advance how many workers are actively reading from each shard, which we would need in order to compute the total with any precision. So I would suggest leaving it as it is and leaving this calculation to the user, because we cannot do it precisely from each individual source/reader.

I agree with @pawel-kaczmarczyk that the name is not quite right; maybe `withMaxRecordsPerRequest`.

Issue Time Tracking
---
Worklog Id: (was: 85265)
Time Spent: 1h 50m (was: 1h 40m)

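The calculation that the comment above leaves to the user could look roughly like the following. This is only a sketch under stated assumptions: the class and method names are hypothetical (not part of Beam or the AWS SDK), the per-shard figures are taken from the AWS limits page linked in the comment and should be re-checked there, and the number of readers per shard and their polling rate are values the user would have to supply. The separate per-shard cap on read transactions per second is not modeled.

```java
// Hypothetical helper: these names are illustrative and are not part of
// Beam's KinesisIO or the AWS SDK.
final class MaxRecordsPerRequestSketch {
    // Assumed per-shard limits, as read from the AWS limits page; verify
    // them against the current documentation before relying on them.
    static final long SHARD_READ_BYTES_PER_SECOND = 2_000_000L; // "2 MB per second"
    static final int MAX_RECORDS_PER_GET_RECORDS_CALL = 10_000;

    /**
     * Estimates how many records one GetRecords call may ask for so that
     * all readers of a shard together stay within its read throughput.
     */
    static int maxRecordsPerRequest(
            int readersPerShard, int pollsPerSecondPerReader, long avgRecordBytes) {
        if (readersPerShard <= 0 || pollsPerSecondPerReader <= 0 || avgRecordBytes <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Bytes read per second from the shard if every call returned one record.
        long bytesPerSecondPerRecord =
                (long) readersPerShard * pollsPerSecondPerReader * avgRecordBytes;
        long records = SHARD_READ_BYTES_PER_SECOND / bytesPerSecondPerRecord;
        // Always request at least one record, never more than the API maximum.
        return (int) Math.max(1L, Math.min(MAX_RECORDS_PER_GET_RECORDS_CALL, records));
    }

    public static void main(String[] args) {
        // Two readers per shard, one poll per second each, ~1 kB records.
        System.out.println(maxRecordsPerRequest(2, 1, 1_000)); // prints 1000
    }
}
```

The result would then be passed to whatever the option ends up being called (`withLimit`, `withRequestRecordsLimit`, or `withMaxRecordsPerRequest` in this thread). Because the number of active readers per shard is not known to an individual source, it is an input here rather than something the sketch tries to discover.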
[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=81881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81881 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 19/Mar/18 15:25
Start Date: 19/Mar/18 15:25
Worklog Time Spent: 10m
Work Description: jbonofre commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-374251880

Agree that this approach would be better. Agree as well to close the Jira and PR to submit a new one.

Issue Time Tracking
---
Worklog Id: (was: 81881)
Time Spent: 1h 40m (was: 1.5h)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=81879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81879 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 19/Mar/18 15:22
Start Date: 19/Mar/18 15:22
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373664555

@jbonofre Potentially, we can collect some statistics on how many records were read and what the total data size is. Then we can calculate the average record size and set this `limit` accordingly. From time to time we have to recalculate the average size and adjust the `limit` value if this size has changed significantly.

But in this case, we need to close the old Jira, open a new one, and create a new PR because originally, if I understand correctly, the idea was to allow the user to configure this parameter themselves. WDYT?

Issue Time Tracking
---
Worklog Id: (was: 81879)
Time Spent: 1.5h (was: 1h 20m)

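The idea in the comment above (track the average record size, derive `limit` from it, and only re-derive it when the average has moved significantly) might be sketched as follows. Everything here is an assumption made for illustration: the class name, the byte budget per request, the cumulative (rather than windowed) average, and the relative-change threshold are hypothetical and do not describe code in Beam's KinesisIO.

```java
// Hypothetical sketch: class, fields, and threshold are illustrative and
// do not reflect code in Beam's KinesisIO.
final class AdaptiveRecordLimit {
    private final long targetBytesPerRequest; // assumed byte budget for one request
    private final int maxLimit;               // upper bound for the limit
    private final double changeThreshold;     // e.g. 0.2 means a 20% change
    private long recordsSeen;
    private long bytesSeen;
    private double avgAtLastAdjustment;       // 0 until the first adjustment
    private int limit;

    AdaptiveRecordLimit(
            long targetBytesPerRequest, int initialLimit, int maxLimit, double changeThreshold) {
        this.targetBytesPerRequest = targetBytesPerRequest;
        this.limit = initialLimit;
        this.maxLimit = maxLimit;
        this.changeThreshold = changeThreshold;
    }

    /** Feeds in the statistics of one fetched batch and adjusts the limit if needed. */
    void recordBatch(int recordCount, long totalBytes) {
        if (recordCount <= 0 || totalBytes <= 0) {
            return; // nothing to learn from an empty batch
        }
        recordsSeen += recordCount;
        bytesSeen += totalBytes;
        double avg = (double) bytesSeen / recordsSeen;
        boolean first = avgAtLastAdjustment == 0.0;
        boolean changedSignificantly = !first
                && Math.abs(avg - avgAtLastAdjustment) / avgAtLastAdjustment > changeThreshold;
        if (first || changedSignificantly) {
            avgAtLastAdjustment = avg;
            long raw = (long) (targetBytesPerRequest / avg);
            // Keep the limit within [1, maxLimit].
            limit = (int) Math.max(1L, Math.min(maxLimit, raw));
        }
    }

    int currentLimit() {
        return limit;
    }

    public static void main(String[] args) {
        AdaptiveRecordLimit adaptive = new AdaptiveRecordLimit(400_000, 10_000, 10_000, 0.2);
        adaptive.recordBatch(100, 100_000);              // average record size: 1000 bytes
        System.out.println(adaptive.currentLimit());     // prints 400
    }
}
```

A cumulative average reacts slowly once many records have been seen; a sliding window or exponentially weighted average would be a natural variation if record sizes drift over time.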
[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=81127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81127 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 16/Mar/18 10:08
Start Date: 16/Mar/18 10:08
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373664555

@jbonofre Potentially, we can collect some statistics on how many records were read and what the total data size is. Then we can calculate the average record size and set this `limit` accordingly. From time to time we have to recalculate the average size and adjust the `limit` value if this size has changed significantly.

But in this case, we need to close the old Jira, open a new one, and create a new PR because originally, if I understand correctly, the idea was to allow the user to configure this parameter themselves. WDYT?

Issue Time Tracking
---
Worklog Id: (was: 81127)
Time Spent: 1h 20m (was: 1h 10m)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=81126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81126 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 16/Mar/18 10:05
Start Date: 16/Mar/18 10:05
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373664555

@jbonofre Potentially, we can collect some statistics on how many records were read and what the total data size is. Then we can calculate the average record size and set this `limit` accordingly. From time to time we have to recalculate the average size and adjust the `limit` value if needed.

But in this case, we need to close the old Jira and open a new one because originally, if I understand correctly, the idea was to allow the user to configure this parameter themselves.

Issue Time Tracking
---
Worklog Id: (was: 81126)
Time Spent: 1h 10m (was: 1h)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=81123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81123 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 16/Mar/18 09:57
Start Date: 16/Mar/18 09:57
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373662094

@pawel-kaczmarczyk Agreed, it sounds more meaningful. Do you think we have to change the name of the method too, e.g. to `withRequestRecordsLimit`?

Issue Time Tracking
---
Worklog Id: (was: 81123)
Time Spent: 1h (was: 50m)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=80987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80987 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 15/Mar/18 21:37
Start Date: 15/Mar/18 21:37
Worklog Time Spent: 10m
Work Description: jbonofre commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373531082

Generally speaking, we try to avoid exposing user configuration in the IO (see https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose). I wonder if it's worth exposing `limit` to the user, as it's more of an IO constraint to me. Thoughts?

Issue Time Tracking
---
Worklog Id: (was: 80987)
Time Spent: 50m (was: 40m)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=80985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80985 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 15/Mar/18 21:29
Start Date: 15/Mar/18 21:29
Worklog Time Spent: 10m
Work Description: pawel-kaczmarczyk commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373528960

I like the idea and the code looks good. I would only consider changing the property name, as `limit` is not very informative. Maybe `requestRecordsLimit`?

Issue Time Tracking
---
Worklog Id: (was: 80985)
Time Spent: 40m (was: 0.5h)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=80852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80852 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 15/Mar/18 14:35
Start Date: 15/Mar/18 14:35
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-373397387

R: @jbonofre
R: @iemejia

Issue Time Tracking
---
Worklog Id: (was: 80852)
Time Spent: 0.5h (was: 20m)

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=79549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79549 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 12/Mar/18 18:01
Start Date: 12/Mar/18 18:01
Worklog Time Spent: 10m
Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-372406353

@jbonofre @pawel-kaczmarczyk @iemejia - please, take a look

Issue Time Tracking
---
Worklog Id: (was: 79549)
Time Spent: 20m (was: 10m)

> Add withLimit() option to KinesisIO
> ---
>
> Key: BEAM-3819
> URL: https://issues.apache.org/jira/browse/BEAM-3819
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kinesis
> Reporter: Jean-Baptiste Onofré
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In some cases, the user might need to set the {{limit}} on the
> {{SimplifiedKinesisClient}}, especially for performance reasons, depending on
> the number of records.

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO
[ https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=79547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79547 ]

ASF GitHub Bot logged work on BEAM-3819:

Author: ASF GitHub Bot
Created on: 12/Mar/18 17:58
Start Date: 12/Mar/18 17:58
Worklog Time Spent: 10m
Work Description: aromanenko-dev opened a new pull request #4851: [BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851

Add `KinesisIO.Read.withLimit()` to limit the maximum number of records fetched in a `GetRecordsResult`. It should help reduce the load on Kinesis if needed.

Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
- [ ] Write a pull request description that is detailed enough to understand:
  - [x] What the pull request does
  - [x] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [x] Each commit in the pull request should have a meaningful subject line and body.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

Issue Time Tracking
---
Worklog Id: (was: 79547)
Time Spent: 10m
Remaining Estimate: 0h
