[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192343 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 15:02   Start Date: 30/Jan/19 15:02   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192343); Time Spent: 4h 40m (was: 4.5h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 4h 40m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
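For readers following the thread, here is a minimal sketch of what the resulting publisher does: generate synthetic key/value records and push their string form to a Kafka topic. It mirrors the shape of the code discussed in #7612 but is not the merged implementation; the class name, the hard-coded settings and the `KafkaIO.<Void, String>write()` type parameters are assumptions.

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Read;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource;
import org.apache.beam.sdk.io.synthetic.SyntheticOptions;
import org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringSerializer;

/** Sketch only: not the code merged in #7612; names and settings are illustrative. */
public class SyntheticToKafkaSketch {

  public static void main(String[] args) throws Exception {
    // Assumed values; the real publisher reads these from its custom PipelineOptions.
    String bootstrapServers = "localhost:9092";
    String kafkaTopic = "synthetic-data";
    String sourceOptionsJson = "{\"numRecords\":1000,\"keySizeBytes\":10,\"valueSizeBytes\":90}";

    SyntheticSourceOptions sourceOptions =
        SyntheticOptions.fromJsonString(sourceOptionsJson, SyntheticSourceOptions.class);

    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The synthetic source produces KV<byte[], byte[]> records.
    PCollection<KV<byte[], byte[]>> records =
        pipeline.apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)));

    records
        .apply("Map to Kafka messages", MapElements.via(new FormatAsString()))
        .apply(
            "Write to Kafka",
            KafkaIO.<Void, String>write()
                .withBootstrapServers(bootstrapServers)
                .withTopic(kafkaTopic)
                .withValueSerializer(StringSerializer.class)
                .values());

    pipeline.run().waitUntilFinish();
  }

  /** Renders a synthetic record as a readable string payload. */
  private static class FormatAsString extends SimpleFunction<KV<byte[], byte[]>, String> {
    @Override
    public String apply(KV<byte[], byte[]> input) {
      return String.format(
          "{%s,%s}", Arrays.toString(input.getKey()), Arrays.toString(input.getValue()));
    }
  }
}
```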
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192342&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192342 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 15:01   Start Date: 30/Jan/19 15:01   Worklog Time Spent: 10m
Work Description: lgajowy commented on issue #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-458975707

LGTM, thanks! Merging.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192342); Time Spent: 4.5h (was: 4h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 4.5h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192231 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 12:30   Start Date: 30/Jan/19 12:30   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235493

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -50,15 +54,22 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *    --kafkaTopic=KAFKA_TOPIC_NAME
  *    --sourceOptions={"numRecords":1000,...}'
- *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *
+ *
+ * If parameters related to Kafka are provided, the publisher writes to Kafka. If both pubsub topic
+ * and Kafka params are present, records will be written to both sinks.

Review comment: Sounds better, changed.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192231); Time Spent: 4h (was: 3h 50m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 4h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
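The behavior described in that javadoc — Kafka parameters select the Kafka sink, a PubSub topic selects the PubSub sink, and both together write to both — boils down to two independent guards in main(). A sketch of that decision logic (a fragment: `Options`, `writeToKafka` and `writeToPubSub` are assumed helper names, not the exact merged code):

```java
// Assumed fragment of main(): the synthetic data is read once and handed to every configured sink.
PCollection<KV<byte[], byte[]>> syntheticData =
    pipeline.apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)));

if (options.getKafkaBootstrapServerAddress() != null && options.getKafkaTopic() != null) {
  writeToKafka(syntheticData, options); // Kafka parameters present -> publish to Kafka
}
if (options.getPubSubTopic() != null) {
  writeToPubSub(syntheticData, options); // PubSub topic present -> publish to PubSub
}
// If both sets of parameters are provided, both branches run and the same records reach both sinks.
```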
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192233 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 12:31   Start Date: 30/Jan/19 12:31   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252236023

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -69,26 +80,64 @@
  void setSourceOptions(String sourceOptions);

  @Description("PubSub topic to publish to")
- @Validation.Required
  String getInsertionPipelineTopic();

Review comment: Ok, done

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192233); Time Spent: 4h 20m (was: 4h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 4h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192232 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 12:30   Start Date: 30/Jan/19 12:30   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235599

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -50,15 +54,22 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *    --kafkaTopic=KAFKA_TOPIC_NAME
  *    --sourceOptions={"numRecords":1000,...}'
- *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *
+ *

Review comment: Ok, added

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192232); Time Spent: 4h 10m (was: 4h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 4h 10m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192230 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 12:29   Start Date: 30/Jan/19 12:29   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235416

## File path: sdks/java/io/synthetic/build.gradle
## @@ -29,11 +29,13 @@
 dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients
   shadowTest library.java.vendored_guava_20_0
   testCompile library.java.junit
   testCompile library.java.hamcrest_core
   testCompile library.java.hamcrest_library
   shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
   shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
+  shadow project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")

Review comment: Moved it, as per comment above

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192230); Time Spent: 3h 50m (was: 3h 40m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 50m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192229 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 12:29   Start Date: 30/Jan/19 12:29   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235347

## File path: sdks/java/io/synthetic/build.gradle
## @@ -29,11 +29,13 @@
 dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients

Review comment: That's true, I moved it

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192229); Time Spent: 3h 40m (was: 3.5h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 40m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192210 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252214536

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -69,26 +80,64 @@
  void setSourceOptions(String sourceOptions);

  @Description("PubSub topic to publish to")
- @Validation.Required
  String getInsertionPipelineTopic();

  void setInsertionPipelineTopic(String topic);
+
+ @Description("Kafka server address")
+ String getKafkaBootstrapServerAddress();
+
+ void setKafkaBootstrapServerAddress(String address);
+
+ @Description("Kafka topic")
+ String getKafkaTopic();
+
+ void setKafkaTopic(String topic);
  }

  public static void main(String[] args) throws IOException {
-   Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
+   options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    SyntheticSourceOptions sourceOptions =
        SyntheticOptions.fromJsonString(options.getSourceOptions(), SyntheticSourceOptions.class);
    Pipeline pipeline = Pipeline.create(options);
+   PCollection<KV<byte[], byte[]>> syntheticData =

Review comment: looks good!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192210); Time Spent: 3h 20m (was: 3h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192207 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252213775

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -69,26 +80,64 @@
  void setSourceOptions(String sourceOptions);

  @Description("PubSub topic to publish to")
- @Validation.Required
  String getInsertionPipelineTopic();

Review comment: Can you change this to `getPubSubTopic` too? `InsertionPipelineTopic` is too generic now since we have the Kafka option too.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192207); Time Spent: 3h (was: 2h 50m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
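A sketch of what the options interface might look like after that rename, assuming both sink configurations stay optional (no `@Validation.Required` on either sink getter) and only the synthetic source options remain required; this is not necessarily the exact merged interface:

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.Validation;

// Assumed shape of the publisher's options after the rename discussed above.
public interface Options extends PipelineOptions {

  @Description("Options for the synthetic source")
  @Validation.Required
  String getSourceOptions();

  void setSourceOptions(String sourceOptions);

  @Description("PubSub topic to publish to")
  String getPubSubTopic();

  void setPubSubTopic(String topic);

  @Description("Kafka server address")
  String getKafkaBootstrapServerAddress();

  void setKafkaBootstrapServerAddress(String address);

  @Description("Kafka topic")
  String getKafkaTopic();

  void setKafkaTopic(String topic);
}
```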
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192212 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252220021

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -50,15 +54,22 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *    --kafkaTopic=KAFKA_TOPIC_NAME
  *    --sourceOptions={"numRecords":1000,...}'
- *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *
+ *

Review comment: missing javadoc `<p>`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192212)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192213 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252216354

## File path: sdks/java/io/synthetic/build.gradle
## @@ -29,11 +29,13 @@
 dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients
   shadowTest library.java.vendored_guava_20_0
   testCompile library.java.junit
   testCompile library.java.hamcrest_core
   testCompile library.java.hamcrest_library
   shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
   shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
+  shadow project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")

Review comment: (as the comment above)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192213); Time Spent: 3.5h (was: 3h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3.5h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192208 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252214647

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -50,15 +54,22 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME

Review comment: Please remember to change here too

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192208); Time Spent: 3h 10m (was: 3h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 10m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192209 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252216289

## File path: sdks/java/io/synthetic/build.gradle
## @@ -29,11 +29,13 @@
 dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients

Review comment: Why is this needed in `sdks/java/io/synthetic/build.gradle`? Shouldn't we add this in `sdks/java/testing/load-tests/build.gradle`?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192209); Time Spent: 3h 10m (was: 3h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 10m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192211 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 30/Jan/19 11:41   Start Date: 30/Jan/19 11:41   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252219843

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
## @@ -50,15 +54,22 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *    --kafkaTopic=KAFKA_TOPIC_NAME
  *    --sourceOptions={"numRecords":1000,...}'
- *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *
+ *
+ * If parameters related to Kafka are provided, the publisher writes to Kafka. If both pubsub topic
+ * and Kafka params are present, records will be written to both sinks.

Review comment:
It would be even clearer if the sentence gave instructions for both Kafka and Pubsub, so I suggest:
```
If parameters related to a specific sink are provided (Kafka or PubSub), the pipeline writes to the sink. Writing to both sinks is also acceptable.
```
(Or similar, but including both sinks)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 192211)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 3h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191617 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 29/Jan/19 11:06   Start Date: 29/Jan/19 11:06   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251787329

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -50,9 +54,13 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
  *    --sourceOptions={"numRecords":1000,...}'
  *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline topic will be treated as

Review comment: That's a good idea, implemented in 2nd commit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191617); Time Spent: 2h 40m (was: 2.5h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 2h 40m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191618 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 29/Jan/19 11:06   Start Date: 29/Jan/19 11:06   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251787479

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -83,14 +97,41 @@
  public static void main(String[] args) throws IOException {

    Pipeline pipeline = Pipeline.create(options);

-   pipeline
-       .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
-       .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
-       .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+   if (!options.getKafkaBootstrapServerAddress().isEmpty()) {

Review comment: implemented, thanks for the suggestion!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191618); Time Spent: 2h 50m (was: 2h 40m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 2h 50m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191083 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 15:56   Start Date: 28/Jan/19 15:56   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251464154

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -73,6 +81,12 @@
  String getInsertionPipelineTopic();

  void setInsertionPipelineTopic(String topic);
+
+ @Description("Kafka server address")
+ @Default.String("")

Review comment: Could you remove this default and do a null-check instead?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191083); Time Spent: 1h (was: 50m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
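With the `@Default.String("")` removed as requested here, an unset option simply comes back as null, so the guard can be a plain null check. A small sketch (the helper name is illustrative, not from the PR):

```java
// Without a default, PipelineOptions getters return null when the flag was not passed.
private static boolean kafkaConfigured(Options options) {
  return options.getKafkaBootstrapServerAddress() != null;
}
```

main() can then wrap the KafkaIO branch in `if (kafkaConfigured(options)) { ... }` instead of comparing against an empty string.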
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191094 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 16:05   Start Date: 28/Jan/19 16:05   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251480159

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -73,6 +81,12 @@
  String getInsertionPipelineTopic();

  void setInsertionPipelineTopic(String topic);
+
+ @Description("Kafka server address")
+ @Default.String("")

Review comment: one last thing: we need to change the class name as it's not only about `PubSub` now :)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191094); Time Spent: 2h 10m (was: 2h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 2h 10m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191092 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 16:03   Start Date: 28/Jan/19 16:03   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251479212

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -50,9 +54,13 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
  *    --sourceOptions={"numRecords":1000,...}'
  *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline topic will be treated as
+ * Kafka topic name and records will be published to Kafka instead of PubSub.
  */
 public class SyntheticDataPubSubPublisher {

Review comment: Let's change this name to abstract from `PubSub`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191092); Time Spent: 1h 50m (was: 1h 40m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1h 50m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191096 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 16:05   Start Date: 28/Jan/19 16:05   Worklog Time Spent: 10m
Work Description: lgajowy commented on issue #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-458191562

One last thing: could you change the name to abstract from `PubSub` as it's not only about pubsub now?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191096); Time Spent: 2.5h (was: 2h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 2.5h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191095 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 16:05   Start Date: 28/Jan/19 16:05   Worklog Time Spent: 10m
Work Description: lgajowy commented on issue #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-458191562

One last thing: could you change the name to abstract from `PubSub` as it's not only about pubsub now?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191095); Time Spent: 2h 20m (was: 2h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 2h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191093 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 16:04   Start Date: 28/Jan/19 16:04   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251480159

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -73,6 +81,12 @@
  String getInsertionPipelineTopic();

  void setInsertionPipelineTopic(String topic);
+
+ @Description("Kafka server address")
+ @Default.String("")

Review comment: one last thing: we need to change the class name as it's not only about `PubSub` now :)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191093); Time Spent: 2h (was: 1h 50m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 2h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191091 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 16:02   Start Date: 28/Jan/19 16:02   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251479212

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -50,9 +54,13 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
  *    --sourceOptions={"numRecords":1000,...}'
  *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline topic will be treated as
+ * Kafka topic name and records will be published to Kafka instead of PubSub.
  */
 public class SyntheticDataPubSubPublisher {

Review comment: Let's change this name to abstract from `PubSub`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191091); Time Spent: 1h 40m (was: 1.5h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1h 40m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191090 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 15:59   Start Date: 28/Jan/19 15:59   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251477916

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -83,14 +95,41 @@
  public static void main(String[] args) throws IOException {

    Pipeline pipeline = Pipeline.create(options);

-   pipeline
-       .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
-       .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
-       .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+   if (options.getKafkaBootstrapServerAddress() != null) {

Review comment:
IMO it can stay as one file. After all, we can have multiple sinks in one pipeline and even publish to both at the same time if we want to (no idea whether that is ever going to be the case, though... :) ). I'd split it like I suggested in the review comments below.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191090); Time Spent: 1.5h (was: 1h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1.5h   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191084 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 15:56   Start Date: 28/Jan/19 15:56   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251473502

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -83,14 +97,41 @@
  public static void main(String[] args) throws IOException {

    Pipeline pipeline = Pipeline.create(options);

-   pipeline
-       .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
-       .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
-       .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+   if (!options.getKafkaBootstrapServerAddress().isEmpty()) {

Review comment:
Those sinks are somewhat independent and we could let them stay this way. We can code it like this (including the division of the methods):
```
if () {
  writeToPubsub(collection);
}
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191084); Time Spent: 1h 10m (was: 1h)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1h 10m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
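A sketch of the method division suggested in that comment, with one small helper per sink. `Options`, `MapKVToString` and `MapBytesToPubSubMessage` are the names used elsewhere in this thread; the `KafkaIO.<Void, String>write()` type parameters are an assumption, since the quoted diff lost its generics:

```java
// One helper per sink; main() calls whichever ones the options enable.
private static void writeToKafka(PCollection<KV<byte[], byte[]>> collection, Options options) {
  collection
      .apply("Map to Kafka messages", MapElements.via(new MapKVToString()))
      .apply(
          "Write to Kafka",
          KafkaIO.<Void, String>write()
              .withBootstrapServers(options.getKafkaBootstrapServerAddress())
              .withTopic(options.getKafkaTopic())
              .withValueSerializer(StringSerializer.class)
              .values());
}

private static void writeToPubSub(PCollection<KV<byte[], byte[]>> collection, Options options) {
  collection
      .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
      .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getPubSubTopic()));
}
```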
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191085 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 15:56   Start Date: 28/Jan/19 15:56   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251474125

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -83,14 +97,41 @@
  public static void main(String[] args) throws IOException {

    Pipeline pipeline = Pipeline.create(options);

-   pipeline
-       .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
-       .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
-       .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+   if (!options.getKafkaBootstrapServerAddress().isEmpty()) {
+     pipeline
+         .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
+         .apply("Map to Kafka messages", MapElements.via(new MapKVToString()))
+         .apply(
+             "Write to Kafka",
+             KafkaIO.write()
+                 .withBootstrapServers(options.getKafkaBootstrapServerAddress())
+                 .withTopic(options.getInsertionPipelineTopic())
+                 .withValueSerializer(StringSerializer.class)
+                 .values());
+   } else {
+     pipeline
+         .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
+         .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
+         .apply(
+             "Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
+   }
    pipeline.run().waitUntilFinish();
  }

+ private static class MapKVToString extends SimpleFunction<KV<byte[], byte[]>, String> {
+   @Override
+   public String apply(KV<byte[], byte[]> input) {
+     StringBuilder stringBuilder =
+         new StringBuilder()

Review comment: Hint: a "one-liner" for this can be done using `String.format("{%s,%s}", Arrays.toString(input.getKey()), Arrays.toString(input.getValue()))`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191085); Time Spent: 1h 20m (was: 1h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
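The hinted one-liner replaces the StringBuilder in `MapKVToString` with a single `String.format` call. A tiny standalone illustration of what it produces:

```java
import java.util.Arrays;

public class FormatHintDemo {
  public static void main(String[] args) {
    byte[] key = {1, 2};
    byte[] value = {3, 4, 5};
    // The suggested one-liner: no StringBuilder needed.
    String payload = String.format("{%s,%s}", Arrays.toString(key), Arrays.toString(value));
    System.out.println(payload); // prints {[1, 2],[3, 4, 5]}
  }
}
```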
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191086 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 28/Jan/19 15:56   Start Date: 28/Jan/19 15:56   Worklog Time Spent: 10m
Work Description: lgajowy commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251465369

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -50,9 +54,13 @@
  *
  * ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
+ *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
  *    --sourceOptions={"numRecords":1000,...}'
  *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline topic will be treated as

Review comment:
I think we can do it even better. This pipeline could be designed in such a way that it allows publishing to every sink type at the same time. So having:
- kafka related options & pubsub related options: we can publish to both Kafka and Pubsub
- kafka related options: publish only to Kafka
- pubsub related options: publish only to PubSub
WDYT?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 191086)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 1h 20m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189908 ]

ASF GitHub Bot logged work on BEAM-6207:
Author: ASF GitHub Bot   Created on: 25/Jan/19 09:15   Start Date: 25/Jan/19 09:15   Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r250912490

## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
## @@ -73,6 +80,11 @@
  String getInsertionPipelineTopic();

  void setInsertionPipelineTopic(String topic);
+
+ @Description("Kafka server address (optional)")

Review comment: That's a good idea, done.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 189908); Time Spent: 50m (was: 40m)

> extend "Data insertion Pipeline" with Kafka IO
> Key: BEAM-6207   URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam   Issue Type: Sub-task   Components: io-java-kafka, testing
> Reporter: Lukasz Gajowy   Assignee: Michal Walenia   Priority: Trivial
> Time Spent: 50m   Remaining Estimate: 0h
>
> Now that we have the data insertion pipeline based on PubSubIO, it can easily be extended with KafkaIO if needed. The same data could then be published to either sink, leaving the choice open and enabling the data insertion pipeline for Flink.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189537 ] ASF GitHub Bot logged work on BEAM-6207: Author: ASF GitHub Bot Created on: 24/Jan/19 15:45 Start Date: 24/Jan/19 15:45 Worklog Time Spent: 10m Work Description: kkucharc commented on issue #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic. URL: https://github.com/apache/beam/pull/7612#issuecomment-457244433 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189537) Time Spent: 40m (was: 0.5h) > extend "Data insertion Pipeline" with Kafka IO > -- > > Key: BEAM-6207 > URL: https://issues.apache.org/jira/browse/BEAM-6207 > Project: Beam > Issue Type: Sub-task > Components: io-java-kafka, testing >Reporter: Lukasz Gajowy >Assignee: Michal Walenia >Priority: Trivial > Time Spent: 40m > Remaining Estimate: 0h > > Since now we have the Data insertion pipeline based on PubSubIO, it can be > easily extended with KafkaIO if needed. Same data then could be published to > any of the sinks leaving out the choice and enabling the data insertion > pipeline for Flink. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189535 ] ASF GitHub Bot logged work on BEAM-6207: Author: ASF GitHub Bot Created on: 24/Jan/19 15:42 Start Date: 24/Jan/19 15:42 Worklog Time Spent: 10m Work Description: kkucharc commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic. URL: https://github.com/apache/beam/pull/7612#discussion_r250654128 ## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java ## @@ -73,6 +80,11 @@ String getInsertionPipelineTopic(); void setInsertionPipelineTopic(String topic); + +@Description("Kafka server address (optional)") Review comment: I would keep the info about this being optional only in the documentation. The lack of `@Validation.Required` is enough. Usually, when something is not required, it has a `@Default` value. But as long as we treat this pipeline option as the condition for whether we use Kafka or PubSub, we can either leave it as it is now or change it to the default value `""` and then change the condition in line 98. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189535) Time Spent: 0.5h (was: 20m) > extend "Data insertion Pipeline" with Kafka IO > -- > > Key: BEAM-6207 > URL: https://issues.apache.org/jira/browse/BEAM-6207 > Project: Beam > Issue Type: Sub-task > Components: io-java-kafka, testing >Reporter: Lukasz Gajowy >Assignee: Michal Walenia >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > > Since now we have the Data insertion pipeline based on PubSubIO, it can be > easily extended with KafkaIO if needed. Same data then could be published to > any of the sinks leaving out the choice and enabling the data insertion > pipeline for Flink. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
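To make the two variants in that comment concrete, here is a minimal options-interface sketch: one getter left without `@Validation.Required` or `@Default` (it returns null when the flag is absent, so the pipeline can branch on a null check), and a second, hypothetical getter using `@Default.String("")` so the pipeline would test for emptiness instead. Only the kafkaBootstrapServerAddress name comes from the diff; the interface name and the second getter are invented for illustration.

import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface KafkaPublisherOptions extends PipelineOptions {

  // Variant 1: no @Validation.Required and no @Default, so the getter returns
  // null unless the flag is passed; the pipeline branches on a null check.
  @Description("Kafka server address")
  String getKafkaBootstrapServerAddress();

  void setKafkaBootstrapServerAddress(String address);

  // Variant 2 (hypothetical): an explicit empty-string default; the pipeline
  // would then branch on !getKafkaServerWithDefault().isEmpty() instead.
  @Description("Kafka server address")
  @Default.String("")
  String getKafkaServerWithDefault();

  void setKafkaServerWithDefault(String address);
}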
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189536 ] ASF GitHub Bot logged work on BEAM-6207: Author: ASF GitHub Bot Created on: 24/Jan/19 15:42 Start Date: 24/Jan/19 15:42 Worklog Time Spent: 10m Work Description: kkucharc commented on pull request #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic. URL: https://github.com/apache/beam/pull/7612#discussion_r250654172 ## File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java ## @@ -83,14 +95,41 @@ public static void main(String[] args) throws IOException { Pipeline pipeline = Pipeline.create(options); -pipeline -.apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions))) -.apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage())) -.apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic())); - +if (options.getKafkaBootstrapServerAddress() != null) { Review comment: Personally, I would recommend splitting those two pipelines into separate files. @lgajowy WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189536) Time Spent: 0.5h (was: 20m) > extend "Data insertion Pipeline" with Kafka IO > -- > > Key: BEAM-6207 > URL: https://issues.apache.org/jira/browse/BEAM-6207 > Project: Beam > Issue Type: Sub-task > Components: io-java-kafka, testing >Reporter: Lukasz Gajowy >Assignee: Michal Walenia >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > > Since now we have the Data insertion pipeline based on PubSubIO, it can be > easily extended with KafkaIO if needed. Same data then could be published to > any of the sinks leaving out the choice and enabling the data insertion > pipeline for Flink. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
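As a rough sketch of the alternative raised in that comment (splitting the publishers into separate files instead of branching inside one main method), a Kafka-only entry point might look like the class below, with a PubSub-only twin built the same way. The class name is hypothetical and this is not code from the PR; the option names reuse those visible in the review diffs.

import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Read;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.synthetic.SyntheticBoundedSource;
import org.apache.beam.sdk.io.synthetic.SyntheticOptions;
import org.apache.beam.sdk.io.synthetic.SyntheticSourceOptions;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation;
import org.apache.kafka.common.serialization.ByteArraySerializer;

/** Hypothetical Kafka-only publisher; a PubSub-only twin would mirror this class. */
public class SyntheticDataKafkaPublisher {

  public interface Options extends PipelineOptions {
    @Description("Kafka bootstrap server address")
    @Validation.Required
    String getKafkaBootstrapServerAddress();

    void setKafkaBootstrapServerAddress(String address);

    @Description("Kafka topic to write to")
    @Validation.Required
    String getKafkaTopic();

    void setKafkaTopic(String topic);

    @Description("Synthetic source options as JSON")
    @Validation.Required
    String getSourceOptions();

    void setSourceOptions(String sourceOptions);
  }

  public static void main(String[] args) throws IOException {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    SyntheticSourceOptions sourceOptions =
        SyntheticOptions.fromJsonString(options.getSourceOptions(), SyntheticSourceOptions.class);

    Pipeline pipeline = Pipeline.create(options);
    pipeline
        .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
        .apply(
            "Write to Kafka",
            KafkaIO.<byte[], byte[]>write()
                .withBootstrapServers(options.getKafkaBootstrapServerAddress())
                .withTopic(options.getKafkaTopic())
                .withKeySerializer(ByteArraySerializer.class)
                .withValueSerializer(ByteArraySerializer.class));

    pipeline.run().waitUntilFinish();
  }
}

The upside of such a split is that each entry point can mark its own sink's options as required; the cost is duplicating the synthetic read and option parsing across files, which is the trade-off against the single multi-sink pipeline discussed earlier in the thread.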
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189519 ] ASF GitHub Bot logged work on BEAM-6207: Author: ASF GitHub Bot Created on: 24/Jan/19 15:13 Start Date: 24/Jan/19 15:13 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic. URL: https://github.com/apache/beam/pull/7612#issuecomment-457232466 One test failure is in Task :beam-runners-google-cloud-dataflow-java-legacy-worker:test, unrelated to these changes. The second one is a checkstyle failure in /src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147, where a Javadoc comment is missing - also unrelated. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189519) Time Spent: 20m (was: 10m) > extend "Data insertion Pipeline" with Kafka IO > -- > > Key: BEAM-6207 > URL: https://issues.apache.org/jira/browse/BEAM-6207 > Project: Beam > Issue Type: Sub-task > Components: io-java-kafka, testing >Reporter: Lukasz Gajowy >Assignee: Michal Walenia >Priority: Trivial > Time Spent: 20m > Remaining Estimate: 0h > > Since now we have the Data insertion pipeline based on PubSubIO, it can be > easily extended with KafkaIO if needed. Same data then could be published to > any of the sinks leaving out the choice and enabling the data insertion > pipeline for Flink. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO
[ https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189472 ] ASF GitHub Bot logged work on BEAM-6207: Author: ASF GitHub Bot Created on: 24/Jan/19 13:37 Start Date: 24/Jan/19 13:37 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #7612: [BEAM-6207] Added option to publish synthetic data to Kafka topic. URL: https://github.com/apache/beam/pull/7612#issuecomment-457198972 Hi @kkucharc, could you take a look at it? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189472) Time Spent: 10m Remaining Estimate: 0h > extend "Data insertion Pipeline" with Kafka IO > -- > > Key: BEAM-6207 > URL: https://issues.apache.org/jira/browse/BEAM-6207 > Project: Beam > Issue Type: Sub-task > Components: io-java-kafka, testing >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > Since now we have the Data insertion pipeline based on PubSubIO, it can be > easily extended with KafkaIO if needed. Same data then could be published to > any of the sinks leaving out the choice and enabling the data insertion > pipeline for Flink. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)