[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192343
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 15:02
Start Date: 30/Jan/19 15:02
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 192343)
Time Spent: 4h 40m  (was: 4.5h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192342&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192342
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 15:01
Start Date: 30/Jan/19 15:01
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #7612: [BEAM-6207] Added 
option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-458975707
 
 
   LGTM, thanks! Merging.
 



Issue Time Tracking
---

Worklog Id: (was: 192342)
Time Spent: 4.5h  (was: 4h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192231
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 12:30
Start Date: 30/Jan/19 12:30
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235493
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -50,15 +54,22 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *--kafkaTopic=KAFKA_TOPIC_NAME
  *--sourceOptions={"numRecords":1000,...}'
- *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *  
+ *
+ * If parameters related to Kafka are provided, the publisher writes to Kafka. 
If both pubsub topic
+ * and Kafka params are present, records will be written to both sinks.
 
 Review comment:
   Sounds better, changed.
 



Issue Time Tracking
---

Worklog Id: (was: 192231)
Time Spent: 4h  (was: 3h 50m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192233
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 12:31
Start Date: 30/Jan/19 12:31
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252236023
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -69,26 +80,64 @@
 void setSourceOptions(String sourceOptions);
 
 @Description("PubSub topic to publish to")
-@Validation.Required
 String getInsertionPipelineTopic();
 
 Review comment:
   Ok, done
 



Issue Time Tracking
---

Worklog Id: (was: 192233)
Time Spent: 4h 20m  (was: 4h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192232
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 12:30
Start Date: 30/Jan/19 12:30
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235599
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -50,15 +54,22 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *--kafkaTopic=KAFKA_TOPIC_NAME
  *--sourceOptions={"numRecords":1000,...}'
- *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *  
+ *
 
 Review comment:
   Ok, added
 



Issue Time Tracking
---

Worklog Id: (was: 192232)
Time Spent: 4h 10m  (was: 4h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192230
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 12:29
Start Date: 30/Jan/19 12:29
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235416
 
 

 ##
 File path: sdks/java/io/synthetic/build.gradle
 ##
 @@ -29,11 +29,13 @@ dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients
 
   shadowTest library.java.vendored_guava_20_0
   testCompile library.java.junit
   testCompile library.java.hamcrest_core
   testCompile library.java.hamcrest_library
   shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
   shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
+  shadow project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")
 
 Review comment:
   Moved it, as per comment above
 



Issue Time Tracking
---

Worklog Id: (was: 192230)
Time Spent: 3h 50m  (was: 3h 40m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192229
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 12:29
Start Date: 30/Jan/19 12:29
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252235347
 
 

 ##
 File path: sdks/java/io/synthetic/build.gradle
 ##
 @@ -29,11 +29,13 @@ dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients
 
 Review comment:
   That's true, I moved it
 



Issue Time Tracking
---

Worklog Id: (was: 192229)
Time Spent: 3h 40m  (was: 3.5h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192210
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252214536
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -69,26 +80,64 @@
 void setSourceOptions(String sourceOptions);
 
 @Description("PubSub topic to publish to")
-@Validation.Required
 String getInsertionPipelineTopic();
 
 void setInsertionPipelineTopic(String topic);
+
+@Description("Kafka server address")
+String getKafkaBootstrapServerAddress();
+
+void setKafkaBootstrapServerAddress(String address);
+
+@Description("Kafka topic")
+String getKafkaTopic();
+
+void setKafkaTopic(String topic);
   }
 
   public static void main(String[] args) throws IOException {
-Options options = 
PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
+options = 
PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
 
 SyntheticSourceOptions sourceOptions =
 SyntheticOptions.fromJsonString(options.getSourceOptions(), 
SyntheticSourceOptions.class);
 
 Pipeline pipeline = Pipeline.create(options);
+PCollection<KV<byte[], byte[]>> syntheticData =
 
 Review comment:
   looks good!
 



Issue Time Tracking
---

Worklog Id: (was: 192210)
Time Spent: 3h 20m  (was: 3h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192207
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252213775
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -69,26 +80,64 @@
 void setSourceOptions(String sourceOptions);
 
 @Description("PubSub topic to publish to")
-@Validation.Required
 String getInsertionPipelineTopic();
 
 Review comment:
   Can you change this to `getPubSubTopic` as well? `InsertionPipelineTopic` is too 
generic now that we have the Kafka option.
 



Issue Time Tracking
---

Worklog Id: (was: 192207)
Time Spent: 3h  (was: 2h 50m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192212
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252220021
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -50,15 +54,22 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *--kafkaTopic=KAFKA_TOPIC_NAME
  *--sourceOptions={"numRecords":1000,...}'
- *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *  
+ *
 
 Review comment:
   missing javadoc `` 
 



Issue Time Tracking
---

Worklog Id: (was: 192212)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192213
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252216354
 
 

 ##
 File path: sdks/java/io/synthetic/build.gradle
 ##
 @@ -29,11 +29,13 @@ dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients
 
   shadowTest library.java.vendored_guava_20_0
   testCompile library.java.junit
   testCompile library.java.hamcrest_core
   testCompile library.java.hamcrest_library
   shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
   shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
+  shadow project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")
 
 Review comment:
   (as the comment above)
 



Issue Time Tracking
---

Worklog Id: (was: 192213)
Time Spent: 3.5h  (was: 3h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192208
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252214647
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -50,15 +54,22 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
 
 Review comment:
   Please remember to change here too
 



Issue Time Tracking
---

Worklog Id: (was: 192208)
Time Spent: 3h 10m  (was: 3h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192209
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252216289
 
 

 ##
 File path: sdks/java/io/synthetic/build.gradle
 ##
 @@ -29,11 +29,13 @@ dependencies {
   shadow library.java.jackson_annotations
   shadow library.java.jackson_databind
   shadow library.java.guava
+  shadow library.java.kafka_clients
 
 Review comment:
   Why is this needed in `sdks/java/io/synthetic/build.gradle`? Shouldn't we 
add this in `sdks/java/testing/load-tests/build.gradle`?
 



Issue Time Tracking
---

Worklog Id: (was: 192209)
Time Spent: 3h 10m  (was: 3h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=192211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192211
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 30/Jan/19 11:41
Start Date: 30/Jan/19 11:41
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r252219843
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPublisher.java
 ##
 @@ -50,15 +54,22 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
+ *--kafkaTopic=KAFKA_TOPIC_NAME
  *--sourceOptions={"numRecords":1000,...}'
- *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
+ *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPublisher"
  *  
+ *
+ * If parameters related to Kafka are provided, the publisher writes to Kafka. 
If both pubsub topic
+ * and Kafka params are present, records will be written to both sinks.
 
 Review comment:
   It would be even clearer if the sentence gave instructions for both 
Kafka and PubSub, so I suggest: 
   
   ```
   If parameters related to a specific sink are provided (Kafka or PubSub), the 
pipeline writes to the sink. Writing to both sinks is also acceptable.
   ``` 
   (Or similar, but covering both sinks)
 



Issue Time Tracking
---

Worklog Id: (was: 192211)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191617
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 29/Jan/19 11:06
Start Date: 29/Jan/19 11:06
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251787329
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -50,9 +54,13 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
  *--sourceOptions={"numRecords":1000,...}'
  *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *  
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline 
topic will be treated as
 
 Review comment:
   That's a good idea, implemented in 2nd commit
 



Issue Time Tracking
---

Worklog Id: (was: 191617)
Time Spent: 2h 40m  (was: 2.5h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191618
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 29/Jan/19 11:06
Start Date: 29/Jan/19 11:06
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251787479
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -83,14 +97,41 @@ public static void main(String[] args) throws IOException {
 
 Pipeline pipeline = Pipeline.create(options);
 
-pipeline
-.apply("Read synthetic data", Read.from(new 
SyntheticBoundedSource(sourceOptions)))
-.apply("Map to PubSub messages", MapElements.via(new 
MapBytesToPubSubMessage()))
-.apply("Write to PubSub", 
PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+if (!options.getKafkaBootstrapServerAddress().isEmpty()) {
 
 Review comment:
   implemented, thanks for the suggestion!
 



Issue Time Tracking
---

Worklog Id: (was: 191618)
Time Spent: 2h 50m  (was: 2h 40m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191083
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 15:56
Start Date: 28/Jan/19 15:56
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251464154
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -73,6 +81,12 @@
 String getInsertionPipelineTopic();
 
 void setInsertionPipelineTopic(String topic);
+
+@Description("Kafka server address")
+@Default.String("")
 
 Review comment:
   Could you remove this default and do a null-check instead?
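   For illustration, removing the default and null-checking could look roughly like 
this (the getter name comes from the diff above; the surrounding code is an 
assumption, not the exact change made in the PR):
   
   ```
   @Description("Kafka server address")
   String getKafkaBootstrapServerAddress();   // no @Default.String(""): unset means null
   
   void setKafkaBootstrapServerAddress(String address);
   
   // ...and in main(), guard on null rather than comparing with an empty string:
   if (options.getKafkaBootstrapServerAddress() != null) {
     // configure the Kafka write here
   }
   ```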
 



Issue Time Tracking
---

Worklog Id: (was: 191083)
Time Spent: 1h  (was: 50m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191094
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 16:05
Start Date: 28/Jan/19 16:05
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251480159
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -73,6 +81,12 @@
 String getInsertionPipelineTopic();
 
 void setInsertionPipelineTopic(String topic);
+
+@Description("Kafka server address")
+@Default.String("")
 
 Review comment:
   one last thing: we need to change the class name as it's not only about 
`PubSub` now :)
 



Issue Time Tracking
---

Worklog Id: (was: 191094)
Time Spent: 2h 10m  (was: 2h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191092
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 16:03
Start Date: 28/Jan/19 16:03
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251479212
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -50,9 +54,13 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
  *--sourceOptions={"numRecords":1000,...}'
  *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *  
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline 
topic will be treated as
+ * Kafka topic name and records will be published to Kafka instead of PubSub.
  */
 public class SyntheticDataPubSubPublisher {
 
 Review comment:
   Let's change this name to abstract from `PubSub`. 
 



Issue Time Tracking
---

Worklog Id: (was: 191092)
Time Spent: 1h 50m  (was: 1h 40m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191096
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 16:05
Start Date: 28/Jan/19 16:05
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #7612: [BEAM-6207] Added 
option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-458191562
 
 
   One last thing: could you change the name to abstract from `PubSub` as it's 
not only about pubsub now?
 



Issue Time Tracking
---

Worklog Id: (was: 191096)
Time Spent: 2.5h  (was: 2h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191095
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 16:05
Start Date: 28/Jan/19 16:05
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #7612: [BEAM-6207] Added 
option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-458191562
 
 
   One last thing: could you change the name to abstract from `PubSub` as it's 
not only about pubsub now?
 



Issue Time Tracking
---

Worklog Id: (was: 191095)
Time Spent: 2h 20m  (was: 2h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191093
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 16:04
Start Date: 28/Jan/19 16:04
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251480159
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -73,6 +81,12 @@
 String getInsertionPipelineTopic();
 
 void setInsertionPipelineTopic(String topic);
+
+@Description("Kafka server address")
+@Default.String("")
 
 Review comment:
   one last thing: we need to change the class name as it's not only about 
`PubSub` now :)
 



Issue Time Tracking
---

Worklog Id: (was: 191093)
Time Spent: 2h  (was: 1h 50m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191091
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 16:02
Start Date: 28/Jan/19 16:02
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251479212
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -50,9 +54,13 @@
  * 
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *--insertionPipelineTopic=TOPIC_NAME
+ *--kafkaBootstrapServerAddress=SERVER_ADDRESS
  *--sourceOptions={"numRecords":1000,...}'
  *
-PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *  
+ *
+ * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline 
topic will be treated as
+ * Kafka topic name and records will be published to Kafka instead of PubSub.
  */
 public class SyntheticDataPubSubPublisher {
 
 Review comment:
   Let's change this name to abstract from `PubSub`. 
 



Issue Time Tracking
---

Worklog Id: (was: 191091)
Time Spent: 1h 40m  (was: 1.5h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191090
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 15:59
Start Date: 28/Jan/19 15:59
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251477916
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -83,14 +95,41 @@ public static void main(String[] args) throws IOException {
 
 Pipeline pipeline = Pipeline.create(options);
 
-pipeline
-.apply("Read synthetic data", Read.from(new 
SyntheticBoundedSource(sourceOptions)))
-.apply("Map to PubSub messages", MapElements.via(new 
MapBytesToPubSubMessage()))
-.apply("Write to PubSub", 
PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+if (options.getKafkaBootstrapServerAddress() != null) {
 
 Review comment:
   IMO it can stay as one file. After all, we can have multiple sinks in one 
pipeline and even publish to both at the same time if we want to (no idea 
whether that will ever be the case, though... :)). I'd split it like I 
suggested in the review comments below. 
 



Issue Time Tracking
---

Worklog Id: (was: 191090)
Time Spent: 1.5h  (was: 1h 20m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191084
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 15:56
Start Date: 28/Jan/19 15:56
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251473502
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -83,14 +97,41 @@ public static void main(String[] args) throws IOException {
 
 Pipeline pipeline = Pipeline.create(options);
 
-pipeline
-.apply("Read synthetic data", Read.from(new 
SyntheticBoundedSource(sourceOptions)))
-.apply("Map to PubSub messages", MapElements.via(new 
MapBytesToPubSubMessage()))
-.apply("Write to PubSub", 
PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+if (!options.getKafkaBootstrapServerAddress().isEmpty()) {
 
 Review comment:
   Those sinks are somewhat independent and we could let them stay this way. We 
can code it like this (including the division of the methods):
   
   ```
   if (...) {
     writeToPubsub(collection);
   }
   ```
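   For illustration, the division into methods could look roughly like the sketch 
below. It reuses the option getters and mapping functions quoted elsewhere in this 
thread (MapBytesToPubSubMessage, MapKVToString); the method names other than 
writeToPubsub and the KafkaIO type parameters are assumptions, not the code that 
was eventually merged.
   
   ```
   // Sketch: one helper per sink, each invoked from main() only when that
   // sink's options were provided (as in the snippet above).
   private static void writeToPubsub(PCollection<KV<byte[], byte[]>> data, Options options) {
     data.apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
         .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
   }
   
   private static void writeToKafka(PCollection<KV<byte[], byte[]>> data, Options options) {
     data.apply("Map to Kafka messages", MapElements.via(new MapKVToString()))
         .apply(
             "Write to Kafka",
             KafkaIO.<Void, String>write()
                 .withBootstrapServers(options.getKafkaBootstrapServerAddress())
                 .withTopic(options.getKafkaTopic())
                 .withValueSerializer(StringSerializer.class)
                 .values());
   }
   ```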
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 191084)
Time Spent: 1h 10m  (was: 1h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191085
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 15:56
Start Date: 28/Jan/19 15:56
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251474125
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -83,14 +97,41 @@ public static void main(String[] args) throws IOException {
 
 Pipeline pipeline = Pipeline.create(options);
 
-pipeline
-.apply("Read synthetic data", Read.from(new 
SyntheticBoundedSource(sourceOptions)))
-.apply("Map to PubSub messages", MapElements.via(new 
MapBytesToPubSubMessage()))
-.apply("Write to PubSub", 
PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+if (!options.getKafkaBootstrapServerAddress().isEmpty()) {
+  pipeline
+  .apply("Read synthetic data", Read.from(new 
SyntheticBoundedSource(sourceOptions)))
+  .apply("Map to Kafka messages", MapElements.via(new MapKVToString()))
+  .apply(
+  "Write to Kafka",
+  KafkaIO.write()
+  
.withBootstrapServers(options.getKafkaBootstrapServerAddress())
+  .withTopic(options.getInsertionPipelineTopic())
+  .withValueSerializer(StringSerializer.class)
+  .values());
+} else {
+  pipeline
+  .apply("Read synthetic data", Read.from(new 
SyntheticBoundedSource(sourceOptions)))
+  .apply("Map to PubSub messages", MapElements.via(new 
MapBytesToPubSubMessage()))
+  .apply(
+  "Write to PubSub", 
PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
+}
 pipeline.run().waitUntilFinish();
   }
 
+  private static class MapKVToString extends SimpleFunction<KV<byte[], byte[]>, String> {
+@Override
+public String apply(KV<byte[], byte[]> input) {
+  StringBuilder stringBuilder =
+  new StringBuilder()
 
 Review comment:
   Hint: a one-liner for this can be done using `String.format("{%s,%s}", 
Arrays.toString(input.getKey()), Arrays.toString(input.getValue()))`
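   Dropped into the mapping function from the diff above, that hint would look 
something like this (a sketch; the KV<byte[], byte[]> element type is assumed from 
the synthetic source, and java.util.Arrays plus Beam's KV and SimpleFunction are 
expected on the import list):
   
   ```
   private static class MapKVToString extends SimpleFunction<KV<byte[], byte[]>, String> {
     @Override
     public String apply(KV<byte[], byte[]> input) {
       // One-liner replacing the StringBuilder: "{[key bytes],[value bytes]}".
       return String.format(
           "{%s,%s}", Arrays.toString(input.getKey()), Arrays.toString(input.getValue()));
     }
   }
   ```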
 



Issue Time Tracking
---

Worklog Id: (was: 191085)
Time Spent: 1h 20m  (was: 1h 10m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can
> easily be extended with KafkaIO if needed. The same data could then be
> published to any of the sinks, leaving the choice open and enabling the data
> insertion pipeline for Flink.
>  





[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=191086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-191086
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 28/Jan/19 15:56
Start Date: 28/Jan/19 15:56
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r251465369
 
 

 ##
 File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -50,9 +54,13 @@
  *
  *  ./gradlew :beam-sdks-java-load-tests:run -PloadTest.args='
  *    --insertionPipelineTopic=TOPIC_NAME
 + *    --kafkaBootstrapServerAddress=SERVER_ADDRESS
  *    --sourceOptions={"numRecords":1000,...}'
  *  -PloadTest.mainClass="org.apache.beam.sdk.loadtests.SyntheticDataPubSubPublisher"
  *
 + *
 + * Parameter kafkaBootstrapServerAddress is optional. If provided, pipeline topic will be treated as
 
 Review comment:
   I think we can do even better. This pipeline could be designed in such a way that it allows publishing to every sink type at the same time. So, given:
   
   - Kafka-related options & PubSub-related options: we can publish to both Kafka and PubSub
   - Kafka-related options only: publish only to Kafka
   - PubSub-related options only: publish only to PubSub
   
   WDYT?
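   A rough sketch of how that multi-sink idea could look inside main(), under the same imports as the PR (plus org.apache.beam.sdk.values.PCollection); the getKafkaTopic() and getPubSubTopic() getters are hypothetical placeholders, not options that exist in this change:

   // Read the synthetic data once, then fan out to every sink whose options were provided.
   PCollection<KV<byte[], byte[]>> records =
       pipeline.apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)));

   if (!options.getKafkaBootstrapServerAddress().isEmpty()) {
     records
         .apply("Map to Kafka messages", MapElements.via(new MapKVToString()))
         .apply(
             "Write to Kafka",
             KafkaIO.<Void, String>write()
                 .withBootstrapServers(options.getKafkaBootstrapServerAddress())
                 .withTopic(options.getKafkaTopic()) // hypothetical Kafka-specific topic option
                 .withValueSerializer(StringSerializer.class)
                 .values());
   }

   if (!options.getPubSubTopic().isEmpty()) { // hypothetical PubSub-specific topic option
     records
         .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
         .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getPubSubTopic()));
   }

   pipeline.run().waitUntilFinish();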
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 191086)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189908
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 25/Jan/19 09:15
Start Date: 25/Jan/19 09:15
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r250912490
 
 

 ##
 File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -73,6 +80,11 @@
     String getInsertionPipelineTopic();
 
     void setInsertionPipelineTopic(String topic);
+
+    @Description("Kafka server address (optional)")
 
 Review comment:
   That's a good idea, done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189908)
Time Spent: 50m  (was: 40m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189537
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 24/Jan/19 15:45
Start Date: 24/Jan/19 15:45
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #7612: [BEAM-6207] Added 
option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-457244433
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189537)
Time Spent: 40m  (was: 0.5h)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189535
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 24/Jan/19 15:42
Start Date: 24/Jan/19 15:42
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r250654128
 
 

 ##
 File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -73,6 +80,11 @@
     String getInsertionPipelineTopic();
 
     void setInsertionPipelineTopic(String topic);
+
+    @Description("Kafka server address (optional)")
 
 Review comment:
   I would keep the info about this being optional only in the documentation. The lack of `@Validation.Required` is enough.
   
   Usually, when something is not required it has a `@Default.` value. But as long as we treat this pipeline option as the condition for whether we use Kafka or PubSub, we can either leave it as it is now or change it to the default value `""` and then change the condition in line 98. WDYT?
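   For illustration, a minimal sketch of the default-value variant described above (a sketch only, reusing the option and getter names already in the PR):

   // On the pipeline options interface:
   @Description("Kafka server address (optional)")
   @Default.String("")
   String getKafkaBootstrapServerAddress();

   void setKafkaBootstrapServerAddress(String address);

   // ...and the corresponding condition in main() then checks for emptiness instead of null:
   if (!options.getKafkaBootstrapServerAddress().isEmpty()) {
     // publish to Kafka
   } else {
     // publish to PubSub
   }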
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189535)
Time Spent: 0.5h  (was: 20m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189536
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 24/Jan/19 15:42
Start Date: 24/Jan/19 15:42
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on pull request #7612: [BEAM-6207] 
Added option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#discussion_r250654172
 
 

 ##
 File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
 ##
 @@ -83,14 +95,41 @@ public static void main(String[] args) throws IOException {
 
     Pipeline pipeline = Pipeline.create(options);
 
-    pipeline
-        .apply("Read synthetic data", Read.from(new SyntheticBoundedSource(sourceOptions)))
-        .apply("Map to PubSub messages", MapElements.via(new MapBytesToPubSubMessage()))
-        .apply("Write to PubSub", PubsubIO.writeMessages().to(options.getInsertionPipelineTopic()));
-
+    if (options.getKafkaBootstrapServerAddress() != null) {
 
 Review comment:
   Personally, I would recommend splitting those two pipelines into separate files.
   
   @lgajowy WDYT?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189536)
Time Spent: 0.5h  (was: 20m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189519
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 24/Jan/19 15:13
Start Date: 24/Jan/19 15:13
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #7612: [BEAM-6207] Added 
option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-457232466
 
 
   One test failure is in task :beam-runners-google-cloud-dataflow-java-legacy-worker:test and is unrelated to these changes.
   The second one is a checkstyle violation in /src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147, where a Javadoc comment is missing - also unrelated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189519)
Time Spent: 20m  (was: 10m)

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6207) extend "Data insertion Pipeline" with Kafka IO

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6207?focusedWorklogId=189472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189472
 ]

ASF GitHub Bot logged work on BEAM-6207:


Author: ASF GitHub Bot
Created on: 24/Jan/19 13:37
Start Date: 24/Jan/19 13:37
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #7612: [BEAM-6207] Added 
option to publish synthetic data to Kafka topic.
URL: https://github.com/apache/beam/pull/7612#issuecomment-457198972
 
 
   Hi @kkucharc, could you take a look at it? Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189472)
Time Spent: 10m
Remaining Estimate: 0h

> extend "Data insertion Pipeline" with Kafka IO
> --
>
> Key: BEAM-6207
> URL: https://issues.apache.org/jira/browse/BEAM-6207
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-kafka, testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we have the Data insertion pipeline based on PubSubIO, it can be
> easily extended with KafkaIO if needed. The same data could then be published
> to any of the sinks, leaving the choice open and enabling the data insertion
> pipeline for Flink.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)