[jira] [Resolved] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
[ https://issues.apache.org/jira/browse/BEAM-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath resolved BEAM-9734. - Resolution: Fixed > Revert https://github.com/apache/beam/pull/11122 which is a potential > regression > > > Key: BEAM-9734 > URL: https://issues.apache.org/jira/browse/BEAM-9734 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This is potentially a regression for Dataflow. We should revert and > re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
[ https://issues.apache.org/jira/browse/BEAM-9734?focusedWorklogId=421072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421072 ] ASF GitHub Bot logged work on BEAM-9734: Author: ASF GitHub Bot Created on: 13/Apr/20 04:41 Start Date: 13/Apr/20 04:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11378: [BEAM-9734] Revert #11122 URL: https://github.com/apache/beam/pull/11378 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421072) Time Spent: 1h 40m (was: 1.5h) > Revert https://github.com/apache/beam/pull/11122 which is a potential > regression > > > Key: BEAM-9734 > URL: https://issues.apache.org/jira/browse/BEAM-9734 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This is potentially a regression for Dataflow. We should revert and > re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=421039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421039 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 13/Apr/20 01:24 Start Date: 13/Apr/20 01:24 Worklog Time Spent: 10m Work Description: vnorigoog commented on issue #11381: [BEAM-8889] add gRPC suport in GCS connector (behind an experimental-flag) URL: https://github.com/apache/beam/pull/11381#issuecomment-612708012 This now passes gradlew sdk:java:extensions:google-cloud-platform-core:check and test PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421039) Remaining Estimate: 140.5h (was: 140h 40m) Time Spent: 27.5h (was: 27h 20m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 27.5h > Remaining Estimate: 140.5h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
[ https://issues.apache.org/jira/browse/BEAM-9734?focusedWorklogId=421030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421030 ] ASF GitHub Bot logged work on BEAM-9734: Author: ASF GitHub Bot Created on: 12/Apr/20 23:07 Start Date: 12/Apr/20 23:07 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11378: [BEAM-9734] Revert #11122 URL: https://github.com/apache/beam/pull/11378#issuecomment-612689719 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421030) Time Spent: 1.5h (was: 1h 20m) > Revert https://github.com/apache/beam/pull/11122 which is a potential > regression > > > Key: BEAM-9734 > URL: https://issues.apache.org/jira/browse/BEAM-9734 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This is potentially a regression for Dataflow. We should revert and > re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
[ https://issues.apache.org/jira/browse/BEAM-9734?focusedWorklogId=421029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421029 ] ASF GitHub Bot logged work on BEAM-9734: Author: ASF GitHub Bot Created on: 12/Apr/20 23:07 Start Date: 12/Apr/20 23:07 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11378: [BEAM-9734] Revert #11122 URL: https://github.com/apache/beam/pull/11378#issuecomment-612689699 Failure seems to be unrelated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421029) Time Spent: 1h 20m (was: 1h 10m) > Revert https://github.com/apache/beam/pull/11122 which is a potential > regression > > > Key: BEAM-9734 > URL: https://issues.apache.org/jira/browse/BEAM-9734 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is potentially a regression for Dataflow. We should revert and > re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=421014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421014 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 12/Apr/20 22:25 Start Date: 12/Apr/20 22:25 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421014) Time Spent: 12h 40m (was: 12.5h) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 12h 40m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=421013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421013 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 12/Apr/20 22:25 Start Date: 12/Apr/20 22:25 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#issuecomment-612685431 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421013) Time Spent: 12.5h (was: 12h 20m) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 12.5h > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2546) Create InfluxDbIO
[ https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=420987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420987 ] ASF GitHub Bot logged work on BEAM-2546: Author: ASF GitHub Bot Created on: 12/Apr/20 19:26 Start Date: 12/Apr/20 19:26 Worklog Time Spent: 10m Work Description: bipinupd commented on issue #11028: BEAM-2546 Beam IO for InfluxDB URL: https://github.com/apache/beam/pull/11028#issuecomment-612664265 @iemejia and @mwalenia Thanks for reviewing and providing suggestion and helpful comments. It's my first contribution to Beam project. Looking forward for your guidance and support. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420987) Time Spent: 16h 10m (was: 16h) > Create InfluxDbIO > - > > Key: BEAM-2546 > URL: https://issues.apache.org/jira/browse/BEAM-2546 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Bipin Upadhyaya >Priority: Major > Time Spent: 16h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2546) Create InfluxDbIO
[ https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=420986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420986 ] ASF GitHub Bot logged work on BEAM-2546: Author: ASF GitHub Bot Created on: 12/Apr/20 19:21 Start Date: 12/Apr/20 19:21 Worklog Time Spent: 10m Work Description: bipinupd commented on pull request #11028: BEAM-2546 Beam IO for InfluxDB URL: https://github.com/apache/beam/pull/11028#discussion_r407241372 ## File path: sdks/java/io/influxdb/src/main/java/org/apache/beam/sdk/io/influxdb/InfluxDBIO.java ## @@ -0,0 +1,709 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.influxdb; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.security.cert.CertificateException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Iterator; +import java.util.List; +import java.util.NoSuchElementException; +import javax.annotation.Nullable; +import javax.net.ssl.HostnameVerifier; +import javax.net.ssl.SSLContext; +import javax.net.ssl.SSLSession; +import javax.net.ssl.SSLSocketFactory; +import javax.net.ssl.TrustManager; +import javax.net.ssl.X509TrustManager; +import okhttp3.OkHttpClient; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.io.BoundedSource; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.influxdb.BatchOptions; +import org.influxdb.InfluxDB; +import org.influxdb.InfluxDBFactory; +import org.influxdb.dto.Query; +import org.influxdb.dto.QueryResult; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * IO to read and write to InfluxDB. + * + * Reading from InfluxDB datasource + * + * InfluxDBIO source returns a bounded collection of {@code String} as a {@code + * PCollection}. + * + * To configure the InfluxDB source, you have to provide a {@link DataSourceConfiguration} using + * + * {@link DataSourceConfiguration#create(String, String, String)}(durl, username and password). + * Optionally, {@link DataSourceConfiguration#withUsername(String)} and {@link + * DataSourceConfiguration#withPassword(String)} allows you to define username and password. + * + * For example: + * + * {@code + * PCollection collection = pipeline.apply(InfluxDBIO.read() + * .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create( + * "https://localhost:8086","username","password;)) + * .withDatabase("metrics") + * .withRetentionPolicy("autogen") + * .withSslInvalidHostNameAllowed(true) + * withSslEnabled(true)); + * } + * + * For example (Read from query): + * + * {@code + * PCollection collection = pipeline.apply(InfluxDBIO.read() + * .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create( + * "https://localhost:8086","username","password;)) + * .withDatabase("metrics") + * .withQuery("Select * from cpu") + * .withRetentionPolicy("autogen") + * .withSslInvalidHostNameAllowed(true) + * withSslEnabled(true)); + * } + * + * Writing to Influx datasource + * + * InfluxDB sink supports writing records into a database. It writes a {@link PCollection} to the + * database by converting each T. The T should implement getLineProtocol() from {@link + * LineProtocolConvertable}. + * + * Like the source, to configure the sink, you have to provide a {@link DataSourceConfiguration}. + * + *
[jira] [Work logged] (BEAM-2546) Create InfluxDbIO
[ https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=420985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420985 ] ASF GitHub Bot logged work on BEAM-2546: Author: ASF GitHub Bot Created on: 12/Apr/20 19:19 Start Date: 12/Apr/20 19:19 Worklog Time Spent: 10m Work Description: bipinupd commented on pull request #11028: BEAM-2546 Beam IO for InfluxDB URL: https://github.com/apache/beam/pull/11028#discussion_r407241190 ## File path: sdks/java/io/influxdb/src/main/java/org/apache/beam/sdk/io/influxdb/InfluxDBIO.java ## @@ -0,0 +1,709 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.influxdb; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.security.cert.CertificateException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Iterator; +import java.util.List; +import java.util.NoSuchElementException; +import javax.annotation.Nullable; +import javax.net.ssl.HostnameVerifier; +import javax.net.ssl.SSLContext; +import javax.net.ssl.SSLSession; +import javax.net.ssl.SSLSocketFactory; +import javax.net.ssl.TrustManager; +import javax.net.ssl.X509TrustManager; +import okhttp3.OkHttpClient; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.io.BoundedSource; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.influxdb.BatchOptions; +import org.influxdb.InfluxDB; +import org.influxdb.InfluxDBFactory; +import org.influxdb.dto.Query; +import org.influxdb.dto.QueryResult; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * IO to read and write to InfluxDB. + * + * Reading from InfluxDB datasource + * + * InfluxDBIO source returns a bounded collection of {@code String} as a {@code + * PCollection}. + * + * To configure the InfluxDB source, you have to provide a {@link DataSourceConfiguration} using + * + * {@link DataSourceConfiguration#create(String, String, String)}(durl, username and password). + * Optionally, {@link DataSourceConfiguration#withUsername(String)} and {@link + * DataSourceConfiguration#withPassword(String)} allows you to define username and password. + * + * For example: + * + * {@code + * PCollection collection = pipeline.apply(InfluxDBIO.read() + * .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create( + * "https://localhost:8086","username","password;)) + * .withDatabase("metrics") + * .withRetentionPolicy("autogen") + * .withSslInvalidHostNameAllowed(true) + * withSslEnabled(true)); + * } + * + * For example (Read from query): + * + * {@code + * PCollection collection = pipeline.apply(InfluxDBIO.read() + * .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create( + * "https://localhost:8086","username","password;)) + * .withDatabase("metrics") + * .withQuery("Select * from cpu") + * .withRetentionPolicy("autogen") + * .withSslInvalidHostNameAllowed(true) + * withSslEnabled(true)); + * } + * + * Writing to Influx datasource + * + * InfluxDB sink supports writing records into a database. It writes a {@link PCollection} to the + * database by converting each T. The T should implement getLineProtocol() from {@link + * LineProtocolConvertable}. + * + * Like the source, to configure the sink, you have to provide a {@link DataSourceConfiguration}. + * + *
[jira] [Work logged] (BEAM-2546) Create InfluxDbIO
[ https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=420984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420984 ] ASF GitHub Bot logged work on BEAM-2546: Author: ASF GitHub Bot Created on: 12/Apr/20 19:16 Start Date: 12/Apr/20 19:16 Worklog Time Spent: 10m Work Description: bipinupd commented on pull request #11028: BEAM-2546 Beam IO for InfluxDB URL: https://github.com/apache/beam/pull/11028#discussion_r407240851 ## File path: sdks/java/io/influxdb/src/main/java/org/apache/beam/sdk/io/influxdb/InfluxDBIO.java ## @@ -0,0 +1,709 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.influxdb; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.value.AutoValue; +import java.io.Serializable; +import java.security.cert.CertificateException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Iterator; +import java.util.List; +import java.util.NoSuchElementException; +import javax.annotation.Nullable; +import javax.net.ssl.HostnameVerifier; +import javax.net.ssl.SSLContext; +import javax.net.ssl.SSLSession; +import javax.net.ssl.SSLSocketFactory; +import javax.net.ssl.TrustManager; +import javax.net.ssl.X509TrustManager; +import okhttp3.OkHttpClient; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.io.BoundedSource; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.influxdb.BatchOptions; +import org.influxdb.InfluxDB; +import org.influxdb.InfluxDBFactory; +import org.influxdb.dto.Query; +import org.influxdb.dto.QueryResult; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * IO to read and write to InfluxDB. + * + * Reading from InfluxDB datasource + * + * InfluxDBIO source returns a bounded collection of {@code String} as a {@code + * PCollection}. + * + * To configure the InfluxDB source, you have to provide a {@link DataSourceConfiguration} using + * + * {@link DataSourceConfiguration#create(String, String, String)}(durl, username and password). + * Optionally, {@link DataSourceConfiguration#withUsername(String)} and {@link + * DataSourceConfiguration#withPassword(String)} allows you to define username and password. + * + * For example: + * + * {@code + * PCollection collection = pipeline.apply(InfluxDBIO.read() + * .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create( + * "https://localhost:8086","username","password;)) + * .withDatabase("metrics") + * .withRetentionPolicy("autogen") + * .withSslInvalidHostNameAllowed(true) + * withSslEnabled(true)); + * } + * + * For example (Read from query): + * + * {@code + * PCollection collection = pipeline.apply(InfluxDBIO.read() + * .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create( + * "https://localhost:8086","username","password;)) + * .withDatabase("metrics") + * .withQuery("Select * from cpu") + * .withRetentionPolicy("autogen") + * .withSslInvalidHostNameAllowed(true) + * withSslEnabled(true)); + * } + * + * Writing to Influx datasource + * + * InfluxDB sink supports writing records into a database. It writes a {@link PCollection} to the + * database by converting each T. The T should implement getLineProtocol() from {@link + * LineProtocolConvertable}. + * + * Like the source, to configure the sink, you have to provide a {@link DataSourceConfiguration}. + * + *
[jira] [Work logged] (BEAM-2546) Create InfluxDbIO
[ https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=420982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420982 ] ASF GitHub Bot logged work on BEAM-2546: Author: ASF GitHub Bot Created on: 12/Apr/20 19:13 Start Date: 12/Apr/20 19:13 Worklog Time Spent: 10m Work Description: bipinupd commented on pull request #11028: BEAM-2546 Beam IO for InfluxDB URL: https://github.com/apache/beam/pull/11028#discussion_r407240538 ## File path: sdks/java/io/influxdb/src/main/java/org/apache/beam/sdk/io/influxdb/package-info.java ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Transforms for reading and writing from/to InfluxDB. + * + * @see org.apache.beam.sdk.io.influxdb.InfluxDBIO + */ +package org.apache.beam.sdk.io.influxdb; Review comment: I have added in the tag in InfluxDBIO class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420982) Time Spent: 15.5h (was: 15h 20m) > Create InfluxDbIO > - > > Key: BEAM-2546 > URL: https://issues.apache.org/jira/browse/BEAM-2546 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Bipin Upadhyaya >Priority: Major > Time Spent: 15.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9733) ImpulseSourceFunction does not emit a final watermark
[ https://issues.apache.org/jira/browse/BEAM-9733?focusedWorklogId=420945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420945 ] ASF GitHub Bot logged work on BEAM-9733: Author: ASF GitHub Bot Created on: 12/Apr/20 13:57 Start Date: 12/Apr/20 13:57 Worklog Time Spent: 10m Work Description: mxm commented on issue #11362: [BEAM-9733] Always let ImpulseSourceFunction emit a final watermark URL: https://github.com/apache/beam/pull/11362#issuecomment-612618733 Test case failing has been failing before these changes. I'll look into the cause. https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming_PR/191/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420945) Time Spent: 1h 40m (was: 1.5h) > ImpulseSourceFunction does not emit a final watermark > - > > Key: BEAM-9733 > URL: https://issues.apache.org/jira/browse/BEAM-9733 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The Flink Runner's {{ImpulseSourceFunction}} does not emit a final watermark, > unless {{--shutdownSourcesOnFinalWatermark}} flag has been specified (the > flag is used in tests to shutdown the pipeline after reading all data). Most > pipelines will be long-running and thus do not specify the flag. > Not sending out the final watermark causes GroupByKey to hold back the data > of event time windows until the pipeline is shut down (the final watermark is > always emitted on pipeline shutdown which is why using the above flag works). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9733) ImpulseSourceFunction does not emit a final watermark
[ https://issues.apache.org/jira/browse/BEAM-9733?focusedWorklogId=420940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420940 ] ASF GitHub Bot logged work on BEAM-9733: Author: ASF GitHub Bot Created on: 12/Apr/20 13:24 Start Date: 12/Apr/20 13:24 Worklog Time Spent: 10m Work Description: mxm commented on issue #11362: [BEAM-9733] Always let ImpulseSourceFunction emit a final watermark URL: https://github.com/apache/beam/pull/11362#issuecomment-612613900 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420940) Time Spent: 1.5h (was: 1h 20m) > ImpulseSourceFunction does not emit a final watermark > - > > Key: BEAM-9733 > URL: https://issues.apache.org/jira/browse/BEAM-9733 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Flink Runner's {{ImpulseSourceFunction}} does not emit a final watermark, > unless {{--shutdownSourcesOnFinalWatermark}} flag has been specified (the > flag is used in tests to shutdown the pipeline after reading all data). Most > pipelines will be long-running and thus do not specify the flag. > Not sending out the final watermark causes GroupByKey to hold back the data > of event time windows until the pipeline is shut down (the final watermark is > always emitted on pipeline shutdown which is why using the above flag works). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9743) TFRecordCodec not attempt to fully read/write
[ https://issues.apache.org/jira/browse/BEAM-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9743: --- Status: Open (was: Triage Needed) > TFRecordCodec not attempt to fully read/write > - > > Key: BEAM-9743 > URL: https://issues.apache.org/jira/browse/BEAM-9743 > Project: Beam > Issue Type: Bug > Components: io-java-tfrecord, sdk-java-core >Reporter: Kyoungha Min >Assignee: Kyoungha Min >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > The same issue has been pointed out and the issues were marked resolved. But > they were still remaining parts > https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22 > > Issue # 1: TFRecordCodec only tries once to read the header/footer. This is > likely to fail around the end of channel buffer. > Issue # 2: (minor) TFRecordCodec currently does not checks how much it > writes. > > Seems like it only happens with Zstd compression (or any other picky input > stream that refuse to read fully). ZstdInputStream seems very picky at giving > out data. > The parts with the issue are > [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672] > [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699] > > And not so problem within the beam application (As all (or most) of > WritableByteChannels in beam-java-sdk-core are backed by some OutputStream), > but still not following the WritableByteChannel specification, > [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727] > > ReadableByteChannel/WritableByteChannel Javadoc specifies that they are not > required to read/write fully, and can refuse to read/write time to time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9738: --- Status: Open (was: Triage Needed) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9737: --- Status: Open (was: Triage Needed) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=420911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420911 ] ASF GitHub Bot logged work on BEAM-9738: Author: ASF GitHub Bot Created on: 12/Apr/20 08:41 Start Date: 12/Apr/20 08:41 Worklog Time Spent: 10m Work Description: ananvay commented on issue #11371: [BEAM-9738] Update dataflow to setup correct docker environment options. URL: https://github.com/apache/beam/pull/11371#issuecomment-612582987 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420911) Time Spent: 1h (was: 50m) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=420912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420912 ] ASF GitHub Bot logged work on BEAM-9738: Author: ASF GitHub Bot Created on: 12/Apr/20 08:41 Start Date: 12/Apr/20 08:41 Worklog Time Spent: 10m Work Description: ananvay commented on issue #11371: [BEAM-9738] Update dataflow to setup correct docker environment options. URL: https://github.com/apache/beam/pull/11371#issuecomment-612583000 test this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420912) Time Spent: 1h 10m (was: 1h) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=420910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420910 ] ASF GitHub Bot logged work on BEAM-9738: Author: ASF GitHub Bot Created on: 12/Apr/20 08:40 Start Date: 12/Apr/20 08:40 Worklog Time Spent: 10m Work Description: ananvay commented on issue #11371: [BEAM-9738] Update dataflow to setup correct docker environment options. URL: https://github.com/apache/beam/pull/11371#issuecomment-612582933 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420910) Time Spent: 50m (was: 40m) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=420909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420909 ] ASF GitHub Bot logged work on BEAM-9738: Author: ASF GitHub Bot Created on: 12/Apr/20 08:40 Start Date: 12/Apr/20 08:40 Worklog Time Spent: 10m Work Description: ananvay commented on issue #11371: [BEAM-9738] Update dataflow to setup correct docker environment options. URL: https://github.com/apache/beam/pull/11371#issuecomment-612582905 updated test, sorry for missing that. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420909) Time Spent: 40m (was: 0.5h) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)