Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
nsivabalan merged PR #10125: URL: https://github.com/apache/hudi/pull/10125 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1820443298
## CI report:
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* 2d743121c2c2fd4d228bc0db8b22598da592800a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20978)
* c02113b5a7ffb07fab28c87caa41b04461929b21 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21052)
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1820431328
## CI report:
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* 2d743121c2c2fd4d228bc0db8b22598da592800a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20978)
* c02113b5a7ffb07fab28c87caa41b04461929b21 UNKNOWN
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
the-other-tim-brown commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1819674598
> hey @the-other-tim-brown : can you check the CI failure please

@nsivabalan it looks like these are flaky tests, since these changes do not touch the Spark/row-writer code paths:
```
Errors:
Error: TestHoodieInternalRowParquetWriter.setUp:61->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
Error: TestHoodieInternalRowParquetWriter.setUp:61->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
Error: TestHoodieRowCreateHandle.setUp:67->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
Error: TestHoodieRowCreateHandle.setUp:67->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
Error: TestHoodieRowCreateHandle.setUp:67->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
Error: TestHoodieRowCreateHandle.setUp:67->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
Error: TestHoodieRowCreateHandle.setUp:67->HoodieSparkClientTestHarness.initSparkContexts:194 » Spark
```
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
nsivabalan commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1819550824
hey @the-other-tim-brown : can you check the CI failure please
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1816930897
## CI report:
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* 2d743121c2c2fd4d228bc0db8b22598da592800a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20978)
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1816676080
## CI report:
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* f2f380dec7f0afa5fd7fb0accbe8c17e22853f00 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20963)
* 2d743121c2c2fd4d228bc0db8b22598da592800a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20978)
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1816590141
## CI report:
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* f2f380dec7f0afa5fd7fb0accbe8c17e22853f00 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20963)
* 2d743121c2c2fd4d228bc0db8b22598da592800a UNKNOWN
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
the-other-tim-brown commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1816526588
> https://hudi.apache.org/docs/gcp_bigquery

Yes, I will update the docs to include the new options.
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
yihua commented on code in PR #10125: URL: https://github.com/apache/hudi/pull/10125#discussion_r1396743948
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/HoodieBigQuerySyncClient.java: ##
@@ -51,34 +51,41 @@
 import java.util.Map;
 import java.util.stream.Collectors;
+import static org.apache.hudi.gcp.bigquery.BigQuerySyncConfig.BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID;
 import static org.apache.hudi.gcp.bigquery.BigQuerySyncConfig.BIGQUERY_SYNC_DATASET_LOCATION;
 import static org.apache.hudi.gcp.bigquery.BigQuerySyncConfig.BIGQUERY_SYNC_DATASET_NAME;
 import static org.apache.hudi.gcp.bigquery.BigQuerySyncConfig.BIGQUERY_SYNC_PROJECT_ID;
+import static org.apache.hudi.gcp.bigquery.BigQuerySyncConfig.BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER;
 public class HoodieBigQuerySyncClient extends HoodieSyncClient {
   private static final Logger LOG = LoggerFactory.getLogger(HoodieBigQuerySyncClient.class);
   protected final BigQuerySyncConfig config;
   private final String projectId;
+  private final String bigLakeConnectionId;
   private final String datasetName;
+  private final boolean requirePartitionFilter;
   private transient BigQuery bigquery;
   public HoodieBigQuerySyncClient(final BigQuerySyncConfig config) {
     super(config);
     this.config = config;
     this.projectId = config.getString(BIGQUERY_SYNC_PROJECT_ID);
+    this.bigLakeConnectionId = config.getString(BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID);
     this.datasetName = config.getString(BIGQUERY_SYNC_DATASET_NAME);
+    this.requirePartitionFilter = config.getBoolean(BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER);
     this.createBigQueryConnection();
   }
-  @VisibleForTesting
Review Comment:
Not sure if this needs to be kept. Leave it to you to decide.
## hudi-gcp/pom.xml: ##
@@ -70,7 +70,6 @@ See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google
 com.google.cloud
 google-cloud-pubsub
- ${google.cloud.pubsub.version}
Review Comment:
Got it
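The constructor change in the `HoodieBigQuerySyncClient.java` hunk above reads the two new options into final fields. A minimal sketch of the same resolution logic, assuming plain `java.util.Properties` in place of Hudi's `BigQuerySyncConfig`; the class `BigQuerySyncOptionsSketch` is hypothetical and not part of Hudi. The keys are copied verbatim from the diff under review. An unset `require_partition_filter` falls back to `false`, mirroring `.defaultValue(false)`, and the connection ID, which has `.noDefaultValue()`, is modeled as `Optional.empty()` when unset.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: resolves the two new BigQuery sync options the way the
// constructor in the diff does, but from plain java.util.Properties instead of
// Hudi's BigQuerySyncConfig. This is not Hudi's actual API.
final class BigQuerySyncOptionsSketch {
  // Keys copied verbatim from the diff under review.
  static final String REQUIRE_PARTITION_FILTER_KEY =
      "hoodie.gcp.bigquery.sync.require_partition_filter";
  static final String BIG_LAKE_CONNECTION_ID_KEY =
      "hoodie.onehouse.gcp.bigquery.sync.big_lake_connection_id";

  private final boolean requirePartitionFilter;
  private final Optional<String> bigLakeConnectionId;

  BigQuerySyncOptionsSketch(Properties props) {
    // Mirrors .defaultValue(false): an unset key means no partition filter is required.
    this.requirePartitionFilter =
        Boolean.parseBoolean(props.getProperty(REQUIRE_PARTITION_FILTER_KEY, "false"));
    // Mirrors .noDefaultValue(): an unset key becomes Optional.empty() instead of null.
    this.bigLakeConnectionId =
        Optional.ofNullable(props.getProperty(BIG_LAKE_CONNECTION_ID_KEY));
  }

  boolean requirePartitionFilter() {
    return requirePartitionFilter;
  }

  Optional<String> bigLakeConnectionId() {
    return bigLakeConnectionId;
  }
}
```

Using `Optional` makes the "not configured" case explicit at call sites; the diff itself stores a plain `String`, which would presumably be `null` when the key is unset.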
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
yihua commented on code in PR #10125: URL: https://github.com/apache/hudi/pull/10125#discussion_r1396743251
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java: ##
@@ -122,6 +121,16 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
       .markAdvanced()
       .withDocumentation("Fetch file listing from Hudi's metadata");
+  public static final ConfigProperty BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER = ConfigProperty
+      .key("hoodie.gcp.bigquery.sync.require_partition_filter")
+      .defaultValue(false)
+      .withDocumentation("If true, configure table to require a partition filter to be specified when querying the table");
+
+  public static final ConfigProperty BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID = ConfigProperty
+      .key("hoodie.onehouse.gcp.bigquery.sync.big_lake_connection_id")
+      .noDefaultValue()
+      .withDocumentation("The Big Lake connection ID to use");
+
Review Comment:
yeah
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1815666263
## CI report:
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* f2f380dec7f0afa5fd7fb0accbe8c17e22853f00 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20963)
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1815627236
## CI report:
* 7c1b9cc77e2e5ea2ee9d6089f41b5a9c482de9f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20961)
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
* f2f380dec7f0afa5fd7fb0accbe8c17e22853f00 UNKNOWN
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1815621023
## CI report:
* 7c1b9cc77e2e5ea2ee9d6089f41b5a9c482de9f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20961)
* d94d74a02df88f3ca32807c7f580900b268ca0d0 UNKNOWN
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
the-other-tim-brown commented on code in PR #10125: URL: https://github.com/apache/hudi/pull/10125#discussion_r1396557170
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java: ##
@@ -79,7 +78,7 @@ public BigQuerySyncTool(Properties props) {
     this.bqSchemaResolver = BigQuerySchemaResolver.getInstance();
   }
-  @VisibleForTesting // allows us to pass in mocks for the writer and client
+  // allows us to pass in mocks for the writer and client
Review Comment:
yes, accidentally removed
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1815576193
## CI report:
* 7c1b9cc77e2e5ea2ee9d6089f41b5a9c482de9f5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20961)
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
the-other-tim-brown commented on code in PR #10125: URL: https://github.com/apache/hudi/pull/10125#discussion_r139668
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java: ##
@@ -122,6 +121,16 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
       .markAdvanced()
       .withDocumentation("Fetch file listing from Hudi's metadata");
+  public static final ConfigProperty BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER = ConfigProperty
+      .key("hoodie.gcp.bigquery.sync.require_partition_filter")
+      .defaultValue(false)
+      .withDocumentation("If true, configure table to require a partition filter to be specified when querying the table");
+
+  public static final ConfigProperty BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID = ConfigProperty
+      .key("hoodie.onehouse.gcp.bigquery.sync.big_lake_connection_id")
+      .noDefaultValue()
+      .withDocumentation("The Big Lake connection ID to use");
+
Review Comment:
should it be 0.14.1?
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
the-other-tim-brown commented on code in PR #10125: URL: https://github.com/apache/hudi/pull/10125#discussion_r1396555489
## hudi-gcp/pom.xml: ##
@@ -70,7 +70,6 @@ See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google
 com.google.cloud
 google-cloud-pubsub
- ${google.cloud.pubsub.version}
Review Comment:
We include the BOM above, so this was not required.
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
yihua commented on code in PR #10125: URL: https://github.com/apache/hudi/pull/10125#discussion_r1396550878
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java: ##
@@ -83,7 +83,6 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
       .key("hoodie.gcp.bigquery.sync.use_bq_manifest_file")
       .defaultValue(false)
       .markAdvanced()
-      .sinceVersion("0.14.0")
Review Comment:
nit: this is still needed.
## hudi-gcp/pom.xml: ##
@@ -70,7 +70,6 @@ See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google
 com.google.cloud
 google-cloud-pubsub
- ${google.cloud.pubsub.version}
Review Comment:
Is this safe to remove now?
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java: ##
@@ -79,7 +78,7 @@ public BigQuerySyncTool(Properties props) {
     this.bqSchemaResolver = BigQuerySchemaResolver.getInstance();
   }
-  @VisibleForTesting // allows us to pass in mocks for the writer and client
+  // allows us to pass in mocks for the writer and client
Review Comment:
Is the annotation still needed?
## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java: ##
@@ -122,6 +121,16 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
       .markAdvanced()
       .withDocumentation("Fetch file listing from Hudi's metadata");
+  public static final ConfigProperty BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER = ConfigProperty
+      .key("hoodie.gcp.bigquery.sync.require_partition_filter")
+      .defaultValue(false)
+      .withDocumentation("If true, configure table to require a partition filter to be specified when querying the table");
+
+  public static final ConfigProperty BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID = ConfigProperty
+      .key("hoodie.onehouse.gcp.bigquery.sync.big_lake_connection_id")
+      .noDefaultValue()
+      .withDocumentation("The Big Lake connection ID to use");
+
Review Comment:
Add `.sinceVersion("0.14.0")` and mark them advanced (`.markAdvanced()`, if they are not required to be set by the users)?
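The suggestion above amounts to two extra builder calls on each new property. To show what those calls record, here is a sketch built on a hypothetical, simplified stand-in (`ConfigPropertySketch`), not Hudi's real `ConfigProperty` class; it only stores metadata. The version string is `0.14.1`, which is where the exchange elsewhere in this thread appears to land, rather than the `0.14.0` first suggested. The keys are copied verbatim from the diff.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for a fluent config-property builder.
// It only records metadata and is NOT Hudi's ConfigProperty API.
final class ConfigPropertySketch<T> {
  final String name;
  final Optional<T> defaultValue;
  final boolean advanced;        // set by markAdvanced()
  final Optional<String> since;  // set by sinceVersion(...)
  final String documentation;

  private ConfigPropertySketch(String name, Optional<T> defaultValue, boolean advanced,
                               Optional<String> since, String documentation) {
    this.name = name;
    this.defaultValue = defaultValue;
    this.advanced = advanced;
    this.since = since;
    this.documentation = documentation;
  }

  static Builder key(String name) {
    return new Builder(name);
  }

  static final class Builder {
    private final String name;

    private Builder(String name) {
      this.name = name;
    }

    <V> ConfigPropertySketch<V> defaultValue(V value) {
      return new ConfigPropertySketch<>(name, Optional.of(value), false, Optional.empty(), "");
    }

    ConfigPropertySketch<String> noDefaultValue() {
      return new ConfigPropertySketch<>(name, Optional.empty(), false, Optional.empty(), "");
    }
  }

  // Each modifier returns a new immutable copy with exactly one field changed.
  ConfigPropertySketch<T> markAdvanced() {
    return new ConfigPropertySketch<>(name, defaultValue, true, since, documentation);
  }

  ConfigPropertySketch<T> sinceVersion(String version) {
    return new ConfigPropertySketch<>(name, defaultValue, advanced, Optional.of(version), documentation);
  }

  ConfigPropertySketch<T> withDocumentation(String doc) {
    return new ConfigPropertySketch<>(name, defaultValue, advanced, since, doc);
  }
}

// The two new options declared with the suggested modifiers applied.
final class BigQuerySyncConfigSketch {
  static final ConfigPropertySketch<Boolean> BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER =
      ConfigPropertySketch.key("hoodie.gcp.bigquery.sync.require_partition_filter")
          .defaultValue(false)
          .markAdvanced()
          .sinceVersion("0.14.1")
          .withDocumentation("If true, configure table to require a partition filter "
              + "to be specified when querying the table");

  static final ConfigPropertySketch<String> BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID =
      ConfigPropertySketch.key("hoodie.onehouse.gcp.bigquery.sync.big_lake_connection_id")
          .noDefaultValue()
          .markAdvanced()
          .sinceVersion("0.14.1")
          .withDocumentation("The Big Lake connection ID to use");
}
```

Because every modifier returns a fresh immutable copy, the relative order of `.markAdvanced()` and `.sinceVersion(...)` in the chain does not matter in this sketch.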
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
hudi-bot commented on PR #10125: URL: https://github.com/apache/hudi/pull/10125#issuecomment-1815570231
## CI report:
* 7c1b9cc77e2e5ea2ee9d6089f41b5a9c482de9f5 UNKNOWN