Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1454158271 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/ParquetDFSSourceConfig.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.config; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import javax.annotation.concurrent.Immutable; + +import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX; +import static org.apache.hudi.common.util.ConfigUtils.STREAMER_CONFIG_PREFIX; + +/** + * Parquet DFS Source Configs + */ +@Immutable +@ConfigClassProperty(name = "Parquet DFS Source Configs", +groupName = ConfigGroups.Names.HUDI_STREAMER, +subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, +description = "Configurations controlling the behavior of Parquet DFS source in Hudi Streamer.") +public class ParquetDFSSourceConfig extends HoodieConfig { + +public static final ConfigProperty PARQUET_DFS_MERGE_SCHEMA = ConfigProperty +.key(STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") +.defaultValue(true) Review Comment: fine by me! will set to false by default then -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
xushiyan commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1454154841 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/ParquetDFSSourceConfig.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.config; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import javax.annotation.concurrent.Immutable; + +import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX; +import static org.apache.hudi.common.util.ConfigUtils.STREAMER_CONFIG_PREFIX; + +/** + * Parquet DFS Source Configs + */ +@Immutable +@ConfigClassProperty(name = "Parquet DFS Source Configs", +groupName = ConfigGroups.Names.HUDI_STREAMER, +subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, +description = "Configurations controlling the behavior of Parquet DFS source in Hudi Streamer.") +public class ParquetDFSSourceConfig extends HoodieConfig { + +public static final ConfigProperty PARQUET_DFS_MERGE_SCHEMA = ConfigProperty +.key(STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") +.defaultValue(true)
Review Comment: ![Screenshot 2024-01-16 at 4 38 21 PM](https://github.com/apache/hudi/assets/2701446/9c6730f8-e9f1-41ab-988c-f6242ec8e523) I did a quick check of the doc, and the default there is false. Setting this to true will introduce behavior changes. We should keep it backward compatible (BWC) in pre-1.0 releases.
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
yihua commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1454147802 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/ParquetDFSSourceConfig.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.config; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import javax.annotation.concurrent.Immutable; + +import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX; +import static org.apache.hudi.common.util.ConfigUtils.STREAMER_CONFIG_PREFIX; + +/** + * Parquet DFS Source Configs + */ +@Immutable +@ConfigClassProperty(name = "Parquet DFS Source Configs", +groupName = ConfigGroups.Names.HUDI_STREAMER, +subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, +description = "Configurations controlling the behavior of Parquet DFS source in Hudi Streamer.") +public class ParquetDFSSourceConfig extends HoodieConfig { + +public static final ConfigProperty PARQUET_DFS_MERGE_SCHEMA = ConfigProperty +.key(STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") Review Comment: Avoid camelCase in the config naming. use `.enable_merge_schema` instead. ## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/ParquetDFSSourceConfig.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.utilities.config; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import javax.annotation.concurrent.Immutable; + +import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX; +import static org.apache.hudi.common.util.ConfigUtils.STREAMER_CONFIG_PREFIX; + +/** + * Parquet DFS Source Configs + */ +@Immutable +@ConfigClassProperty(name = "Parquet DFS Source Configs", +groupName = ConfigGroups.Names.HUDI_STREAMER, +subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, +description = "Configurations controlling the behavior of Parquet DFS source in Hudi Streamer.") +public class ParquetDFSSourceConfig extends HoodieConfig { + +public static final ConfigProperty PARQUET_DFS_MERGE_SCHEMA = ConfigProperty +.key(STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") +.defaultValue(true) +.withAlternatives(DELTA_STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") +.markAdvanced()
Review Comment: Add `sinceVersion("1.0.0")`.
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1894619834
> @rohitmittapalli can you also file a jira and update the title with the jira id pls?

I've requested a JIRA account; I'm unable to file a ticket until that gets approved.
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1454133265 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/ParquetDFSSourceConfig.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.config; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import javax.annotation.concurrent.Immutable; + +import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX; +import static org.apache.hudi.common.util.ConfigUtils.STREAMER_CONFIG_PREFIX; + +/** + * Parquet DFS Source Configs + */ +@Immutable +@ConfigClassProperty(name = "Parquet DFS Source Configs", +groupName = ConfigGroups.Names.HUDI_STREAMER, +subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, +description = "Configurations controlling the behavior of Parquet DFS source in Hudi Streamer.") +public class ParquetDFSSourceConfig extends HoodieConfig { + +public static final ConfigProperty PARQUET_DFS_MERGE_SCHEMA = ConfigProperty +.key(STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") +.defaultValue(true)
Review Comment: Per @nsivabalan's request (https://github.com/apache/hudi/pull/10199#discussion_r1408722685), I've set the default to true. Essentially, the key difference is that the schema will now be merged across all the Parquet files in the commit; previously, the schema was inherited from the first file in the commit. In my opinion, this should be the default behavior.
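To make the difference described in this comment concrete, here is a minimal, self-contained sketch in plain Java. It is a toy model, not Spark's or Hudi's implementation: each file's schema is represented as an ordered map from column name to type name, and the class and method names (`SchemaMergeSketch`, `firstFileSchema`, `mergedSchema`) are hypothetical. The conflict handling (throwing on mismatched types) is a simplification chosen for the illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-file schemas (column name -> type name); not Spark's implementation. */
final class SchemaMergeSketch {

  /** Behavior without merging, as described in the comment: only the first file's schema is used. */
  static Map<String, String> firstFileSchema(List<Map<String, String>> fileSchemas) {
    return new LinkedHashMap<>(fileSchemas.get(0));
  }

  /** Behavior with merging: union of the columns of every file, in first-seen order. */
  static Map<String, String> mergedSchema(List<Map<String, String>> fileSchemas) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> schema : fileSchemas) {
      for (Map.Entry<String, String> field : schema.entrySet()) {
        String previous = merged.putIfAbsent(field.getKey(), field.getValue());
        // Simplification: any type mismatch for the same column is treated as an error.
        if (previous != null && !previous.equals(field.getValue())) {
          throw new IllegalStateException("Conflicting types for column " + field.getKey());
        }
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> fileA = new LinkedHashMap<>();
    fileA.put("id", "long");
    fileA.put("name", "string");

    Map<String, String> fileB = new LinkedHashMap<>();
    fileB.put("id", "long");
    fileB.put("email", "string");

    List<Map<String, String>> files = Arrays.asList(fileA, fileB);
    System.out.println(firstFileSchema(files).keySet()); // prints [id, name]
    System.out.println(mergedSchema(files).keySet());    // prints [id, name, email]
  }
}
```

In this toy example the `email` column, which appears only in the second file, is missing from the first-file schema but present in the merged one. That is the kind of behavior change the default value decides for existing pipelines.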
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
xushiyan commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1894614464 @rohitmittapalli can you also file a jira and update the title with the jira id pls?
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
xushiyan commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1454129825 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/ParquetDFSSourceConfig.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.utilities.config; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import javax.annotation.concurrent.Immutable; + +import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX; +import static org.apache.hudi.common.util.ConfigUtils.STREAMER_CONFIG_PREFIX; + +/** + * Parquet DFS Source Configs + */ +@Immutable +@ConfigClassProperty(name = "Parquet DFS Source Configs", +groupName = ConfigGroups.Names.HUDI_STREAMER, +subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, +description = "Configurations controlling the behavior of Parquet DFS source in Hudi Streamer.") +public class ParquetDFSSourceConfig extends HoodieConfig { + +public static final ConfigProperty PARQUET_DFS_MERGE_SCHEMA = ConfigProperty +.key(STREAMER_CONFIG_PREFIX + "source.parquet.dfs.mergeSchema") +.defaultValue(true)
Review Comment: Can you clarify what impact setting this default to true has on existing pipelines that use this DFS source? Should it be false by default to stay compatible?
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882481681 ## CI report: * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882430073 ## CI report: * Unknown: [CANCELED](TBD) * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882355671 ## CI report: * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875) * Unknown: [CANCELED](TBD) * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882339654 ## CI report: * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875) * Unknown: [CANCELED](TBD) * 9c61cc3b1ff124314bb7cacb82bb141762678d54 UNKNOWN
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882325117 @hudi-bot run azure
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882239471 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190) * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882224728 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190) * 378a6a619dc288301c70275483bbb0ecfa73a7f1 UNKNOWN
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882210046 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882191866 @hudi-bot run azure
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
nsivabalan commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1408722685
## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/ParquetDFSSource.java: ##
@@ -52,6 +52,6 @@ public Pair<Option<Dataset<Row>>, String> fetchNextBatch(Option<String> lastCkpt
   }
   private Dataset<Row> fromFiles(String pathStr) {
-    return sparkSession.read().parquet(pathStr.split(","));
+    return sparkSession.read().option("mergeSchema", "true").parquet(pathStr.split(","));
Review Comment: Can we add a config property for this and enable it based on that? You can introduce a new config class named ParquetDFSSourceConfig and add a config property for mergeSchema, with the default set to true. You can take a look at https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/DFSPathSelectorConfig.java for reference.
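The request above, gating the `mergeSchema` read option behind a config property, could be sketched roughly as follows. This is a plain-Java illustration that avoids Spark and Hudi dependencies so it runs on its own; it is not Hudi's `ConfigProperty` machinery. The class name, the helper methods, and both key strings are assumptions made for the example (including the assumption that the streamer prefixes resolve to `hoodie.streamer.` and `hoodie.deltastreamer.`), and the default is passed in as a parameter rather than fixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of a config-gated reader option; not Hudi's actual code. */
final class MergeSchemaConfigSketch {

  // Assumed key names, for illustration only.
  static final String KEY = "hoodie.streamer.source.parquet.dfs.mergeSchema";
  static final String LEGACY_KEY = "hoodie.deltastreamer.source.parquet.dfs.mergeSchema";

  /** Resolves the flag: primary key first, then the legacy alternative, then the default. */
  static boolean isMergeSchemaEnabled(Properties props, boolean defaultValue) {
    String value = props.getProperty(KEY);
    if (value == null) {
      value = props.getProperty(LEGACY_KEY);
    }
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }

  /** Builds the options that would be handed to the Parquet reader. */
  static Map<String, String> readerOptions(Properties props, boolean defaultValue) {
    Map<String, String> options = new LinkedHashMap<>();
    // The option is added only when enabled, so a disabled flag leaves the read path untouched.
    if (isMergeSchemaEnabled(props, defaultValue)) {
      options.put("mergeSchema", "true");
    }
    return options;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(readerOptions(props, false)); // no option set
    props.setProperty(KEY, "true");
    System.out.println(readerOptions(props, false)); // mergeSchema option present
  }
}
```

In the real source, the resulting options would presumably be applied to `sparkSession.read()` before calling `parquet(...)`, as in the diff above. Adding the option only when the flag is on is one possible design choice: it keeps the existing read call unchanged for pipelines that never set the property.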
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1831064240 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1830876966 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1830868076 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca UNKNOWN
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on code in PR #10199: URL: https://github.com/apache/hudi/pull/10199#discussion_r1408463667
## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/ParquetDFSSource.java: ##
@@ -32,7 +32,7 @@
 /**
  * DFS Source that reads parquet data.
  */
-public class ParquetDFSSource extends RowSource {
+πpublic class ParquetDFSSource extends RowSource {
Review Comment:
```suggestion
public class ParquetDFSSource extends RowSource {
```
[PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli opened a new pull request, #10199: URL: https://github.com/apache/hudi/pull/10199

### Change Logs
ParquetDFSSource will merge the schema across files in a particular read.

### Impact
ParquetDFSSource will merge the schema across files in a particular read.

### Risk level (write none, low medium or high below)
Low

### Documentation Update
_Describe any necessary documentation update if there is any new feature, config, or user-facing change_
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist
- [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [X] Change Logs and Impact were stated clearly
- [X] Adequate tests were added if applicable
- [ ] CI passed