Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913487801 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * b5d8bc6da8e624931f73a12253997c1fb101e697 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22188) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7218] Integrate new HFile reader with file reader factory [hudi]
nsivabalan commented on code in PR #10330: URL: https://github.com/apache/hudi/pull/10330#discussion_r1468774813 ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHFileReader.java: ## @@ -7,205 +7,136 @@ * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * - * http://www.apache.org/licenses/LICENSE-2.0 + * http://www.apache.org/licenses/LICENSE-2.0 * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
*/ package org.apache.hudi.io.storage; -import org.apache.hudi.avro.HoodieAvroUtils; import org.apache.hudi.common.bloom.BloomFilter; import org.apache.hudi.common.bloom.BloomFilterFactory; import org.apache.hudi.common.fs.FSUtils; import org.apache.hudi.common.model.HoodieAvroIndexedRecord; import org.apache.hudi.common.model.HoodieRecord; import org.apache.hudi.common.model.HoodieRecordLocation; import org.apache.hudi.common.util.Option; -import org.apache.hudi.common.util.VisibleForTesting; import org.apache.hudi.common.util.collection.ClosableIterator; import org.apache.hudi.common.util.collection.CloseableMappingIterator; import org.apache.hudi.common.util.collection.Pair; -import org.apache.hudi.common.util.io.ByteBufferBackedInputStream; import org.apache.hudi.exception.HoodieException; import org.apache.hudi.exception.HoodieIOException; +import org.apache.hudi.io.hfile.HFileReader; +import org.apache.hudi.io.hfile.HFileReaderImpl; +import org.apache.hudi.io.hfile.KeyValue; +import org.apache.hudi.io.hfile.UTF8StringKey; import org.apache.hudi.util.Lazy; import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; import org.apache.avro.generic.IndexedRecord; import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; -import org.apache.hadoop.fs.PositionedReadable; -import org.apache.hadoop.fs.Seekable; -import org.apache.hadoop.hbase.Cell; -import org.apache.hadoop.hbase.KeyValue; -import org.apache.hadoop.hbase.io.hfile.CacheConfig; -import org.apache.hadoop.hbase.io.hfile.HFile; -import org.apache.hadoop.hbase.io.hfile.HFileInfo; -import org.apache.hadoop.hbase.io.hfile.HFileScanner; -import org.apache.hadoop.hbase.nio.ByteBuff; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; -import java.util.Arrays; +import java.nio.ByteBuffer; import java.util.Collections; import java.util.Iterator; import 
java.util.List; import java.util.Objects; import java.util.Set; -import java.util.SortedSet; import java.util.TreeSet; import java.util.stream.Collectors; -import static org.apache.hudi.common.util.CollectionUtils.toStream; -import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes; +import static org.apache.hudi.common.util.StringUtils.getStringFromUTF8Bytes; import static org.apache.hudi.common.util.TypeUtils.unsafeCast; +import static org.apache.hudi.io.hfile.HFileUtils.isPrefixOfKey; /** - * NOTE: PLEASE READ DOCS & COMMENTS CAREFULLY BEFORE MAKING CHANGES - * - * {@link HoodieFileReader} implementation allowing to read from {@link HFile}. + * An implementation of {@link BaseHoodieAvroHFileReader} using built-in {@link HFileReader}. */ -public class HoodieAvroHFileReader extends HoodieAvroFileReaderBase implements HoodieSeekingFileReader { - - // TODO HoodieHFileReader right now tightly coupled to MT, we should break that coupling - public static final String SCHEMA_KEY = "schema"; - public static final String KEY_BLOOM_FILTER_META_BLOCK = "bloomFilter"; - public static final String KEY_BLOOM_FILTER_TYPE_CODE = "bloomFilterTypeCode"; - - public static final String KEY_FIELD_NAME = "key"; - public static final String KEY_MIN_RECORD = "minRecordKey"; - public static final String KEY_MAX_RECORD = "maxRecordKey"; - +public class HoodieAvroHFileReader extends BaseHoodieAvroHFileReader { private static final Logger LOG = LoggerFactory.getLogger(HoodieAvroHFileReader.class); - private final Path path; - private final FileSystem fs; - private final Configuration hadoopConf; - private final CacheConfig config; - private final Option content; + private final Configuration conf; + private final Option path; + private final Option bytesContent; + private Option shar
[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows
[ https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated HUDI-7354:
    Attachment: cleanup.sql

> Flink Batch Read from Hudi table does not return any rows
> ---------------------------------------------------------
>
>                 Key: HUDI-7354
>                 URL: https://issues.apache.org/jira/browse/HUDI-7354
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink-sql
>    Affects Versions: 0.14.1
>            Reporter: Prabhu Joseph
>            Priority: Major
>         Attachments: cleanup.sql, flink-hudi.sql
>
>
> Flink Batch Read from Hudi table does not return any rows. The same flink sql
> script returns 8 rows as expected on 0.14.0 Hudi version.
> *Repro Steps*
> 1. Flink 1.18.1 and Hudi 0.14.0
> 2. Open Flink YARN Session
> {code}
> flink-yarn-session -d -D execution.checkpointing.interval=10s -D
> state.checkpoint-storage=filesystem -D
> state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
> {code}
> 3. Place CSV Input Data
> {code}
> cat > data <<EOF
> 1,Danny,23
> 2,Stephen,33
> 3,Julian,53
> 4,Fabian,31
> 5,Sophia,18
> 6,Emma,20
> 7,Bob,44
> 8,Han,56
> EOF
> hadoop fs -mkdir -p
> s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> hadoop fs -put data
> s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> {code}
> 4. Run attached Flink sql (flink-hudi.sql) script
> {code}
> /usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql
> {code}
> The script makes a flink filesystem table with CSV data of 8 rows. Then, it
> forms a Hudi table and puts in the data from the filesystem table. Finally,
> it runs a select query from the Hudi table. The select query does not return
> any data.
> 5. Cleanup the tables and databases using cleanup.sql
> *Analysis*
> The select query and insert query run together. The select query ends quickly
> since the Hudi table has no data yet. In Hudi 0.14.0, the select query waits
> until the data loads and then retrieves it.
> -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows
[ https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated HUDI-7354:
    Attachment: flink-hudi.sql
[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows
[ https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated HUDI-7354: Description: Flink Batch Read from Hudi table does not return any rows. The same flink sql script returns 8 rows as expected on 0.14.0 Hudi version. *Repro Steps* 1. Flink 1.18.1 and Hudi 0.14.0 2. Open Flink YARN Session {code} flink-yarn-session -d -D execution.checkpointing.interval=10s -D state.checkpoint-storage=filesystem -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb {code} 3. Place CSV Input Data {code} cat > data < data < Flink Batch Read from Hudi table does not return any rows > - > > Key: HUDI-7354 > URL: https://issues.apache.org/jira/browse/HUDI-7354 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Affects Versions: 0.14.1 >Reporter: Prabhu Joseph >Priority: Major > > Flink Batch Read from Hudi table does not return any rows. The same flink sql > script returns 8 rows as expected on 0.14.0 Hudi version. > *Repro Steps* > 1. Flink 1.18.1 and Hudi 0.14.0 > 2. Open Flink YARN Session > {code} > flink-yarn-session -d -D execution.checkpointing.interval=10s -D > state.checkpoint-storage=filesystem -D > state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb > {code} > 3. Place CSV Input Data > {code} > cat > data < 1,Danny,23 > 2,Stephen,33 > 3,Julian,53 > 4,Fabian,31 > 5,Sophia,18 > 6,Emma,20 > 7,Bob,44 > 8,Han,56 > EOF > hadoop fs -mkdir -p > s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/ > hadoop fs -put data > s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/ > {code} > 4. Run attached Flink sql (flink-hudi.sql) script > {code} > /usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql > {code} > The script makes a flink filesystem table with CSV data of 8 rows. 
Then, it > forms a Hudi table and puts in the data from the filesystem table. Finally, > it runs a select query from the Hudi table. The select query does not return > any data. > 5. Cleanup the tables and databases using cleanup.sql > *Analysis* > The select query and insert query run together. The select query ends quickly > since the Hudi table has no data yet. In Hudi 0.14.0, the select query waits > until the data loads and then retrieves it. >
[jira] [Created] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows
Prabhu Joseph created HUDI-7354: --- Summary: Flink Batch Read from Hudi table does not return any rows Key: HUDI-7354 URL: https://issues.apache.org/jira/browse/HUDI-7354 Project: Apache Hudi Issue Type: Bug Components: flink-sql Affects Versions: 0.14.1 Reporter: Prabhu Joseph Flink Batch Read from Hudi table does not return any rows. The same flink sql script returns 8 rows as expected on 0.14.0 Hudi version. *Repro Steps* 1. Flink 1.18.1 and Hudi 0.14.0 2. Open Flink YARN Session {code} flink-yarn-session -d -D execution.checkpointing.interval=10s -D state.checkpoint-storage=filesystem -D state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb {code} 3. Place CSV Input Data {code} cat > data <
Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913477595 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187) * b5d8bc6da8e624931f73a12253997c1fb101e697 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22188)
Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913476402 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187) * b5d8bc6da8e624931f73a12253997c1fb101e697 UNKNOWN
[jira] [Updated] (HUDI-7353) Fix TestOrcBootstrap and TestBootstrap
[ https://issues.apache.org/jira/browse/HUDI-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7353: -- Description: Both tests failed when we used different FSV storage types. (was: The test would fail when we use different FSV storage type.) > Fix TestOrcBootstrap and TestBootstrap > -- > > Key: HUDI-7353 > URL: https://issues.apache.org/jira/browse/HUDI-7353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Assignee: Y Ethan Guo >Priority: Major > > Both tests failed when we used different FSV storage types.
[jira] [Updated] (HUDI-7353) Fix TestOrcBootstrap and TestBootstrap
[ https://issues.apache.org/jira/browse/HUDI-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7353: -- Summary: Fix TestOrcBootstrap and TestBootstrap (was: Fix TestOrcBootstrap) > Fix TestOrcBootstrap and TestBootstrap > -- > > Key: HUDI-7353 > URL: https://issues.apache.org/jira/browse/HUDI-7353 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lin Liu >Assignee: Y Ethan Guo >Priority: Major > > The test would fail when we use different FSV storage type.
Re: [PR] [HUDI-7335] Create hudi-hadoop-common for hadoop-specific implementation [hudi]
hudi-bot commented on PR #10564: URL: https://github.com/apache/hudi/pull/10564#issuecomment-1913368369 ## CI report: * 7143fc0d8b881c9d28040d2b9c290c4d8b2e0f54 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22185)
Re: [PR] [HUDI-7336][RFR|DNM] Introduce new HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10567: URL: https://github.com/apache/hudi/pull/10567#issuecomment-1913368388 ## CI report: * ea050a0d021273313ef3d9e5e3c566186e0b28ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22186)
Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913342966 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187)
Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-191580 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * 4612643dfe4861c49642612a3918b67ed0785eda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22164) * 09980192013540f9465d1549022bd06de0414238 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22184) * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187)
Re: [PR] [HUDI-7336][RFR|DNM] Introduce new HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10567: URL: https://github.com/apache/hudi/pull/10567#issuecomment-1913331895 ## CI report: * 8b765537863e155372fed079ea520f0a91a45b84 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22170) * ea050a0d021273313ef3d9e5e3c566186e0b28ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22186)
Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913331858 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * 4612643dfe4861c49642612a3918b67ed0785eda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22164) * 09980192013540f9465d1549022bd06de0414238 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22184) * 977647a0ed72e2a2e8203924aa9a621463c9a193 UNKNOWN
Re: [PR] [HUDI-7335] Create hudi-hadoop-common for hadoop-specific implementation [hudi]
hudi-bot commented on PR #10564: URL: https://github.com/apache/hudi/pull/10564#issuecomment-1913331873 ## CI report: * 5b1fb24ea8a64c0373fa8e901802d6f3d0f5ff33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22163) * 7143fc0d8b881c9d28040d2b9c290c4d8b2e0f54 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22185)
[jira] [Created] (HUDI-7353) Fix TestOrcBootstrap
Lin Liu created HUDI-7353: - Summary: Fix TestOrcBootstrap Key: HUDI-7353 URL: https://issues.apache.org/jira/browse/HUDI-7353 Project: Apache Hudi Issue Type: Bug Reporter: Lin Liu Assignee: Y Ethan Guo The test would fail when we use different FSV storage type.
Re: [PR] [HUDI-7336][RFR|DNM] Introduce new HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10567: URL: https://github.com/apache/hudi/pull/10567#issuecomment-1913330268 ## CI report: * 8b765537863e155372fed079ea520f0a91a45b84 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22170) * ea050a0d021273313ef3d9e5e3c566186e0b28ee UNKNOWN
Re: [PR] [HUDI-7335] Create hudi-hadoop-common for hadoop-specific implementation [hudi]
hudi-bot commented on PR #10564: URL: https://github.com/apache/hudi/pull/10564#issuecomment-1913330252 ## CI report: * 5b1fb24ea8a64c0373fa8e901802d6f3d0f5ff33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22163) * 7143fc0d8b881c9d28040d2b9c290c4d8b2e0f54 UNKNOWN
Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]
hudi-bot commented on PR #10551: URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913330242 ## CI report: * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN * 4612643dfe4861c49642612a3918b67ed0785eda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22164) * 09980192013540f9465d1549022bd06de0414238 UNKNOWN
Re: [PR] [HUDI-7351] Hive-sync partition pushdown does not work with glue [hudi]
parisni commented on code in PR #10572:
URL: https://github.com/apache/hudi/pull/10572#discussion_r1468632782

##
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##
@@ -578,6 +578,14 @@ public Map<String, String> getMetastoreSchema(String tableName) {
     }
   }

+  @Override
+  public List<FieldSchema> getMetastoreFieldSchemas(String tableName) {
+    Map<String, String> schema = getMetastoreSchema(tableName);
+    return schema.entrySet().stream()
+        .map(f -> new FieldSchema(f.getKey(), f.getValue()))
+        .collect(Collectors.toList());
+  }

Review Comment:
   This needs an integration test against a Glue endpoint. It might be possible with moto, but it is a large task. I can work on it in another PR. I hope that would make the aws module more stable.
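The map-to-list conversion in the diff above can be tried in isolation. The following is a minimal sketch, not the PR's code: `FieldSchemaSketch` and `MetastoreSchemaSketch` are hypothetical stand-ins for the real `FieldSchema` and sync client classes, and only a (name, type) pair is modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the FieldSchema class used in the diff;
// it only carries a column name and a column type.
final class FieldSchemaSketch {
  final String name;
  final String type;

  FieldSchemaSketch(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

// Hypothetical helper that mirrors the shape of getMetastoreFieldSchemas.
final class MetastoreSchemaSketch {
  // Produces one field per map entry, in the iteration order of the input map.
  static List<FieldSchemaSketch> toFieldSchemas(Map<String, String> schema) {
    return schema.entrySet().stream()
        .map(e -> new FieldSchemaSketch(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
  }
}
```

Because the result follows the map's iteration order, a `LinkedHashMap` input keeps column order while a plain `HashMap` does not guarantee it. Whether column order matters to the Glue sync client is not stated in the review thread.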
[jira] [Updated] (HUDI-7348) Replace Configuration with StorageConfiguration for storage configuration
[ https://issues.apache.org/jira/browse/HUDI-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7348: Summary: Replace Configuration with StorageConfiguration for storage configuration (was: Replace Configuration with TypedProperties for storage configuration) > Replace Configuration with StorageConfiguration for storage configuration > - > > Key: HUDI-7348 > URL: https://issues.apache.org/jira/browse/HUDI-7348 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 1.0.0 > >
Re: [PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]
hudi-bot commented on PR #10577: URL: https://github.com/apache/hudi/pull/10577#issuecomment-1913306009 ## CI report: * ff12f8a7d10731760db2cfab799618a406507979 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22183)
Re: [PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]
hudi-bot commented on PR #10577: URL: https://github.com/apache/hudi/pull/10577#issuecomment-1913274310 ## CI report: * ff12f8a7d10731760db2cfab799618a406507979 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22183)
Re: [PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]
hudi-bot commented on PR #10577: URL: https://github.com/apache/hudi/pull/10577#issuecomment-1913272241 ## CI report: * ff12f8a7d10731760db2cfab799618a406507979 UNKNOWN
[PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]
ad1happy2go opened a new pull request, #10577:
URL: https://github.com/apache/hudi/pull/10577

### Change Logs

Hive Sync was not able to read the password from the Hadoop credential store. This change adds logic to read it from there when it is available.

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
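The "use the credential store if available" behavior described in the change log could look roughly like the sketch below. This is an illustration under assumptions, not the PR's implementation: `HiveSyncPasswordSketch`, `resolvePassword`, and the `credentialStore` function are hypothetical names. In Hadoop, `Configuration.getPassword(String)` is the usual entry point to credential providers and returns `null` when nothing is found; the email does not say whether the PR uses it.

```java
import java.util.function.Function;

// Hypothetical helper; none of these names come from the PR.
final class HiveSyncPasswordSketch {
  // credentialStore stands in for a credential-provider lookup and is
  // expected to return null when the alias is not present in the store.
  static String resolvePassword(String alias,
                                Function<String, char[]> credentialStore,
                                String plainTextFallback) {
    char[] fromStore = credentialStore.apply(alias);
    if (fromStore != null && fromStore.length > 0) {
      // Prefer the value held in the credential store.
      return new String(fromStore);
    }
    // Otherwise fall back to the value taken from the plain configuration.
    return plainTextFallback;
  }
}
```

Keeping the lookup behind a function parameter lets the fallback path be exercised without a real credential store, which is one way the "adequate tests" item in the checklist could be approached.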
[I] [BUG] Failure Encountered When Reading Hudi with Flink in Batch Runtime Mode and FlinkOptions.READ_AS_STREAMING=false [hudi]
ailinzhou opened a new issue, #10576: URL: https://github.com/apache/hudi/issues/10576 I am currently experiencing an issue when attempting to read Hudi with Flink. The problem arises when I configure the Flink RuntimeMode as 'batch' and set the Hudi FlinkOptions.READ_AS_STREAMING to 'false'. **To Reproduce** 1. Set Flink RuntimeMode to 'batch'. 2. Set Hudi FlinkOptions.READ_AS_STREAMING to 'false'. 3. Attempt to read Hudi with Flink. **Expected behavior** I expected to read the Hudi table in batch mode successfully with Flink under these configurations. **Actual behavior** A failure occurs when attempting to read Hudi with Flink under these configurations. **Environment Description** * Hudi version : From 1.10 ~ 1.14 * Flink version: 1.13 **Additional context** In the `HoodieTableSource` implementation of Flink's `DynamicTableSource`, a `ScanRuntimeProvider` is provided. This `ScanRuntimeProvider` implements the `produceDataStream` method, which generates a `DataStreamSource`. However, in bounded mode it does not explicitly specify the `Boundedness` parameter. Flink then defaults to `Boundedness.CONTINUOUS_UNBOUNDED`, which could be the cause of this issue. [Code at Hudi HoodieTableSource.java](https://github.com/apache/hudi/blob/4c7ac6112daab349ebcdd1fbb2216d9d1138ca14/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java#L227C1-L228C1) ``` java if (conf.getBoolean(FlinkOptions.READ_AS_STREAMING)) { ... } else { ... DataStreamSource source = execEnv.addSource(func, asSummaryString(), typeInfo); ... } ``` Perhaps the code could be modified as follows: ``` java if (!isBounded()) { ... } else { ... DataStreamSource source = execEnv.addSource(func, asSummaryString(), typeInfo, Boundedness.BOUNDED); ... } ``` **Stacktrace** ``` java Caused by: java.lang.IllegalStateException: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'.
This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:381) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:223) at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812) at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246) at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054) at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) ```
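The mismatch described in the report can be modelled without Flink on the classpath. The following is only a sketch: `Boundedness` and `RuntimeMode` are local stand-ins for the Flink types, and `boundednessFor` and `validate` are hypothetical helpers, not Hudi or Flink methods. It shows why a source that is read in batch but declared with the unbounded default trips the check from the stack trace, and why declaring it `BOUNDED` passes. Whether the four-argument `addSource` call proposed in the report is actually callable from Hudi is not verified here.

```java
public class BoundednessSketch {

    /** Local stand-in for Flink's Boundedness enum, so the sketch compiles on its own. */
    public enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** Local stand-in for the 'execution.runtime-mode' setting. */
    public enum RuntimeMode { BATCH, STREAMING }

    /** Hypothetical helper: boundedness a source should declare for a given READ_AS_STREAMING value. */
    public static Boundedness boundednessFor(boolean readAsStreaming) {
        return readAsStreaming ? Boundedness.CONTINUOUS_UNBOUNDED : Boundedness.BOUNDED;
    }

    /** Hypothetical check that mimics the condition behind the reported IllegalStateException. */
    public static void validate(RuntimeMode mode, Boundedness declared) {
        if (mode == RuntimeMode.BATCH && declared == Boundedness.CONTINUOUS_UNBOUNDED) {
            throw new IllegalStateException(
                "Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'.");
        }
    }

    public static void main(String[] args) {
        // Reported behavior: a batch read whose source keeps the unbounded default.
        try {
            validate(RuntimeMode.BATCH, Boundedness.CONTINUOUS_UNBOUNDED);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Proposed behavior: the source declares itself bounded when READ_AS_STREAMING is false.
        validate(RuntimeMode.BATCH, boundednessFor(false));
        System.out.println("accepted: batch read with " + boundednessFor(false));
    }
}
```

Deriving the declared boundedness from the same flag that selects the read path keeps the two from drifting apart, which is the essence of the change suggested in the report.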
Re: [I] Issue with reading the debezium inputs [hudi]
ad1happy2go commented on issue #10561: URL: https://github.com/apache/hudi/issues/10561#issuecomment-1913161095 @zyperd Thanks a lot. As discussed, can you create a PR that adds a catch block for the case where we can't instantiate using the Metrics provider and prints an error message? I have raised a Jira for this. Let me know in case you need any help. Thanks. https://issues.apache.org/jira/browse/HUDI-7352
[jira] [Created] (HUDI-7352) Add WARN log in catch block when not able to instantiate with Metrics Provider
Aditya Goenka created HUDI-7352: --- Summary: Add WARN log in catch block when not able to instantiate with Metrics Provider Key: HUDI-7352 URL: https://issues.apache.org/jira/browse/HUDI-7352 Project: Apache Hudi Issue Type: Improvement Components: deltastreamer Reporter: Aditya Goenka Fix For: 1.1.0 Github Issue - [https://github.com/apache/hudi/issues/10561] -- This message was sent by Atlassian Jira (v8.20.10#820010)
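The pattern requested in HUDI-7352 can be sketched as follows. This is an illustration under assumptions, not the Hudi code: `Metrics`, `PlainSource`, `MeteredSource`, and `instantiate` are hypothetical names, and `java.util.logging` (whose closest level is `WARNING`) stands in for whatever logger the project uses. The sketch first tries a constructor that accepts a metrics object, logs a warning with the cause if that fails, and then falls back to the no-argument constructor.

```java
import java.lang.reflect.Constructor;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MetricsFallbackSketch {

    private static final Logger LOG = Logger.getLogger(MetricsFallbackSketch.class.getName());

    /** Hypothetical metrics holder. */
    public static class Metrics { }

    /** Hypothetical class that only offers a no-argument constructor. */
    public static class PlainSource {
        public PlainSource() { }
    }

    /** Hypothetical class that accepts a metrics object. */
    public static class MeteredSource {
        public final Metrics metrics;

        public MeteredSource(Metrics metrics) {
            this.metrics = metrics;
        }
    }

    /** Tries the metrics-aware constructor first; warns and falls back to the no-arg one on failure. */
    public static Object instantiate(Class<?> clazz, Metrics metrics) {
        try {
            Constructor<?> withMetrics = clazz.getConstructor(Metrics.class);
            return withMetrics.newInstance(metrics);
        } catch (ReflectiveOperationException e) {
            // The warning keeps the original failure visible instead of swallowing it silently.
            LOG.log(Level.WARNING,
                "Could not instantiate " + clazz.getName() + " with metrics; retrying without", e);
        }
        try {
            return clazz.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        System.out.println(instantiate(MeteredSource.class, metrics).getClass().getSimpleName());
        System.out.println(instantiate(PlainSource.class, metrics).getClass().getSimpleName());
    }
}
```

Passing the caught exception to the logger preserves the stack trace, so a misconfigured class shows up in the logs even though the fallback succeeds.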