Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176: URL: https://github.com/apache/hudi/pull/10176#issuecomment-1833257061 ## CI report: * a9ac4a84bfe187f9a85815aa0ce7f766f7e0b76e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21247) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-5823][RFC-65] RFC for Partition Lifecycle Management [hudi]
stream2000 commented on PR #8062: URL: https://github.com/apache/hudi/pull/8062#issuecomment-1833238919 For Hudi 1.0.0 and later, which support efficient completion-time queries on the timeline (#9565), we can get a partition's `lastModifiedTime` by scanning the timeline and taking the last write commit for that partition. For efficiency, we can also store each partition's last modified time and the current completion time in the replace commit metadata. The next time we need to calculate the partitions' last modified times, we can build them incrementally from the replace commit metadata of the last TTL management run. @danny0405 Added a new `lastModifiedTime` calculation method for Hudi 1.0.0 and later. We plan to implement the file-listing-based `lastModifiedTime` first and implement the timeline-based `lastModifiedTime` calculation in a separate PR. This will make it easier for users on earlier Hudi versions to pick the feature into their code base. I have addressed all comments from the online/offline discussions. If there are no other concerns, we can move forward with this.
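The timeline-based calculation described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hudi's implementation: `CommitRecord`, `PartitionLastModified`, `lastModifiedTimes`, and `mergeIncremental` are hypothetical names, commits are modelled as plain objects carrying a completion time and the partitions they wrote to, and completion times are plain `long` values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a completed write commit on the timeline:
// its completion time and the partitions it wrote to. Not a Hudi class.
final class CommitRecord {
  final long completionTime;
  final List<String> partitions;

  CommitRecord(long completionTime, List<String> partitions) {
    this.completionTime = completionTime;
    this.partitions = partitions;
  }
}

final class PartitionLastModified {

  // Full scan: for each partition keep the largest completion time seen.
  static Map<String, Long> lastModifiedTimes(List<CommitRecord> timeline) {
    return mergeIncremental(new HashMap<>(), Long.MIN_VALUE, timeline);
  }

  // Incremental build: start from the map stored by the previous TTL run
  // and only apply commits that completed after lastCompletionTime.
  static Map<String, Long> mergeIncremental(Map<String, Long> previous,
                                            long lastCompletionTime,
                                            List<CommitRecord> timeline) {
    Map<String, Long> result = new HashMap<>(previous);
    for (CommitRecord commit : timeline) {
      if (commit.completionTime <= lastCompletionTime) {
        continue; // already reflected in the stored state
      }
      for (String partition : commit.partitions) {
        result.merge(partition, commit.completionTime, Math::max);
      }
    }
    return result;
  }
}
```

In this sketch, the map returned by one run together with the largest completion time it covered plays the role of the state that the comment proposes to keep in the replace commit metadata.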
[I] [SUPPORT] Kafka Avro Confluent Schema Registry version 7 compatibility issues [hudi]
zachtrong opened a new issue, #10217: URL: https://github.com/apache/hudi/issues/10217 **Context** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? Yes **Problem description** The latest Apache Hudi release (0.14.0) uses kafka-avro-serializer-5.3.4.jar, which causes deserialization issues when used with the Kafka Avro data source and the Confluent REST API version 6/7. Sample Avro schema: ``` {"id":36,"subject":"test-value","version":12,"schema":"{\"type\":\"record\",\"name\":\"test\",\"namespace\":\"test\",\"fields\":[{\"name\":\"id\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"name\",\"type\":[\"null\",\"string\"],\"default\":null}],\"connect.name\":\"mongo.bot.test.test\"}","references":[]} ``` The key error is the **"references": []** field, which the 5.3.4 client does not recognize. **Suggestion** Upgrade the jar to io.confluent:kafka-avro-serializer:7.5.1 from the Confluent repository. Reference: [kafka-avro-serializer:7.5.1](https://github.com/confluentinc/schema-registry/blob/v7.5.1/client/src/main/java/io/confluent/kafka/schemaregistry/client/rest/entities/SchemaString.java) **Expected behavior** Apache Hudi is able to parse the above Avro schema without error. **Environment Description** * Hudi version: 0.14.0 * Spark version: 3.4.1 * Hive version: 3.1.3 * Hadoop version: 3.3.4 * Storage (HDFS/S3/GCS..): S3 * Running on Docker? (yes/no): yes 
**Stacktrace** ```Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 25 Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "references" (class io.confluent.kafka.schemaregistry.client.rest.entities.SchemaString), not marked as ignorable (one known property: "schema"]) at [Source: (sun.net.www.protocol.http.HttpURLConnection$HttpInputStream); line: 1, column: 2063] (through reference chain: io.confluent.kafka.schemaregistry.client.rest.entities.SchemaString["references"]) at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1132) at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2202) at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1705) at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1683) at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320) at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4730) at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3722) at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:221) at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:265) at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:495) at 
io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:488) at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:177) at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:256) at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getById(CachedSchemaRegistryClient.java:235) at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:107) at org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer.deserialize(KafkaAvroSchemaDeserializer.java:79) at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:79) at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55) at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60) at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:1386) at org.apache.kafka.clients.consumer.internals.Fetcher.access$3400(Fetcher.java:133) at org.apache.kafka.clients.consumer.
Re: [PR] [HUDI-7159]Check the table type between hoodie.properies and table options [hudi]
hudi-bot commented on PR #10209: URL: https://github.com/apache/hudi/pull/10209#issuecomment-1833155126 ## CI report: * a22a697252040fc83d09e2b443942859b0b1d421 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21218) * a5136d4c7459d2902dc00750c24b5e48820ea619 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21248)
Re: [PR] [HUDI-7159]Check the table type between hoodie.properies and table options [hudi]
hudi-bot commented on PR #10209: URL: https://github.com/apache/hudi/pull/10209#issuecomment-1833149284 ## CI report: * a22a697252040fc83d09e2b443942859b0b1d421 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21218) * a5136d4c7459d2902dc00750c24b5e48820ea619 UNKNOWN
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176: URL: https://github.com/apache/hudi/pull/10176#issuecomment-1833149163 ## CI report: * 3c894596a90a326707d4aa052e34cf9f09daae75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21224) * a9ac4a84bfe187f9a85815aa0ce7f766f7e0b76e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21247)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1833149064 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * 993d0dc63f3f552b8bf5b52c113f3ae8ef53304c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21241)
Re: [PR] [HUDI-7077] Fix OOM error for a test [hudi]
hudi-bot commented on PR #10216: URL: https://github.com/apache/hudi/pull/10216#issuecomment-1833143559 ## CI report: * a015a4f2dce1814e8387a41a6cf9842404a874c1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21245)
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1833143354 ## CI report: * fcc90e964c0ee3a12f0f90bf216051e0bc3b7eaa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21244)
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176: URL: https://github.com/apache/hudi/pull/10176#issuecomment-1833143442 ## CI report: * 3c894596a90a326707d4aa052e34cf9f09daae75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21224) * a9ac4a84bfe187f9a85815aa0ce7f766f7e0b76e UNKNOWN
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
KnightChess commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410172249 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi + +import org.apache.hadoop.fs.FileStatus +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.table.HoodieTableConfig +import org.apache.hudi.config.HoodieIndexConfig +import org.apache.hudi.index.HoodieIndex +import org.apache.hudi.index.HoodieIndex.IndexType +import org.apache.hudi.index.bucket.BucketIdentifier +import org.apache.log4j.LogManager +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, Expression, Literal} +import org.apache.spark.sql.types.{DoubleType, FloatType} +import org.apache.spark.util.collection.BitSet + +import scala.collection.{JavaConverters, mutable} + +class BucketIndexSupport(metadataConfig: HoodieMetadataConfig) { + + private val log = LogManager.getLogger(getClass); + + /** + * Returns the configured bucket field for the table + */ + private def getBucketHashField: Option[String] = { +val bucketHashFields = metadataConfig.getString(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD) +if (bucketHashFields == null) { + val recordKeys = metadataConfig.getString(HoodieTableConfig.RECORDKEY_FIELDS) + if (recordKeys == null) { +Option.apply(null) + } else { +val recordKeyArray = recordKeys.split(",") +if (recordKeyArray.length == 1) { + Option.apply(recordKeyArray(0)) +} else { + log.warn("bucket query index only support one bucket field") + Option.apply(null) +} + } +} else { + val fields = bucketHashFields.split(",") + if (fields.length == 1) { +Option.apply(fields(0)) + } else { +log.warn("bucket query index only support one bucket field") +Option.apply(null) + } +} + } + + def getCandidateFiles(allFiles: Seq[FileStatus], bucketIds: BitSet): Set[String] = { +val candidateFiles: mutable.Set[String] = mutable.Set.empty +for (file <- allFiles) { + val fileId = FSUtils.getFileIdFromFilePath(file.getPath) + val fileBucketId = 
BucketIdentifier.bucketIdFromFileId(fileId) + if (bucketIds.get(fileBucketId)) { +candidateFiles += file.getPath.getName + } +} +candidateFiles.toSet + } + + def filterQueriesWithBucketHashField(queryFilters: Seq[Expression]): Option[BitSet] = { +val bucketNumber = metadataConfig.getInt(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS) +val bucketHashFieldOpt = getBucketHashField +if (bucketHashFieldOpt.isEmpty || queryFilters.isEmpty) { + None +} else { + val matchedBuckets = getExpressionBuckets(queryFilters.reduce(And), bucketHashFieldOpt.get, bucketNumber) + + val numBucketsSelected = matchedBuckets.cardinality() + + // None means all the buckets need to be scanned + if (numBucketsSelected == bucketNumber) { +log.info("bucket query match all file slice, fallback other index") +None + } else { +Some(matchedBuckets) + } +} + } + + private def getExpressionBuckets(expr: Expression, bucketColumnName: String, numBuckets: Int): BitSet = { + +def getBucketNumber(attr: Attribute, v: Any): Int = { + BucketIdentifier.getBucketId(JavaConverters.seqAsJavaListConverter(List.apply(String.valueOf(v))).asJava, numBuckets) Review Comment: > In order to support index key pruning, only conjunction predicates that concatenate equations of the hash keys are supported, see Yes, I have also considered this scenario, but its usage is very restricted and inflexible, and the business use cases are limited. I can handle multiple fields with separate, additional logic that is decoupled from the processing of a single field.
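The pruning idea under discussion, where only conjunctions of equality predicates on the hash key can narrow the bucket set, might be sketched as below. This is an illustration under assumptions, not the PR's code: `Expr`, `Eq`, `And`, `Other`, and `BucketPruning` are hypothetical names rather than Spark's expression classes, and the hash in `bucketId` is an assumed stand-in for `BucketIdentifier.getBucketId`, whose exact hashing is not shown in the quoted diff.

```java
import java.util.BitSet;

// Hypothetical, simplified predicate tree; not Spark's Expression classes.
interface Expr {}

final class Eq implements Expr {
  final String column;
  final Object value;

  Eq(String column, Object value) {
    this.column = column;
    this.value = value;
  }
}

final class And implements Expr {
  final Expr left;
  final Expr right;

  And(Expr left, Expr right) {
    this.left = left;
    this.right = right;
  }
}

// Any predicate the pruning cannot reason about (ranges, LIKE, and so on).
final class Other implements Expr {}

final class BucketPruning {

  // Assumed hash function, used only for this sketch.
  static int bucketId(Object value, int numBuckets) {
    return (String.valueOf(value).hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  // Returns the set of buckets that may contain rows matching expr.
  static BitSet matchedBuckets(Expr expr, String hashField, int numBuckets) {
    if (expr instanceof Eq) {
      Eq eq = (Eq) expr;
      if (eq.column.equals(hashField)) {
        BitSet one = new BitSet(numBuckets);
        one.set(bucketId(eq.value, numBuckets));
        return one;
      }
    }
    if (expr instanceof And) {
      And and = (And) expr;
      // A conjunction can only narrow the result: intersect both sides.
      BitSet result = matchedBuckets(and.left, hashField, numBuckets);
      result.and(matchedBuckets(and.right, hashField, numBuckets));
      return result;
    }
    // Unsupported predicate: be conservative and keep every bucket.
    BitSet all = new BitSet(numBuckets);
    all.set(0, numBuckets);
    return all;
  }
}
```

Treating every unsupported predicate as "all buckets" keeps the pruning safe: the worst case is that nothing is skipped, never that a matching file is dropped.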
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
KnightChess commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410164643 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi + +import org.apache.hadoop.fs.FileStatus +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.table.HoodieTableConfig +import org.apache.hudi.config.HoodieIndexConfig +import org.apache.hudi.index.HoodieIndex +import org.apache.hudi.index.HoodieIndex.IndexType +import org.apache.hudi.index.bucket.BucketIdentifier +import org.apache.log4j.LogManager +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, Expression, Literal} +import org.apache.spark.sql.types.{DoubleType, FloatType} +import org.apache.spark.util.collection.BitSet + +import scala.collection.{JavaConverters, mutable} + +class BucketIndexSupport(metadataConfig: HoodieMetadataConfig) { + + private val log = LogManager.getLogger(getClass); + + /** + * Returns the configured bucket field for the table + */ + private def getBucketHashField: Option[String] = { +val bucketHashFields = metadataConfig.getString(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD) +if (bucketHashFields == null) { + val recordKeys = metadataConfig.getString(HoodieTableConfig.RECORDKEY_FIELDS) + if (recordKeys == null) { +Option.apply(null) + } else { +val recordKeyArray = recordKeys.split(",") +if (recordKeyArray.length == 1) { + Option.apply(recordKeyArray(0)) +} else { + log.warn("bucket query index only support one bucket field") + Option.apply(null) +} + } +} else { + val fields = bucketHashFields.split(",") + if (fields.length == 1) { +Option.apply(fields(0)) + } else { +log.warn("bucket query index only support one bucket field") +Option.apply(null) + } +} + } + + def getCandidateFiles(allFiles: Seq[FileStatus], bucketIds: BitSet): Set[String] = { +val candidateFiles: mutable.Set[String] = mutable.Set.empty +for (file <- allFiles) { + val fileId = FSUtils.getFileIdFromFilePath(file.getPath) + val fileBucketId = 
BucketIdentifier.bucketIdFromFileId(fileId) + if (bucketIds.get(fileBucketId)) { +candidateFiles += file.getPath.getName + } +} +candidateFiles.toSet + } + + def filterQueriesWithBucketHashField(queryFilters: Seq[Expression]): Option[BitSet] = { +val bucketNumber = metadataConfig.getInt(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS) +val bucketHashFieldOpt = getBucketHashField +if (bucketHashFieldOpt.isEmpty || queryFilters.isEmpty) { + None +} else { + val matchedBuckets = getExpressionBuckets(queryFilters.reduce(And), bucketHashFieldOpt.get, bucketNumber) + + val numBucketsSelected = matchedBuckets.cardinality() + + // None means all the buckets need to be scanned + if (numBucketsSelected == bucketNumber) { +log.info("bucket query match all file slice, fallback other index") Review Comment: sounds good -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
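The fallback in the quoted hunk (return `None` when every bucket matches) together with the candidate-file filter could look roughly like this. It is a sketch under assumptions: `BucketFileFilter` and `candidateFiles` are hypothetical names, `java.util.Optional` stands in for Scala's `Option`, and the sketch assumes that the first eight characters of a file ID encode the bucket number, which the diff itself does not confirm.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Optional;

final class BucketFileFilter {

  // Assumption for this sketch: the file ID starts with an
  // eight-digit, zero-padded bucket number.
  static int bucketIdFromFileId(String fileId) {
    return Integer.parseInt(fileId.substring(0, 8));
  }

  // An empty result means "no pruning possible, fall back to other indexes".
  static Optional<List<String>> candidateFiles(List<String> fileIds,
                                               BitSet matchedBuckets,
                                               int numBuckets) {
    if (matchedBuckets.cardinality() == numBuckets) {
      return Optional.empty();
    }
    List<String> candidates = new ArrayList<>();
    for (String fileId : fileIds) {
      if (matchedBuckets.get(bucketIdFromFileId(fileId))) {
        candidates.add(fileId);
      }
    }
    return Optional.of(candidates);
  }
}
```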
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
KnightChess commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410164092 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala: ## @@ -340,9 +347,18 @@ case class HoodieFileIndex(spark: SparkSession, // and candidate files are obtained from these file slices. lazy val queryReferencedColumns = collectReferencedColumns(spark, queryFilters, schema) - +// bucket query index +var bucketIds = Option.empty[BitSet] +if (bucketIndex.isIndexAvailable && isDataSkippingEnabled) { + bucketIds = bucketIndex.filterQueriesWithBucketHashField(queryFilters) +} +// record index lazy val (_, recordKeys) = recordLevelIndex.filterQueriesWithRecordKey(queryFilters) -if (!isMetadataTableEnabled || !isDataSkippingEnabled) { + +// index chose +if (bucketIndex.isIndexAvailable && bucketIds.isDefined && bucketIds.get.cardinality() > 0) { + Option.apply(bucketIndex.getCandidateFiles(allBaseFiles, bucketIds.get)) Review Comment: > file skipping within a file group Sorry, I don't understand this sentence; can you explain it in detail? In my understanding, a file group consists of multiple file slices, and `allBaseFiles` will return the latest file slice.
(hudi) branch master updated: [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields (#10212)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 38db88c8a2b [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields (#10212) 38db88c8a2b is described below commit 38db88c8a2bb0c378295324692c4c0388e60e466 Author: Tim Brown AuthorDate: Wed Nov 29 22:54:12 2023 -0600 [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields (#10212) --- .../java/org/apache/hudi/avro/HoodieAvroUtils.java | 3 +++ .../org/apache/hudi/avro/TestHoodieAvroUtils.java | 25 ++ 2 files changed, 28 insertions(+) diff --git a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java index 3800d9c1053..ac7dcd42979 100644 --- a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java +++ b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java @@ -302,6 +302,9 @@ public class HoodieAvroUtils { } Schema mergedSchema = Schema.createRecord(schema.getName(), schema.getDoc(), schema.getNamespace(), false); +for (Map.Entry prop : schema.getObjectProps().entrySet()) { + mergedSchema.addProp(prop.getKey(), prop.getValue()); +} mergedSchema.setFields(parentFields); return mergedSchema; } diff --git a/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java b/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java index 28b05435244..eb20081475f 100644 --- a/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java +++ b/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java @@ -99,6 +99,12 @@ public class TestHoodieAvroUtils { + "{\"name\": \"non_pii_col\", \"type\": \"string\"}," + "{\"name\": \"pii_col\", \"type\": \"string\", \"column_category\": \"user_profile\"}]}"; + private static final String 
EXAMPLE_SCHEMA_WITH_PROPS = "{\"type\": \"record\",\"name\": \"testrec\",\"fields\": [ " + + "{\"name\": \"timestamp\",\"type\": \"double\", \"custom_field_property\":\"value\"},{\"name\": \"_row_key\", \"type\": \"string\"}," + + "{\"name\": \"non_pii_col\", \"type\": \"string\"}," + + "{\"name\": \"pii_col\", \"type\": \"string\", \"column_category\": \"user_profile\"}], " + + "\"custom_schema_property\": \"custom_schema_property_value\"}"; + private static int NUM_FIELDS_IN_EXAMPLE_SCHEMA = 4; private static String SCHEMA_WITH_METADATA_FIELD = "{\"type\": \"record\",\"name\": \"testrec2\",\"fields\": [ " @@ -604,4 +610,23 @@ public class TestHoodieAvroUtils { .subtract((BigDecimal) unwrapAvroValueWrapper(wrapperValue)).toPlainString()); } } + + @Test + public void testAddMetadataFields() { +Schema baseSchema = new Schema.Parser().parse(EXAMPLE_SCHEMA_WITH_PROPS); +Schema schemaWithMetadata = HoodieAvroUtils.addMetadataFields(baseSchema); +List updatedFields = schemaWithMetadata.getFields(); +// assert fields added in expected order +assertEquals(HoodieRecord.COMMIT_TIME_METADATA_FIELD, updatedFields.get(0).name()); +assertEquals(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, updatedFields.get(1).name()); +assertEquals(HoodieRecord.RECORD_KEY_METADATA_FIELD, updatedFields.get(2).name()); +assertEquals(HoodieRecord.PARTITION_PATH_METADATA_FIELD, updatedFields.get(3).name()); +assertEquals(HoodieRecord.FILENAME_METADATA_FIELD, updatedFields.get(4).name()); +// assert original fields are copied over +List originalFieldsInUpdatedSchema = updatedFields.subList(5, updatedFields.size()); +assertEquals(baseSchema.getFields(), originalFieldsInUpdatedSchema); +// validate properties are properly copied over +assertEquals("custom_schema_property_value", schemaWithMetadata.getProp("custom_schema_property")); +assertEquals("value", originalFieldsInUpdatedSchema.get(0).getProp("custom_field_property")); + } }
Re: [PR] [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields [hudi]
nsivabalan merged PR #10212: URL: https://github.com/apache/hudi/pull/10212
(hudi) branch master updated: [HUDI-7161] Add commit action type and extra metadata to write callback on commit message (#10213)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b244e5a7b7b [HUDI-7161] Add commit action type and extra metadata to write callback on commit message (#10213) b244e5a7b7b is described below commit b244e5a7b7b4f806d51663d602b39fd724ed5d62 Author: Rajesh Mahindra <76502047+rmahindra...@users.noreply.github.com> AuthorDate: Wed Nov 29 20:53:34 2023 -0800 [HUDI-7161] Add commit action type and extra metadata to write callback on commit message (#10213) - Co-authored-by: rmahindra123 --- .../common/HoodieWriteCommitCallbackMessage.java | 36 +- .../apache/hudi/client/BaseHoodieWriteClient.java | 3 +- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/callback/common/HoodieWriteCommitCallbackMessage.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/callback/common/HoodieWriteCommitCallbackMessage.java index 8210693a756..808f643da56 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/callback/common/HoodieWriteCommitCallbackMessage.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/callback/common/HoodieWriteCommitCallbackMessage.java @@ -20,9 +20,11 @@ package org.apache.hudi.callback.common; import org.apache.hudi.ApiMaturityLevel; import org.apache.hudi.PublicAPIClass; import org.apache.hudi.common.model.HoodieWriteStat; +import org.apache.hudi.common.util.Option; import java.io.Serializable; import java.util.List; +import java.util.Map; /** * Base callback message, which contains commitTime and tableName only for now. 
@@ -52,11 +54,35 @@ public class HoodieWriteCommitCallbackMessage implements Serializable { */ private final List<HoodieWriteStat> hoodieWriteStat; - public HoodieWriteCommitCallbackMessage(String commitTime, String tableName, String basePath, List<HoodieWriteStat> hoodieWriteStat) { + /** + * Action Type of the commit. + */ + private final Option<String> commitActionType; + + /** + * Extra metadata in the commit. + */ + private final Option<Map<String, String>> extraMetadata; + + public HoodieWriteCommitCallbackMessage(String commitTime, + String tableName, + String basePath, + List<HoodieWriteStat> hoodieWriteStat) { +this(commitTime, tableName, basePath, hoodieWriteStat, Option.empty(), Option.empty()); + } + + public HoodieWriteCommitCallbackMessage(String commitTime, + String tableName, + String basePath, + List<HoodieWriteStat> hoodieWriteStat, + Option<String> commitActionType, + Option<Map<String, String>> extraMetadata) { this.commitTime = commitTime; this.tableName = tableName; this.basePath = basePath; this.hoodieWriteStat = hoodieWriteStat; +this.commitActionType = commitActionType; +this.extraMetadata = extraMetadata; } public String getCommitTime() { @@ -74,4 +100,12 @@ public class HoodieWriteCommitCallbackMessage implements Serializable { public List<HoodieWriteStat> getHoodieWriteStat() { return hoodieWriteStat; } + + public Option<String> getCommitActionType() { +return commitActionType; + } + + public Option<Map<String, String>> getExtraMetadata() { +return extraMetadata; + } } diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java index 7dbd07ea1cc..a3aa6699027 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java @@ -265,7 +265,8 @@ public abstract class BaseHoodieWriteClient extends BaseHoodieClient if (null == commitCallback) { commitCallback = HoodieCommitCallbackFactory.create(config); } - 
commitCallback.call(new HoodieWriteCommitCallbackMessage(instantTime, config.getTableName(), config.getBasePath(), stats)); + commitCallback.call(new HoodieWriteCommitCallbackMessage( + instantTime, config.getTableName(), config.getBasePath(), stats, Option.of(commitActionType), extraMetadata)); } return true; }
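The commit above keeps the old four-argument constructor and delegates to the new six-argument one with empty optionals, so existing callers are unaffected. A minimal sketch of that pattern follows, under loud assumptions: `CallbackMessageSketch` and `describe` are hypothetical names, `java.util.Optional` stands in for Hudi's `Option`, and `List<String>` stands in for `List<HoodieWriteStat>` so the example compiles without Hudi on the classpath.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the callback message; not the Hudi class itself.
final class CallbackMessageSketch {
  private final String commitTime;
  private final String tableName;
  private final String basePath;
  private final List<String> writeStats; // placeholder for List<HoodieWriteStat>
  private final Optional<String> commitActionType;
  private final Optional<Map<String, String>> extraMetadata;

  // Old four-argument form: callers that predate the change get empty optionals.
  CallbackMessageSketch(String commitTime, String tableName, String basePath, List<String> writeStats) {
    this(commitTime, tableName, basePath, writeStats, Optional.empty(), Optional.empty());
  }

  // New form carrying the action type and the extra commit metadata.
  CallbackMessageSketch(String commitTime, String tableName, String basePath, List<String> writeStats,
                        Optional<String> commitActionType, Optional<Map<String, String>> extraMetadata) {
    this.commitTime = commitTime;
    this.tableName = tableName;
    this.basePath = basePath;
    this.writeStats = writeStats;
    this.commitActionType = commitActionType;
    this.extraMetadata = extraMetadata;
  }

  Optional<String> getCommitActionType() {
    return commitActionType;
  }

  Optional<Map<String, String>> getExtraMetadata() {
    return extraMetadata;
  }

  // Example consumer: must tolerate messages built through the old constructor.
  static String describe(CallbackMessageSketch msg) {
    String action = msg.getCommitActionType().orElse("unknown");
    int extraKeys = msg.getExtraMetadata().map(Map::size).orElse(0);
    return msg.tableName + "@" + msg.commitTime + " action=" + action + " extraKeys=" + extraKeys;
  }
}
```

A callback implementation written against the new message would likely want the same defensive handling, since a message may still arrive without an action type or extra metadata.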
Re: [PR] [HUDI-7161] Add commit action type and extra metadata to write callback on commit message [hudi]
nsivabalan merged PR #10213: URL: https://github.com/apache/hudi/pull/10213
Re: [PR] [HUDI-7059] Hudi filter pushdown for positional merging [hudi]
linliu-code commented on code in PR #10167: URL: https://github.com/apache/hudi/pull/10167#discussion_r1410140356 ## hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala: ## @@ -127,4 +126,17 @@ class Spark3_5Adapter extends BaseSpark3Adapter { case OFF_HEAP => "OFF_HEAP" case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level") } + + override def appendRowIndexColumnForParquetFileReader(requiredSchema: StructType, shouldUseRecordPosition: Boolean): StructType = { +if (shouldUseRecordPosition) StructType(requiredSchema.toArray :+ FileSourceGeneratedMetadataStructField( + ROW_INDEX_TEMPORARY_COLUMN_NAME, ROW_INDEX_TEMPORARY_COLUMN_NAME, LongType, nullable = false)) else requiredSchema + } + + override def appendRowIndexColumnForFileGroupReader(requiredSchema: StructType, shouldUseRecordPosition: Boolean): StructType = { +if (shouldUseRecordPosition) StructType(requiredSchema.toArray :+ ROW_INDEX_FIELD) else requiredSchema + } + + override def getDataFilters(requiredFilters: Seq[Filter], recordKeyRelatedFilters: Seq[Filter], shouldUseRecordPosition: Boolean): Seq[Filter] = { +requiredFilters ++ recordKeyRelatedFilters + } Review Comment: Do you have any examples of doing that? What are the benefits?
Re: [PR] [HUDI-7059] Hudi filter pushdown for positional merging [hudi]
linliu-code commented on code in PR #10167: URL: https://github.com/apache/hudi/pull/10167#discussion_r1410140489 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@ -300,12 +303,13 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState, val baseFileReader = super.buildReaderWithPartitionValues(sparkSession, dataSchema, partitionSchema, requiredSchema, filters ++ requiredFilters, options, new Configuration(hadoopConf)) -//file reader for reading a hudi base file that needs to be merged with log files +// File reader for reading a Hoodie base file that needs to be merged with log files val preMergeBaseFileReader = if (isMOR) { // Add support for reading files using inline file system. - super.buildReaderWithPartitionValues(sparkSession, dataSchema, partitionSchema, requiredSchemaWithMandatory, -if (shouldUseRecordPosition) requiredFilters else recordKeyRelatedFilters ++ requiredFilters, -options, new Configuration(hadoopConf)) + val appliedRequiredSchema = sparkAdapter.appendRowIndexColumnForParquetFileReader(requiredSchemaWithMandatory, shouldUseRecordPosition) Review Comment: Ok.
Re: [PR] [HUDI-7077] Fix OOM error for a test [hudi]
hudi-bot commented on PR #10216: URL: https://github.com/apache/hudi/pull/10216#issuecomment-1833073697 ## CI report: * 9206bb059d0f22ee3e1110b3d269ce2f777c358f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21243) * a015a4f2dce1814e8387a41a6cf9842404a874c1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21245)
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1833073505 ## CI report: * 6de03812710b802b83d8b2efb8c31849d4be0202 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21242) * fcc90e964c0ee3a12f0f90bf216051e0bc3b7eaa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21244)
Re: [PR] [MINOR] Allow concurrent modification for heartbeat map [hudi]
hudi-bot commented on PR #10215: URL: https://github.com/apache/hudi/pull/10215#issuecomment-1833068070 ## CI report: * bd5d820f323c66fbcf7492c61d23585a581e76cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21238)
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1833063296 ## CI report: * 3e63db8a1620a25197071d21714d06144f1fbb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21237) * 6de03812710b802b83d8b2efb8c31849d4be0202 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21242) * fcc90e964c0ee3a12f0f90bf216051e0bc3b7eaa UNKNOWN
Re: [PR] [HUDI-7077] Fix OOM error for a test [hudi]
hudi-bot commented on PR #10216: URL: https://github.com/apache/hudi/pull/10216#issuecomment-1833058403 ## CI report: * 9206bb059d0f22ee3e1110b3d269ce2f777c358f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21243) * a015a4f2dce1814e8387a41a6cf9842404a874c1 UNKNOWN
Re: [PR] [HUDI-7041] Optimize the memory usage of timeline server for table service [hudi]
zhuanshenbsj1 commented on PR #10002: URL: https://github.com/apache/hudi/pull/10002#issuecomment-1833053275 > Hi, Could you please merge the 0.14.1 (0.x) branch to support it? When will version 0.14.1 be released? > > @zhuanshenbsj1 @danny0405 Yes, it will be merged into 0.14.1, which will be released soon.
Re: [PR] [HUDI-7041] Optimize the memory usage of timeline server for table service [hudi]
zyclove commented on PR #10002: URL: https://github.com/apache/hudi/pull/10002#issuecomment-1833035662 Hi, Could you please merge the 0.14.1 (0.x) branch to support it? When will version 0.14.1 be released? @zhuanshenbsj1 @danny0405
Re: [PR] [HUDI-7059] Hudi filter pushdown for positional merging [hudi]
yihua commented on code in PR #10167: URL: https://github.com/apache/hudi/pull/10167#discussion_r1410100073 ## hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala: ## @@ -127,4 +126,17 @@ class Spark3_5Adapter extends BaseSpark3Adapter { case OFF_HEAP => "OFF_HEAP" case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level") } + + override def appendRowIndexColumnForParquetFileReader(requiredSchema: StructType, shouldUseRecordPosition: Boolean): StructType = { +if (shouldUseRecordPosition) StructType(requiredSchema.toArray :+ FileSourceGeneratedMetadataStructField( + ROW_INDEX_TEMPORARY_COLUMN_NAME, ROW_INDEX_TEMPORARY_COLUMN_NAME, LongType, nullable = false)) else requiredSchema + } + + override def appendRowIndexColumnForFileGroupReader(requiredSchema: StructType, shouldUseRecordPosition: Boolean): StructType = { +if (shouldUseRecordPosition) StructType(requiredSchema.toArray :+ ROW_INDEX_FIELD) else requiredSchema + } + + override def getDataFilters(requiredFilters: Seq[Filter], recordKeyRelatedFilters: Seq[Filter], shouldUseRecordPosition: Boolean): Seq[Filter] = { +requiredFilters ++ recordKeyRelatedFilters + } Review Comment: Can we directly use the Spark version as the criterion for fetching the row index with the Spark parquet reader, instead of adding new APIs here?
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1833029957 ## CI report: * 3e63db8a1620a25197071d21714d06144f1fbb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21237) * 6de03812710b802b83d8b2efb8c31849d4be0202 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21242)
Re: [PR] [HUDI-7077] Fix OOM error for a test [hudi]
hudi-bot commented on PR #10216: URL: https://github.com/apache/hudi/pull/10216#issuecomment-1833030144 ## CI report: * 9206bb059d0f22ee3e1110b3d269ce2f777c358f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21243)
Re: [PR] [HUDI-6953] Adding test for composite keys with bulk insert row writer [hudi]
hudi-bot commented on PR #10214: URL: https://github.com/apache/hudi/pull/10214#issuecomment-1833030107 ## CI report: * 0ee77f22a2f213a1c581e443a52eb6965832abc4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21233)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1833029928 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * 31fe075c72fb189b9155e48ab3399e9199cc293a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21231) * 993d0dc63f3f552b8bf5b52c113f3ae8ef53304c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21241)
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
danny0405 commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410095547 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi + +import org.apache.hadoop.fs.FileStatus +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.table.HoodieTableConfig +import org.apache.hudi.config.HoodieIndexConfig +import org.apache.hudi.index.HoodieIndex +import org.apache.hudi.index.HoodieIndex.IndexType +import org.apache.hudi.index.bucket.BucketIdentifier +import org.apache.log4j.LogManager +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, Expression, Literal} +import org.apache.spark.sql.types.{DoubleType, FloatType} +import org.apache.spark.util.collection.BitSet + +import scala.collection.{JavaConverters, mutable} + +class BucketIndexSupport(metadataConfig: HoodieMetadataConfig) { + + private val log = LogManager.getLogger(getClass); + + /** + * Returns the configured bucket field for the table + */ + private def getBucketHashField: Option[String] = { +val bucketHashFields = metadataConfig.getString(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD) +if (bucketHashFields == null) { + val recordKeys = metadataConfig.getString(HoodieTableConfig.RECORDKEY_FIELDS) + if (recordKeys == null) { +Option.apply(null) + } else { +val recordKeyArray = recordKeys.split(",") +if (recordKeyArray.length == 1) { + Option.apply(recordKeyArray(0)) +} else { + log.warn("bucket query index only support one bucket field") + Option.apply(null) +} + } +} else { + val fields = bucketHashFields.split(",") + if (fields.length == 1) { +Option.apply(fields(0)) + } else { +log.warn("bucket query index only support one bucket field") +Option.apply(null) + } +} + } + + def getCandidateFiles(allFiles: Seq[FileStatus], bucketIds: BitSet): Set[String] = { +val candidateFiles: mutable.Set[String] = mutable.Set.empty +for (file <- allFiles) { + val fileId = FSUtils.getFileIdFromFilePath(file.getPath) + val fileBucketId = 
BucketIdentifier.bucketIdFromFileId(fileId) + if (bucketIds.get(fileBucketId)) { +candidateFiles += file.getPath.getName + } +} +candidateFiles.toSet + } + + def filterQueriesWithBucketHashField(queryFilters: Seq[Expression]): Option[BitSet] = { +val bucketNumber = metadataConfig.getInt(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS) +val bucketHashFieldOpt = getBucketHashField +if (bucketHashFieldOpt.isEmpty || queryFilters.isEmpty) { + None +} else { + val matchedBuckets = getExpressionBuckets(queryFilters.reduce(And), bucketHashFieldOpt.get, bucketNumber) + + val numBucketsSelected = matchedBuckets.cardinality() + + // None means all the buckets need to be scanned + if (numBucketsSelected == bucketNumber) { +log.info("bucket query match all file slice, fallback other index") +None + } else { +Some(matchedBuckets) + } +} + } + + private def getExpressionBuckets(expr: Expression, bucketColumnName: String, numBuckets: Int): BitSet = { + +def getBucketNumber(attr: Attribute, v: Any): Int = { + BucketIdentifier.getBucketId(JavaConverters.seqAsJavaListConverter(List.apply(String.valueOf(v))).asJava, numBuckets) Review Comment: In order to support index key pruning, only conjunctive predicates that combine equality conditions on the hash keys are supported, see: https://github.com/apache/hudi/blob/d1c4ead8a80bc44731f1b615ba9166041c144948/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java#L315
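The rule in the review comment above (prune only when the ANDed predicates pin every hash key with an equality) can be sketched roughly as follows. This is an illustration under assumptions, not the Hudi or Flink implementation: `BucketPruningSketch`, `bucketId`, and `pruneToBucket` are hypothetical names, the predicates are simplified to a column-to-literal map, and the hash is a placeholder that is not claimed to match `BucketIdentifier`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper illustrating bucket pruning for a conjunction of equality predicates.
final class BucketPruningSketch {

  // Placeholder hash over the hash-key values; an assumption for illustration only.
  static int bucketId(List<String> hashKeyValues, int numBuckets) {
    return (hashKeyValues.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  // equalityPredicates maps column name to literal; all entries are implicitly ANDed.
  // Returns a bucket only when every hash field is pinned by an equality predicate.
  static OptionalInt pruneToBucket(List<String> hashFields,
                                   Map<String, String> equalityPredicates,
                                   int numBuckets) {
    if (hashFields.isEmpty()) {
      return OptionalInt.empty();
    }
    List<String> values = new ArrayList<>();
    for (String field : hashFields) {
      String literal = equalityPredicates.get(field);
      if (literal == null) {
        // A hash key is unconstrained, so any bucket may hold matching rows: no pruning.
        return OptionalInt.empty();
      }
      values.add(literal);
    }
    return OptionalInt.of(bucketId(values, numBuckets));
  }
}
```

Predicates on non-hash columns are simply ignored by this step; they neither help nor prevent bucket pruning.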
Re: [PR] [HUDI-7077] Fix OOM error for a test [hudi]
hudi-bot commented on PR #10216: URL: https://github.com/apache/hudi/pull/10216#issuecomment-1833023982 ## CI report: * 9206bb059d0f22ee3e1110b3d269ce2f777c358f UNKNOWN
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1833023807 ## CI report: * 3e63db8a1620a25197071d21714d06144f1fbb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21237) * 6de03812710b802b83d8b2efb8c31849d4be0202 UNKNOWN
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1833023754 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * 31fe075c72fb189b9155e48ab3399e9199cc293a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21231) * 993d0dc63f3f552b8bf5b52c113f3ae8ef53304c UNKNOWN
Re: [PR] [HUDI-7059] Hudi filter pushdown for positional merging [hudi]
yihua commented on code in PR #10167: URL: https://github.com/apache/hudi/pull/10167#discussion_r1410093495 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@ -300,12 +303,13 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState, val baseFileReader = super.buildReaderWithPartitionValues(sparkSession, dataSchema, partitionSchema, requiredSchema, filters ++ requiredFilters, options, new Configuration(hadoopConf)) -//file reader for reading a hudi base file that needs to be merged with log files +// File reader for reading a Hoodie base file that needs to be merged with log files val preMergeBaseFileReader = if (isMOR) { // Add support for reading files using inline file system. - super.buildReaderWithPartitionValues(sparkSession, dataSchema, partitionSchema, requiredSchemaWithMandatory, -if (shouldUseRecordPosition) requiredFilters else recordKeyRelatedFilters ++ requiredFilters, -options, new Configuration(hadoopConf)) + val appliedRequiredSchema = sparkAdapter.appendRowIndexColumnForParquetFileReader(requiredSchemaWithMandatory, shouldUseRecordPosition) Review Comment: Could you add a test to validate the logic?
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
danny0405 commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410092117 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi + +import org.apache.hadoop.fs.FileStatus +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.table.HoodieTableConfig +import org.apache.hudi.config.HoodieIndexConfig +import org.apache.hudi.index.HoodieIndex +import org.apache.hudi.index.HoodieIndex.IndexType +import org.apache.hudi.index.bucket.BucketIdentifier +import org.apache.log4j.LogManager +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, Expression, Literal} +import org.apache.spark.sql.types.{DoubleType, FloatType} +import org.apache.spark.util.collection.BitSet + +import scala.collection.{JavaConverters, mutable} + +class BucketIndexSupport(metadataConfig: HoodieMetadataConfig) { + + private val log = LogManager.getLogger(getClass); + + /** + * Returns the configured bucket field for the table + */ + private def getBucketHashField: Option[String] = { +val bucketHashFields = metadataConfig.getString(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD) +if (bucketHashFields == null) { + val recordKeys = metadataConfig.getString(HoodieTableConfig.RECORDKEY_FIELDS) + if (recordKeys == null) { +Option.apply(null) + } else { +val recordKeyArray = recordKeys.split(",") +if (recordKeyArray.length == 1) { + Option.apply(recordKeyArray(0)) +} else { + log.warn("bucket query index only support one bucket field") + Option.apply(null) +} + } +} else { + val fields = bucketHashFields.split(",") + if (fields.length == 1) { +Option.apply(fields(0)) + } else { +log.warn("bucket query index only support one bucket field") +Option.apply(null) + } +} + } + + def getCandidateFiles(allFiles: Seq[FileStatus], bucketIds: BitSet): Set[String] = { +val candidateFiles: mutable.Set[String] = mutable.Set.empty +for (file <- allFiles) { + val fileId = FSUtils.getFileIdFromFilePath(file.getPath) + val fileBucketId = 
BucketIdentifier.bucketIdFromFileId(fileId) + if (bucketIds.get(fileBucketId)) { +candidateFiles += file.getPath.getName + } +} +candidateFiles.toSet + } + + def filterQueriesWithBucketHashField(queryFilters: Seq[Expression]): Option[BitSet] = { +val bucketNumber = metadataConfig.getInt(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS) +val bucketHashFieldOpt = getBucketHashField +if (bucketHashFieldOpt.isEmpty || queryFilters.isEmpty) { + None +} else { + val matchedBuckets = getExpressionBuckets(queryFilters.reduce(And), bucketHashFieldOpt.get, bucketNumber) + + val numBucketsSelected = matchedBuckets.cardinality() + + // None means all the buckets need to be scanned + if (numBucketsSelected == bucketNumber) { +log.info("bucket query match all file slice, fallback other index") +None + } else { +Some(matchedBuckets) + } +} + } + + private def getExpressionBuckets(expr: Expression, bucketColumnName: String, numBuckets: Int): BitSet = { + +def getBucketNumber(attr: Attribute, v: Any): Int = { + BucketIdentifier.getBucketId(JavaConverters.seqAsJavaListConverter(List.apply(String.valueOf(v))).asJava, numBuckets) Review Comment: Maybe you can just refer to the Flink implementation: https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/prune/PrimaryKeyPruners.java
Re: [PR] [HUDI-7161] Add commit action type and extra metadata to write callback on commit message [hudi]
hudi-bot commented on PR #10213: URL: https://github.com/apache/hudi/pull/10213#issuecomment-1833018200 ## CI report: * 3ac05bdf864a129a74110e1ddacf1f0c8a85 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21234)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1833017794 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * 31fe075c72fb189b9155e48ab3399e9199cc293a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21231)
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
danny0405 commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410090727 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi + +import org.apache.hadoop.fs.FileStatus +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.table.HoodieTableConfig +import org.apache.hudi.config.HoodieIndexConfig +import org.apache.hudi.index.HoodieIndex +import org.apache.hudi.index.HoodieIndex.IndexType +import org.apache.hudi.index.bucket.BucketIdentifier +import org.apache.log4j.LogManager +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, Expression, Literal} +import org.apache.spark.sql.types.{DoubleType, FloatType} +import org.apache.spark.util.collection.BitSet + +import scala.collection.{JavaConverters, mutable} + +class BucketIndexSupport(metadataConfig: HoodieMetadataConfig) { + + private val log = LogManager.getLogger(getClass); + + /** + * Returns the configured bucket field for the table + */ + private def getBucketHashField: Option[String] = { +val bucketHashFields = metadataConfig.getString(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD) +if (bucketHashFields == null) { + val recordKeys = metadataConfig.getString(HoodieTableConfig.RECORDKEY_FIELDS) + if (recordKeys == null) { +Option.apply(null) + } else { +val recordKeyArray = recordKeys.split(",") +if (recordKeyArray.length == 1) { + Option.apply(recordKeyArray(0)) +} else { + log.warn("bucket query index only support one bucket field") + Option.apply(null) +} + } +} else { + val fields = bucketHashFields.split(",") + if (fields.length == 1) { +Option.apply(fields(0)) + } else { +log.warn("bucket query index only support one bucket field") +Option.apply(null) + } +} + } + + def getCandidateFiles(allFiles: Seq[FileStatus], bucketIds: BitSet): Set[String] = { +val candidateFiles: mutable.Set[String] = mutable.Set.empty +for (file <- allFiles) { + val fileId = FSUtils.getFileIdFromFilePath(file.getPath) + val fileBucketId = 
BucketIdentifier.bucketIdFromFileId(fileId) + if (bucketIds.get(fileBucketId)) { +candidateFiles += file.getPath.getName + } +} +candidateFiles.toSet + } + + def filterQueriesWithBucketHashField(queryFilters: Seq[Expression]): Option[BitSet] = { +val bucketNumber = metadataConfig.getInt(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS) +val bucketHashFieldOpt = getBucketHashField +if (bucketHashFieldOpt.isEmpty || queryFilters.isEmpty) { + None +} else { + val matchedBuckets = getExpressionBuckets(queryFilters.reduce(And), bucketHashFieldOpt.get, bucketNumber) + + val numBucketsSelected = matchedBuckets.cardinality() + + // None means all the buckets need to be scanned + if (numBucketsSelected == bucketNumber) { +log.info("bucket query match all file slice, fallback other index") Review Comment: "File slice" is an internal notion; maybe you should say `The query predicates do not specify equality for all the hashing fields, ...`
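For readers following the review, the pruning done by `getCandidateFiles` and the "all buckets matched" fallback in `filterQueriesWithBucketHashField` can be sketched in plain Java. This is a minimal illustration, not the code under review: the class `BucketPruningSketch`, its method names, and the assumption that a file ID begins with an eight-digit zero-padded bucket number are hypothetical stand-ins for `BucketIdentifier.bucketIdFromFileId`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: keep only the files whose bucket id is set in the BitSet.
final class BucketPruningSketch {

  // Assumption for illustration only: the first 8 characters of a file id
  // are the zero-padded bucket number, e.g. "00000003-abcd" -> 3.
  static int bucketIdFromFileId(String fileId) {
    return Integer.parseInt(fileId.substring(0, 8));
  }

  // Returns the file ids whose bucket was selected by the query predicates.
  static List<String> candidateFiles(List<String> fileIds, BitSet selectedBuckets) {
    List<String> candidates = new ArrayList<>();
    for (String fileId : fileIds) {
      if (selectedBuckets.get(bucketIdFromFileId(fileId))) {
        candidates.add(fileId);
      }
    }
    return candidates;
  }

  // Mirrors the fallback in the diff: when every bucket matched, the bucket
  // index prunes nothing, so another index should be tried instead.
  static boolean prunesNothing(BitSet selectedBuckets, int numBuckets) {
    return selectedBuckets.cardinality() == numBuckets;
  }
}
```

With four buckets and only bucket 3 selected, `candidateFiles` keeps the files whose IDs start with `00000003` and `prunesNothing` returns false.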
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
danny0405 commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1410089268 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala: ## @@ -340,9 +347,18 @@ case class HoodieFileIndex(spark: SparkSession, // and candidate files are obtained from these file slices. lazy val queryReferencedColumns = collectReferencedColumns(spark, queryFilters, schema) - +// bucket query index +var bucketIds = Option.empty[BitSet] +if (bucketIndex.isIndexAvailable && isDataSkippingEnabled) { + bucketIds = bucketIndex.filterQueriesWithBucketHashField(queryFilters) +} +// record index lazy val (_, recordKeys) = recordLevelIndex.filterQueriesWithRecordKey(queryFilters) -if (!isMetadataTableEnabled || !isDataSkippingEnabled) { + +// index chose +if (bucketIndex.isIndexAvailable && bucketIds.isDefined && bucketIds.get.cardinality() > 0) { + Option.apply(bucketIndex.getCandidateFiles(allBaseFiles, bucketIds.get)) Review Comment: We are just doing two levels of pruning/skipping here: 1. file group skipping with the bucket index (so that the overall candidates are pruned before the next step); 2. file skipping within a file group. These two steps should be orthogonal and we could have both. Maybe RLI does not make sense when the hash keys equal the primary keys, but when the hash keys are a subset of the record keys, we can still have the gains. And if there are other predicates, such as max/min from the column stats, we can then skip even more specific files.
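The two orthogonal pruning levels described in this review comment can be sketched as two independent filters applied one after the other. This is only an illustration under stated assumptions: `TwoLevelPruningSketch`, `FileInfo`, and the single-column min/max range check are hypothetical and do not correspond to Hudi classes or to the column stats index API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of two orthogonal pruning levels; none of these names
// come from Hudi.
final class TwoLevelPruningSketch {

  // Illustrative per-file metadata: owning bucket plus min/max of one column.
  static final class FileInfo {
    final String name;
    final int bucketId;
    final long minValue;
    final long maxValue;

    FileInfo(String name, int bucketId, long minValue, long maxValue) {
      this.name = name;
      this.bucketId = bucketId;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  // Level 1: file group skipping, keeping only files in the selected buckets.
  static List<FileInfo> pruneByBucket(List<FileInfo> files, BitSet selectedBuckets) {
    List<FileInfo> kept = new ArrayList<>();
    for (FileInfo f : files) {
      if (selectedBuckets.get(f.bucketId)) {
        kept.add(f);
      }
    }
    return kept;
  }

  // Level 2: file skipping, keeping only files whose [min, max] range can
  // contain the value of an equality predicate.
  static List<FileInfo> pruneByRange(List<FileInfo> files, long value) {
    List<FileInfo> kept = new ArrayList<>();
    for (FileInfo f : files) {
      if (f.minValue <= value && value <= f.maxValue) {
        kept.add(f);
      }
    }
    return kept;
  }

  // Applies both levels; the second only sees what the first left over.
  static List<String> prune(List<FileInfo> files, BitSet selectedBuckets, long value) {
    List<String> names = new ArrayList<>();
    for (FileInfo f : pruneByRange(pruneByBucket(files, selectedBuckets), value)) {
      names.add(f.name);
    }
    return names;
  }
}
```

Because neither filter depends on the other's internals, either one can be dropped or replaced without changing the other, which is the sense in which the two steps are orthogonal.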
[jira] [Updated] (HUDI-7077) Re-enable tests in TestSparkDataSource
[ https://issues.apache.org/jira/browse/HUDI-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7077: - Labels: pull-request-available (was: ) > Re-enable tests in TestSparkDataSource > -- > > Key: HUDI-7077 > URL: https://issues.apache.org/jira/browse/HUDI-7077 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > > In CI, TestSparkDataSource causes the job to fail due to memory issue but > locally the tests run fine. The tests are disabled in TestSparkDataSource > temporarily. We need to re-enable them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7077] Fix OOM error for a test [hudi]
linliu-code opened a new pull request, #10216: URL: https://github.com/apache/hudi/pull/10216 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
(hudi) branch master updated (5b5d7465c7b -> d1c4ead8a80)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 5b5d7465c7b [Minor] Remove useless ')' for ConfigProperty.toString (#10208) add d1c4ead8a80 [HUDI-7128][FOLLOW-UP] support metadatadelete with batch mode (#10210) No new revisions were added by this update. Summary of changes: .../procedures/DeleteMetadataTableProcedure.scala | 22 +--- .../sql/hudi/procedure/TestMetadataProcedure.scala | 58 ++ 2 files changed, 72 insertions(+), 8 deletions(-)
Re: [PR] [HUDI-7128][FOLLOW-UP] Support metadatadelete with batch mode [hudi]
danny0405 merged PR #10210: URL: https://github.com/apache/hudi/pull/10210
Re: [PR] Remove useless ) for ConfigProperty.toString [hudi]
danny0405 merged PR #10208: URL: https://github.com/apache/hudi/pull/10208
Re: [PR] Remove useless ) for ConfigProperty.toString [hudi]
danny0405 commented on PR #10208: URL: https://github.com/apache/hudi/pull/10208#issuecomment-1833002690 Test passed: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=21216&view=results
(hudi) branch master updated: [Minor] Remove useless ')' for ConfigProperty.toString (#10208)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 5b5d7465c7b [Minor] Remove useless ')' for ConfigProperty.toString (#10208)
5b5d7465c7b is described below

commit 5b5d7465c7b8f873fb6aabedd8846221b2709fa1
Author: hehuiyuan <471627...@qq.com>
AuthorDate: Thu Nov 30 10:24:17 2023 +0800

    [Minor] Remove useless ')' for ConfigProperty.toString (#10208)
---
 .../src/main/java/org/apache/hudi/common/config/ConfigProperty.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java b/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java
index d4ed193a041..aa2cf642309 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java
@@ -233,7 +233,7 @@ public class ConfigProperty implements Serializable {
   @Override
   public String toString() {
     return String.format(
-        "Key: '%s' , default: %s , isAdvanced: %s , description: %s since version: %s deprecated after: %s)",
+        "Key: '%s' , default: %s , isAdvanced: %s , description: %s since version: %s deprecated after: %s",
         key, defaultValue, advanced, doc, sinceVersion.isPresent() ? sinceVersion.get() : "version is not defined",
         deprecatedVersion.isPresent() ? deprecatedVersion.get() : "version is not defined");
   }
Re: [PR] [HUDI-7128][FOLLOW-UP] Support metadatadelete with batch mode [hudi]
xuzifu666 commented on PR #10210: URL: https://github.com/apache/hudi/pull/10210#issuecomment-1832983093 cc @danny0405 PTAL
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1832972393 ## CI report: * 3e63db8a1620a25197071d21714d06144f1fbb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21237)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832972346 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * 747894399d3bef0e05c561d9c67db61ab2536cf9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21228) * 31fe075c72fb189b9155e48ab3399e9199cc293a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21231)
Re: [PR] [MINOR] Allow concurrent modification for heartbeat map [hudi]
hudi-bot commented on PR #10215: URL: https://github.com/apache/hudi/pull/10215#issuecomment-1832935708 ## CI report: * bd5d820f323c66fbcf7492c61d23585a581e76cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21238)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832935577 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * 66d442d8f652fbd5251dabee5f2c141dbae19821 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21236)
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1832935446 ## CI report: * 847fee8e1ce7b0e2d9af6dadbc802f4d67f06ee7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21171) * 3e63db8a1620a25197071d21714d06144f1fbb04 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21237)
Re: [PR] [MINOR] Allow concurrent modification for heartbeat map [hudi]
hudi-bot commented on PR #10215: URL: https://github.com/apache/hudi/pull/10215#issuecomment-1832930088 ## CI report: * bd5d820f323c66fbcf7492c61d23585a581e76cc UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832929954 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * f7613c8544b014519fe0142a3a42b72fbfc698a3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21235) * 66d442d8f652fbd5251dabee5f2c141dbae19821 UNKNOWN
Re: [PR] [HUDI-7125] Fix bugs for CDC queries [hudi]
hudi-bot commented on PR #10144: URL: https://github.com/apache/hudi/pull/10144#issuecomment-1832929841 ## CI report: * 847fee8e1ce7b0e2d9af6dadbc802f4d67f06ee7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21171) * 3e63db8a1620a25197071d21714d06144f1fbb04 UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832923838 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * d054e55f468fcf6ad312f6d4c4100e69f7554715 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21232) * f7613c8544b014519fe0142a3a42b72fbfc698a3 UNKNOWN
Re: [PR] [HUDI-7059] Hudi filter pushdown for positional merging [hudi]
linliu-code commented on PR #10167: URL: https://github.com/apache/hudi/pull/10167#issuecomment-1832916848 @codope
[PR] [MINOR] Allow concurrent modification for heartbeat map [hudi]
linliu-code opened a new pull request, #10215: URL: https://github.com/apache/hudi/pull/10215 ### Change Logs Previously, we saw a ConcurrentModificationException on the heartbeat map. ### Impact 1. Makes the test less flaky. 2. Makes production more robust. ### Risk level (write none, low medium or high below) Low. ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
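The PR description does not say how the heartbeat map was changed, so the following is only a sketch of one common remedy for a `ConcurrentModificationException`: replacing a `HashMap` with a `ConcurrentHashMap`, whose iterators are weakly consistent and tolerate modification during iteration. The class `HeartbeatMapSketch`, its methods, and the instant-time keys are hypothetical and are not taken from the Hudi code base.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: contrasts HashMap and ConcurrentHashMap when entries
// are removed while the key set is being iterated.
final class HeartbeatMapSketch {

  // Fills the given map with three illustrative instant -> timestamp entries.
  static Map<String, Long> withSampleHeartbeats(Map<String, Long> target) {
    target.put("001", 1L);
    target.put("002", 2L);
    target.put("003", 3L);
    return target;
  }

  // Removes every visited entry during iteration. Returns true if the loop
  // completed, false if a ConcurrentModificationException interrupted it.
  static boolean removeWhileIterating(Map<String, Long> heartbeats) {
    try {
      for (String instant : heartbeats.keySet()) {
        heartbeats.remove(instant);
      }
      return true;
    } catch (ConcurrentModificationException e) {
      return false;
    }
  }

  static boolean plainHashMapSurvives() {
    return removeWhileIterating(withSampleHeartbeats(new HashMap<>()));
  }

  static boolean concurrentHashMapSurvives() {
    return removeWhileIterating(withSampleHeartbeats(new ConcurrentHashMap<>()));
  }
}
```

The same failure can occur with a `HashMap` when one thread iterates while another thread adds or removes entries, which is a plausible source of flakiness in tests; the single-threaded loop above only reproduces it deterministically.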
Re: [PR] [HUDI-6379] [DO NOT MERGE] Pulsar version change to fix snakeyaml CVE [hudi]
CTTY commented on PR #8973: URL: https://github.com/apache/hudi/pull/8973#issuecomment-1832888434 I assume this is no longer needed since we have #9670
[jira] [Created] (HUDI-7162) RDD's Don't cache in some situations with new filegroup reader + new parquet file format
Jonathan Vexler created HUDI-7162:
-------------------------------------
Summary: RDD's Don't cache in some situations with new filegroup reader + new parquet file format
Key: HUDI-7162
URL: https://issues.apache.org/jira/browse/HUDI-7162
Project: Apache Hudi
Issue Type: Bug
Components: spark, spark-sql
Reporter: Jonathan Vexler

"Test Call rollback_to_instant Procedure with refreshTable" fails if a projection is added to the query plan. The test does not currently fail, because we don't add the projection for non-partitioned tables. Adding the projection prevents the RDD from being cached.

Query plans:

Without projection, caching works:
{code:java}
== Parsed Logical Plan ==
'Project ['id]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#547,_hoodie_commit_seqno#548,_hoodie_record_key#549,_hoodie_partition_path#550,_hoodie_file_name#551,id#552,name#553,price#554,ts#555L] parquet

== Analyzed Logical Plan ==
id: int
Project [id#552]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#547,_hoodie_commit_seqno#548,_hoodie_record_key#549,_hoodie_partition_path#550,_hoodie_file_name#551,id#552,name#553,price#554,ts#555L] parquet

== Optimized Logical Plan ==
InMemoryRelation [id#552], StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *(1) ColumnarToRow
      +- FileScan parquet default.h0[id#552] Batched: true, DataFilters: [], Format: Parquet, Location: HoodieFileIndex(1 paths)[file:/private/var/folders/d0/l7mfhzl1661byhh3mbyg5fv0gn/T/spark-87b3..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct

== Physical Plan ==
InMemoryTableScan [id#552]
   +- InMemoryRelation [id#552], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) ColumnarToRow
            +- FileScan parquet default.h0[id#552] Batched: true, DataFilters: [], Format: Parquet, Location: HoodieFileIndex(1 paths)[file:/private/var/folders/d0/l7mfhzl1661byhh3mbyg5fv0gn/T/spark-87b3..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}

With projection, no caching:
{code:java}
== Parsed Logical Plan ==
'Project ['id]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#539,_hoodie_commit_seqno#540,_hoodie_record_key#541,_hoodie_partition_path#542,_hoodie_file_name#543,id#544,name#545,price#546,ts#547L] parquet

== Analyzed Logical Plan ==
id: int
Project [id#544]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#539,_hoodie_commit_seqno#540,_hoodie_record_key#541,_hoodie_partition_path#542,_hoodie_file_name#543,id#544,name#545,price#546,ts#547L] parquet

== Optimized Logical Plan ==
Project [id#544]
+- Relation default.h0[_hoodie_commit_time#539,_hoodie_commit_seqno#540,_hoodie_record_key#541,_hoodie_partition_path#542,_hoodie_file_name#543,id#544,name#545,price#546,ts#547L] parquet

== Physical Plan ==
*(1) ColumnarToRow
+- FileScan parquet default.h0[id#544] Batched: true, DataFilters: [], Format: Parquet, Location: HoodieFileIndex(1 paths)[file:/private/var/folders/d0/l7mfhzl1661byhh3mbyg5fv0gn/T/spark-8c60..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}
Re: [PR] [HUDI-7161] Add commit action type and extra metadata to write callback on commit message [hudi]
hudi-bot commented on PR #10213: URL: https://github.com/apache/hudi/pull/10213#issuecomment-1832885296 ## CI report: * 3ac05bdf864a129a74110e1ddacf1f0c8a85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21234)
Re: [PR] [HUDI-6953] Adding test for composite keys with bulk insert row writer [hudi]
hudi-bot commented on PR #10214: URL: https://github.com/apache/hudi/pull/10214#issuecomment-1832885343 ## CI report: * 0ee77f22a2f213a1c581e443a52eb6965832abc4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21233)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832885165 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * d054e55f468fcf6ad312f6d4c4100e69f7554715 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21232)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832884937 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * dfa3bdee07f850efbacb55ecc84637339a953423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21225) * 747894399d3bef0e05c561d9c67db61ab2536cf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21228) * 31fe075c72fb189b9155e48ab3399e9199cc293a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21231)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832880118 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * dfa3bdee07f850efbacb55ecc84637339a953423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21225) * 747894399d3bef0e05c561d9c67db61ab2536cf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21228) * 31fe075c72fb189b9155e48ab3399e9199cc293a UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832880229 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * 9a3b347de974d626fdc52a5aafb06c5d2ec45cbd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21229) * d054e55f468fcf6ad312f6d4c4100e69f7554715 UNKNOWN
Re: [PR] [HUDI-6953] Adding test for composite keys with bulk insert row writer [hudi]
hudi-bot commented on PR #10214: URL: https://github.com/apache/hudi/pull/10214#issuecomment-1832880348 ## CI report: * 0ee77f22a2f213a1c581e443a52eb6965832abc4 UNKNOWN
Re: [PR] [HUDI-7161] Add commit action type and extra metadata to write callback on commit message [hudi]
hudi-bot commented on PR #10213: URL: https://github.com/apache/hudi/pull/10213#issuecomment-1832880320 ## CI report: * 3ac05bdf864a129a74110e1ddacf1f0c8a85 UNKNOWN
Re: [PR] [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields [hudi]
hudi-bot commented on PR #10212: URL: https://github.com/apache/hudi/pull/10212#issuecomment-1832874118 ## CI report: * cfdccba5615427da35d9cba25a3867345f46265d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21226)
[jira] [Commented] (HUDI-6933) bulk_insert Fails if one of the composite key contains null
[ https://issues.apache.org/jira/browse/HUDI-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791365#comment-17791365 ] sivabalan narayanan commented on HUDI-6933: --- https://github.com/apache/hudi/pull/10214 > bulk_insert Fails if one of the composite key contains null > --- > > Key: HUDI-6933 > URL: https://issues.apache.org/jira/browse/HUDI-6933 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Aditya Goenka >Assignee: sivabalan narayanan >Priority: Critical > Fix For: 0.14.1 > > > Github Issue- [https://github.com/apache/hudi/issues/9799]
[PR] [HUDI-6953] Adding test for composite keys with bulk insert row writer [hudi]
nsivabalan opened a new pull request, #10214: URL: https://github.com/apache/hudi/pull/10214 ### Change Logs Adding test for composite keys with bulk insert row writer ### Impact Improve test coverage ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Assigned] (HUDI-6933) bulk_insert Fails if one of the composite key contains null
[ https://issues.apache.org/jira/browse/HUDI-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-6933: - Assignee: sivabalan narayanan > bulk_insert Fails if one of the composite key contains null > --- > > Key: HUDI-6933 > URL: https://issues.apache.org/jira/browse/HUDI-6933 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Aditya Goenka >Assignee: sivabalan narayanan >Priority: Critical > Fix For: 0.14.1 > > > Github Issue- [https://github.com/apache/hudi/issues/9799]
[jira] [Updated] (HUDI-7161) Add commit action type and extra metadata to write callback on commit message
[ https://issues.apache.org/jira/browse/HUDI-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7161: - Labels: pull-request-available (was: ) > Add commit action type and extra metadata to write callback on commit message > -- > > Key: HUDI-7161 > URL: https://issues.apache.org/jira/browse/HUDI-7161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Rajesh Mahindra >Assignee: Rajesh Mahindra >Priority: Major > Labels: pull-request-available > > Add commit action type and extra metadata to write callback on commit message
[PR] [HUDI-7161] Add commit action type and extra metadata to write callback on commit message [hudi]
rmahindra123 opened a new pull request, #10213:
URL: https://github.com/apache/hudi/pull/10213

### Change Logs

Add commit action type and extra metadata to write callback on commit message

### Impact

No impact on the commit callback API

### Risk level (write none, low medium or high below)

low to medium

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Created] (HUDI-7161) Add commit action type and extra metadata to write callback on commit message
Rajesh Mahindra created HUDI-7161: - Summary: Add commit action type and extra metadata to write callback on commit message Key: HUDI-7161 URL: https://issues.apache.org/jira/browse/HUDI-7161 Project: Apache Hudi Issue Type: Improvement Reporter: Rajesh Mahindra Add commit action type and extra metadata to write callback on commit message
[jira] [Assigned] (HUDI-7161) Add commit action type and extra metadata to write callback on commit message
[ https://issues.apache.org/jira/browse/HUDI-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Mahindra reassigned HUDI-7161: - Assignee: Rajesh Mahindra > Add commit action type and extra metadata to write callback on commit message > -- > > Key: HUDI-7161 > URL: https://issues.apache.org/jira/browse/HUDI-7161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Rajesh Mahindra >Assignee: Rajesh Mahindra >Priority: Major > > Add commit action type and extra metadata to write callback on commit message
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832836648 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * 9a3b347de974d626fdc52a5aafb06c5d2ec45cbd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21229)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832836486 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * dfa3bdee07f850efbacb55ecc84637339a953423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21225) * 747894399d3bef0e05c561d9c67db61ab2536cf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21228)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10185: URL: https://github.com/apache/hudi/pull/10185#issuecomment-1832827368 ## CI report: * 72201eb9e3ee19dc3e2cd815bc035af8f435b98f UNKNOWN * dd88a687f7a95799cc4da6e71809c679cdf91673 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21200) * 9a3b347de974d626fdc52a5aafb06c5d2ec45cbd UNKNOWN
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832827213 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * dfa3bdee07f850efbacb55ecc84637339a953423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21225) * 747894399d3bef0e05c561d9c67db61ab2536cf9 UNKNOWN
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
linliu-code commented on code in PR #10137:
URL: https://github.com/apache/hudi/pull/10137#discussion_r1409955152

## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ##
@@ -77,14 +78,18 @@ class SparkFileFormatInternalRowReaderContext(baseFileReader: Option[Partitioned
         }
       }).asInstanceOf[ClosableIterator[InternalRow]]
     } else {
-      if (baseFileReader.isEmpty) {
-        throw new IllegalArgumentException("Base file reader is missing when instantiating "
-          + "SparkFileFormatInternalRowReaderContext.");
+      val key = generateKey(dataSchema, requiredSchema)
+      if (!readerMaps.contains(key)) {
+        throw new IllegalStateException("schemas don't hash to a known reader")
       }
-      new CloseableInternalRowIterator(baseFileReader.get.apply(fileInfo))
+      new CloseableInternalRowIterator(readerMaps(key).apply(fileInfo))
     }
   }

+  private def generateKey(dataSchema: Schema, requestedSchema: Schema): Long = {

Review Comment:
Hi Jon, I really feel that if you can split this PR into smaller PRs, it would be much easier for reviewers to understand and easier for the CI to pass.
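Editorial illustration of the hunk quoted above, not code from the PR: the change replaces a single optional base file reader with a lookup into a map of readers keyed by a value derived from the data schema and the required schema, and fails with `IllegalStateException` when no reader is registered for that pair. The sketch below shows that lookup pattern in Java. Everything in it is an assumption made for illustration: the class `ReaderRegistry`, the use of plain strings in place of schemas, the `Function<String, String>` reader type, and in particular the body of `generateKey`, which is not visible in the quoted diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the readerMaps lookup in the quoted diff.
// Schemas are modelled as plain strings and a "reader" as a function
// from a file name to a description of what was read.
class ReaderRegistry {
    private final Map<Long, Function<String, String>> readers = new HashMap<>();

    // Assumed key scheme (the PR's real generateKey body is not shown):
    // pack the two 32-bit hash codes into one 64-bit key.
    public static long generateKey(String dataSchema, String requiredSchema) {
        long high = (long) dataSchema.hashCode() << 32;
        long low = requiredSchema.hashCode() & 0xFFFFFFFFL;
        return high | low;
    }

    // Register a reader for one (data schema, required schema) pair.
    public void register(String dataSchema, String requiredSchema,
                         Function<String, String> reader) {
        readers.put(generateKey(dataSchema, requiredSchema), reader);
    }

    // Look up the reader for a schema pair; fail loudly if none was registered,
    // mirroring the IllegalStateException in the diff.
    public Function<String, String> lookup(String dataSchema, String requiredSchema) {
        Function<String, String> reader =
            readers.get(generateKey(dataSchema, requiredSchema));
        if (reader == null) {
            throw new IllegalStateException("schemas don't hash to a known reader");
        }
        return reader;
    }
}
```

One caveat with any hash-derived key of this kind is that two different schema pairs could in principle collide, so a production version would likely want a collision-safe key; whether the PR addresses this is not visible from the quoted hunk.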
Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]
hudi-bot commented on PR #10063: URL: https://github.com/apache/hudi/pull/10063#issuecomment-1832819218 ## CI report: * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN * e1999c6bb70849aa29723415791abac9879eff12 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21227)
[jira] [Closed] (HUDI-7139) Fix operation type for bulk insert with row writer in Hudi Streamer
[ https://issues.apache.org/jira/browse/HUDI-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-7139. - Resolution: Fixed > Fix operation type for bulk insert with row writer in Hudi Streamer > --- > > Key: HUDI-7139 > URL: https://issues.apache.org/jira/browse/HUDI-7139 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.1 > > > {code:java} > "operationType" : null {code} > The operationType is null in the commit metadata of bulk insert operation > with row writer enabled in Hudi Streamer > (hoodie.streamer.write.row.writer.enable=true).
[jira] [Commented] (HUDI-7139) Fix operation type for bulk insert with row writer in Hudi Streamer
[ https://issues.apache.org/jira/browse/HUDI-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791352#comment-17791352 ] sivabalan narayanan commented on HUDI-7139: --- fixed in master [https://github.com/apache/hudi/commit/4f875edaecd495eaa8996fa8d81c102a971c599f] > Fix operation type for bulk insert with row writer in Hudi Streamer > --- > > Key: HUDI-7139 > URL: https://issues.apache.org/jira/browse/HUDI-7139 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.1 > > > {code:java} > "operationType" : null {code} > The operationType is null in the commit metadata of bulk insert operation > with row writer enabled in Hudi Streamer > (hoodie.streamer.write.row.writer.enable=true).
[jira] [Closed] (HUDI-7155) Add log to print wrong number of instant metadata files
[ https://issues.apache.org/jira/browse/HUDI-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-7155. - Resolution: Fixed > Add log to print wrong number of instant metadata files > --- > > Key: HUDI-7155 > URL: https://issues.apache.org/jira/browse/HUDI-7155 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving >Reporter: zhuanshenbsj1 >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.1
[jira] [Commented] (HUDI-7155) Add log to print wrong number of instant metadata files
[ https://issues.apache.org/jira/browse/HUDI-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791350#comment-17791350 ] sivabalan narayanan commented on HUDI-7155: --- fixed in master [https://github.com/apache/hudi/commit/817d81ad14f930c4744ff229640003fe7715b20c] > Add log to print wrong number of instant metadata files > --- > > Key: HUDI-7155 > URL: https://issues.apache.org/jira/browse/HUDI-7155 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving >Reporter: zhuanshenbsj1 >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.1
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832765012 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * dfa3bdee07f850efbacb55ecc84637339a953423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21225)
Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]
hudi-bot commented on PR #10063: URL: https://github.com/apache/hudi/pull/10063#issuecomment-1832764871 ## CI report: * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN * 34efaac278dde7fd73515e6d54418a6ff8815326 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20939) * e1999c6bb70849aa29723415791abac9879eff12 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21227)
Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]
hudi-bot commented on PR #10063: URL: https://github.com/apache/hudi/pull/10063#issuecomment-1832754560 ## CI report: * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN * 34efaac278dde7fd73515e6d54418a6ff8815326 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20939) * e1999c6bb70849aa29723415791abac9879eff12 UNKNOWN
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176: URL: https://github.com/apache/hudi/pull/10176#issuecomment-183274 ## CI report: * 3c894596a90a326707d4aa052e34cf9f09daae75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21224)
Re: [PR] [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields [hudi]
hudi-bot commented on PR #10212: URL: https://github.com/apache/hudi/pull/10212#issuecomment-1832693020 ## CI report: * c106dd446a9ea4ec82cc00285c6b099c50555bfd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21222) * cfdccba5615427da35d9cba25a3867345f46265d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21226)
Re: [PR] [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields [hudi]
hudi-bot commented on PR #10212: URL: https://github.com/apache/hudi/pull/10212#issuecomment-1832682300 ## CI report: * c106dd446a9ea4ec82cc00285c6b099c50555bfd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21222) * cfdccba5615427da35d9cba25a3867345f46265d UNKNOWN
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832681988 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * bfc0a855cadb4f6329bd38a470ade931797c53ab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21223) * dfa3bdee07f850efbacb55ecc84637339a953423 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21225)
Re: [PR] [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields [hudi]
hudi-bot commented on PR #10212: URL: https://github.com/apache/hudi/pull/10212#issuecomment-1832672521 ## CI report: * c106dd446a9ea4ec82cc00285c6b099c50555bfd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21222)
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832672177 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * bfc0a855cadb4f6329bd38a470ade931797c53ab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21223) * dfa3bdee07f850efbacb55ecc84637339a953423 UNKNOWN
Re: [PR] [HUDI-7137] Implement Bootstrap for new FG reader [hudi]
hudi-bot commented on PR #10137: URL: https://github.com/apache/hudi/pull/10137#issuecomment-1832622066 ## CI report: * 77205b47c45501a0d9de1ebc74d5bb8c960cd95a UNKNOWN * b0b711e0c355320da652fa7f2d8669539873d4d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21196) * bfc0a855cadb4f6329bd38a470ade931797c53ab UNKNOWN
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176: URL: https://github.com/apache/hudi/pull/10176#issuecomment-1832622253 ## CI report: * 7d8ce155ad5b95f8a26150554a6008cec0ef0653 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21221) * 3c894596a90a326707d4aa052e34cf9f09daae75 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21224)
Re: [PR] [HUDI-7160] Copy over schema properties when adding Hudi Metadata fields [hudi]
hudi-bot commented on PR #10212: URL: https://github.com/apache/hudi/pull/10212#issuecomment-1832611516 ## CI report: * c106dd446a9ea4ec82cc00285c6b099c50555bfd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21222)