[jira] [Created] (HUDI-1551) Support Partition with BigDecimal field
Chanh Le created HUDI-1551: -- Summary: Support Partition with BigDecimal field Key: HUDI-1551 URL: https://issues.apache.org/jira/browse/HUDI-1551 Project: Apache Hudi Issue Type: New Feature Components: newbie Reporter: Chanh Le Fix For: 0.7.0 In my data the time indicator field is a BigDecimal, because trading-related data needs to be recorded with more precision than usual. I would like to add support for partitioning on this field type in TimestampBasedKeyGenerator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
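For illustration, a minimal Java sketch of the idea behind the request: a BigDecimal time field carrying fractional epoch seconds reduced to a date-based partition path. All names here are hypothetical; this is not Hudi's actual key-generator API, just a sketch of the mapping TimestampBasedKeyGenerator would need to support.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BigDecimalPartitionSketch {

  // Hypothetical helper: a BigDecimal carrying fractional epoch seconds,
  // as in high-precision trading data, truncated to a daily partition
  // path of the form yyyy/MM/dd (UTC).
  static String toPartitionPath(BigDecimal epochSeconds) {
    long epochMillis = epochSeconds.multiply(BigDecimal.valueOf(1000L)).longValue();
    return DateTimeFormatter.ofPattern("yyyy/MM/dd")
        .withZone(ZoneOffset.UTC)
        .format(Instant.ofEpochMilli(epochMillis));
  }

  public static void main(String[] args) {
    // 1611619200.123456 seconds since the epoch falls on 2021-01-26 UTC.
    System.out.println(toPartitionPath(new BigDecimal("1611619200.123456")));
  }
}
```

The sub-second digits are preserved up to the point of truncation, so records keep their precision while still landing in coarse date partitions.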
[GitHub] [hudi] shenh062326 commented on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client
shenh062326 commented on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-767297371 > @shenh062326 Thanks for your contribution, would you please add some tests to verify the java client functionality? Added TestJavaCopyOnWriteActionExecutor. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys
vinothchandar commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767265499 This is now out in the 0.7.0 release. See this test for examples: https://github.com/apache/hudi/blame/release-0.7.0/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L183
[GitHub] [hudi] codecov-io edited a comment on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table
codecov-io edited a comment on pull request #2487: URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
nsivabalan commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175 @bvaradar : I guess you missed following up on this thread. Can you check it out and respond when you can?
[GitHub] [hudi] nsivabalan commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena
nsivabalan commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-767206596 @vinothchandar @umehrot2 : can either of you respond here w.r.t. metadata support (RFC-15) in Athena? When can we possibly expect it?
[GitHub] [hudi] jingweiz2017 commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer
jingweiz2017 commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767242422 @nsivabalan @bvaradar , thanks for the reply. The commit mentioned by bvaradar should work for my case.
[GitHub] [hudi] wangxianghu commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
wangxianghu commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r563537637

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala

```diff
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL
 
-  def translateStorageTypeToTableType(optParams: Map[String, String]) : Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] = {
+    // translate StorageType to TableType
+    var newOptParams = optParams
     if (optParams.contains(STORAGE_TYPE_OPT_KEY) && !optParams.contains(TABLE_TYPE_OPT_KEY)) {
       log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a later release; Please use " + TABLE_TYPE_OPT_KEY)
-      optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-    } else {
-      optParams
+      newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
     }
+    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD_OPT_KEY
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && !optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+        .map(SparkDataSourceUtils.decodePartitioningColumns)
+        .getOrElse(Nil)
+
+      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+      val partitionPathField =
+        keyGeneratorClass match {
+          case "org.apache.hudi.keygen.CustomKeyGenerator" =>
+            partitionColumns.map(e => s"$e:SIMPLE").mkString(",")
```

Review comment: We cannot simply join the `partitionBy` field and `SIMPLE` together. When the user uses `CustomKeyGenerator` and the partitionpath field is of timestamp type, the string after the `partitionBy` field should be `TIMESTAMP`.

Review comment: > @wangxianghu Thank you for your review. My opinion is this: in keeping with how Spark is normally used, the partition field value corresponding to partitionBy is the original value, so SIMPLE is the default. If we automatically inferred whether to use TIMESTAMP based on the field type, the rules would not be easy to determine. For example, if a field is a long, do we need to convert it to TIMESTAMP? If we convert it but the value is not a timestamp, an error will be reported, so SIMPLE is used by default. Users who want TIMESTAMP can specify it directly via `hoodie.datasource.write.partitionpath.field`.

Yes, I get your point. We'd better support both `SIMPLE` and `TIMESTAMP` type partitionpath in a unified way.
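As a rough illustration of the translation being discussed, here is a minimal Java sketch (the helper name is hypothetical; the actual change is Scala code in DataSourceOptions.scala) of joining Spark's partitionBy columns into CustomKeyGenerator's `field:TYPE` format with the SIMPLE default:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionPathSpecSketch {

  // Hypothetical helper mirroring the proposed Scala translation: join
  // Spark's partitionBy columns into CustomKeyGenerator's "field:TYPE"
  // format, defaulting every field to SIMPLE as the review discusses.
  static String toCustomKeyGenSpec(List<String> partitionColumns) {
    return partitionColumns.stream()
        .map(col -> col + ":SIMPLE")
        .collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    // prints "region:SIMPLE,dt:SIMPLE"
    System.out.println(toCustomKeyGenSpec(Arrays.asList("region", "dt")));
  }
}
```

A user who needs a timestamp-typed partition field would bypass this default and set `hoodie.datasource.write.partitionpath.field` explicitly, e.g. with a `ts:TIMESTAMP` entry.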
[GitHub] [hudi] nsivabalan closed issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables
nsivabalan closed issue #1958: URL: https://github.com/apache/hudi/issues/1958
[GitHub] [hudi] codecov-io edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
codecov-io edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-759677298
[GitHub] [hudi] rubenssoto commented on issue #2484: [SUPPORT] Hudi Write Performance
rubenssoto commented on issue #2484: URL: https://github.com/apache/hudi/issues/2484#issuecomment-767143513
[GitHub] [hudi] nsivabalan commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.
nsivabalan commented on issue #1982: URL: https://github.com/apache/hudi/issues/1982#issuecomment-767205667 @Ac-Rush : would you mind updating the ticket?
[GitHub] [hudi] vinothchandar commented on pull request #2488: 0.7.0 Doc Revamp
vinothchandar commented on pull request #2488: URL: https://github.com/apache/hudi/pull/2488#issuecomment-767158167 I am going to also cut the release versions for the doc, once I finalize everything w.r.t the release.
[GitHub] [hudi] nsivabalan commented on issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables
nsivabalan commented on issue #1958: URL: https://github.com/apache/hudi/issues/1958#issuecomment-767210126 https://github.com/apache/hudi/pull/1978 has fixed this.
[GitHub] [hudi] Karl-WangSK commented on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed
Karl-WangSK commented on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-767261660 cc @yanghua
[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table
nsivabalan commented on a change in pull request #2487: URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151

## File path: hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hudi.index;

import org.apache.hudi.avro.model.HoodieRecordLevelIndexRecord;
import org.apache.hudi.common.model.HoodieRecordPayload;
import org.apache.hudi.common.util.Option;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

import java.io.IOException;

/**
 * Payload used in index table for Hoodie Record level index.
 */
public class HoodieRecordLevelIndexPayload implements HoodieRecordPayload {

  private String key;
  private String partitionPath;
  private String instantTime;
  private String fileId;

  public HoodieRecordLevelIndexPayload(Option record) {
    if (record.isPresent()) {
      // This can be simplified using SpecificData.deepcopy once this bug is fixed
      // https://issues.apache.org/jira/browse/AVRO-1811
      key = record.get().get("key").toString();
      partitionPath = record.get().get("partitionPath").toString();
      instantTime = record.get().get("instantTime").toString();
      fileId = record.get().get("fileId").toString();
    }
  }

  private HoodieRecordLevelIndexPayload(String key, String partitionPath, String instantTime, String fileId) {
    this.key = key;
    this.partitionPath = partitionPath;
    this.instantTime = instantTime;
    this.fileId = fileId;
  }

  @Override
  public HoodieRecordLevelIndexPayload preCombine(HoodieRecordLevelIndexPayload another) {
    if (this.instantTime.compareTo(another.instantTime) >= 0) {
```

Review comment: Note: this needs some fixing. Can we just convert the string to long and compare?
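The reviewer's suggestion can be sketched as a standalone Java snippet (the helper name is hypothetical): parse the instant-time strings to long before comparing, rather than relying on lexicographic String.compareTo.

```java
public class PreCombineSketch {

  // Hypothetical sketch of the reviewer's suggestion: Hudi instant times
  // are numeric timestamp strings (e.g. "20210126093000"), so parsing them
  // to long gives a comparison that does not depend on the strings having
  // the same width, unlike lexicographic String.compareTo.
  static String pickLatest(String instantA, String instantB) {
    return Long.parseLong(instantA) >= Long.parseLong(instantB) ? instantA : instantB;
  }

  public static void main(String[] args) {
    // prints "20210126093000"
    System.out.println(pickLatest("20210126093000", "20210125083000"));
  }
}
```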
[GitHub] [hudi] rubenssoto closed issue #2484: [SUPPORT] Hudi Write Performance
rubenssoto closed issue #2484: URL: https://github.com/apache/hudi/issues/2484
[GitHub] [hudi] codecov-io commented on pull request #2486: Filtering abnormal data which the recordKeyField or precombineField is null in avro format
codecov-io commented on pull request #2486: URL: https://github.com/apache/hudi/pull/2486#issuecomment-766863772

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2486?src=pr&el=h1) Report

> Merging [#2486](https://codecov.io/gh/apache/hudi/pull/2486?src=pr&el=desc) (5476bf0) into [master](https://codecov.io/gh/apache/hudi/commit/c4afd179c1983a382b8a5197d800b0f5dba254de?el=desc) (c4afd17) will **decrease** coverage by `1.27%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2486?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master    #2486      +/-  ##
============================================
- Coverage     50.18%   48.90%    -1.28%
+ Complexity     3050     2155      -895
  Files           419      266      -153
  Lines         18931    12041     -6890
  Branches       1948     1133      -815
============================================
- Hits           9500     5889     -3611
+ Misses         8656     5715     -2941
+ Partials        775      437      -338
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudicommon | `51.47% <ø> (-0.03%)` | `0.00 <ø> (ø)` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `?` | `?` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2486?src=pr&el=tree) | Coverage Δ | Complexity Δ |
|---|---|---|
| ...e/hudi/common/table/log/HoodieLogFormatWriter.java | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` |
| .../hadoop/realtime/RealtimeUnmergedRecordReader.java | | |
| ...hudi/utilities/schema/FilebasedSchemaProvider.java | | |
| ...ties/exception/HoodieIncrementalPullException.java | | |
| ...in/java/org/apache/hudi/schema/SchemaProvider.java | | |
| ...udi/utilities/schema/DelegatingSchemaProvider.java | | |
| ...adoop/realtime/RealtimeBootstrapBaseFileSplit.java | | |
| ...in/java/org/apache/hudi/hive/HoodieHiveClient.java | | |
| ...hadoop/realtime/RealtimeCompactedRecordReader.java | | |
| ...di/timeline/service/handlers/FileSliceHandler.java | | |
| ... and [142 more](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr&el=tree-more) | | |
[GitHub] [hudi] vinothchandar commented on pull request #2442: Adding new configurations in 0.7.0
vinothchandar commented on pull request #2442: URL: https://github.com/apache/hudi/pull/2442#issuecomment-767102394 Will close this and open a new one
[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411
[GitHub] [hudi] vinothchandar commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table
vinothchandar commented on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766593559 cc @garyli1019 mind taking a first pass at this PR? :)
[GitHub] [hudi] codecov-io edited a comment on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable
codecov-io edited a comment on pull request #2443: URL: https://github.com/apache/hudi/pull/2443#issuecomment-760147630
[GitHub] [hudi] codecov-io edited a comment on pull request #2486: Filtering abnormal data which the recordKeyField or precombineField is null in avro format
codecov-io edited a comment on pull request #2486: URL: https://github.com/apache/hudi/pull/2486#issuecomment-766863772
[GitHub] [hudi] codecov-io commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table
codecov-io commented on pull request #2487: URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=h1) Report

> Merging [#2487](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=desc) (8b07157) into [master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc) (e302c6b) will **increase** coverage by `19.24%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2487/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master    #2487       +/-  ##
=============================================
+ Coverage     50.18%   69.43%    +19.24%
+ Complexity     3050      357      -2693
  Files           419       53       -366
  Lines         18931     1930     -17001
  Branches       1948      230      -1718
=============================================
- Hits           9500     1340      -8160
+ Misses         8656      456      -8200
+ Partials        775      134       -641
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=tree) | Coverage Δ | Complexity Δ |
|---|---|---|
| ...e/hudi/common/engine/HoodieLocalEngineContext.java | | |
| .../org/apache/hudi/MergeOnReadSnapshotRelation.scala | | |
| .../org/apache/hudi/exception/HoodieKeyException.java | | |
| .../apache/hudi/common/bloom/BloomFilterTypeCode.java | | |
| ...able/timeline/versioning/AbstractMigratorBase.java | | |
| ...rc/main/java/org/apache/hudi/cli/HoodiePrompt.java | | |
| .../org/apache/hudi/common/model/HoodieTableType.java | | |
| .../scala/org/apache/hudi/Spark2RowDeserializer.scala | | |
| ...hudi/common/table/log/block/HoodieDeleteBlock.java | | |
| ...cala/org/apache/hudi/HoodieBootstrapRelation.scala | | |
| ... and [356 more](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree-more) | | |
[GitHub] [hudi] nsivabalan commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys
nsivabalan commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767204986 @garyli1019 : can you share any updates you have in this regard?
[GitHub] [hudi] vinothchandar closed issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys
vinothchandar closed issue #2013: URL: https://github.com/apache/hudi/issues/2013
[GitHub] [hudi] kirkuz commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time
kirkuz commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-766649165 Hi @nsivabalan, I think we can close this issue for now. I've switched from GLOBAL_BLOOM to a SIMPLE index with static partition keys, because GLOBAL_BLOOM was too slow in my use case.
[GitHub] [hudi] vinothchandar commented on issue #2484: [SUPPORT] Hudi Write Performance
vinothchandar commented on issue #2484: URL: https://github.com/apache/hudi/issues/2484#issuecomment-767154231
[GitHub] [hudi] nsivabalan commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer
nsivabalan commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767208636 @jingweiz2017 : can you please check the above response and let us know if you need anything more from the Hudi community?
[GitHub] [hudi] vinothchandar closed pull request #2442: Adding new configurations in 0.7.0
vinothchandar closed pull request #2442: URL: https://github.com/apache/hudi/pull/2442
[GitHub] [hudi] rubenssoto commented on pull request #2283: [HUDI-1415] Incorrect query result for hudi hive table when using spa…
rubenssoto commented on pull request #2283: URL: https://github.com/apache/hudi/pull/2283#issuecomment-767117951 I had the same problem, but I saw fewer rows, not more. Reading with the Spark datasource I get more than 30 million rows, while Spark SQL with Hive returns only 4 million. I hit this problem only when these two options are enabled: "spark.sql.hive.convertMetastoreParquet": "false" and "spark.hadoop.hoodie.metadata.enable": "true". @pengzhiwei2018
[GitHub] [hudi] pengzhiwei2018 commented on pull request #1880: [WIP] [HUDI-1125] build framework to support structured streaming
pengzhiwei2018 commented on pull request #1880: URL: https://github.com/apache/hudi/pull/1880#issuecomment-766562247 > Hello, > > Hudi will have nice features like clustering, and clustering will probably rewrite a lot of data. Is it possible that these rewrites, which add no new data, don't affect downstream consumers of Spark Structured Streaming? > > It is something like what Delta Lake has for the compaction operation: > > https://docs.delta.io/latest/best-practices.html > > Compaction has .option("dataChange", "false"), so the downstream consumer won't be affected. > > Thank you. Hi @leesf @n3nash @rubenssoto A new PR has been proposed at https://github.com/apache/hudi/pull/2485; we can move the discussion there.
[GitHub] [hudi] teeyog commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
teeyog commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r563598187 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala

```diff
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL

-  def translateStorageTypeToTableType(optParams: Map[String, String]) : Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] = {
+    // translate StorageType to TableType
+    var newOptParams = optParams
     if (optParams.contains(STORAGE_TYPE_OPT_KEY) && !optParams.contains(TABLE_TYPE_OPT_KEY)) {
       log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a later release; Please use " + TABLE_TYPE_OPT_KEY)
-      optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-    } else {
-      optParams
+      newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
     }
+    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD_OPT_KEY
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && !optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+        .map(SparkDataSourceUtils.decodePartitioningColumns)
+        .getOrElse(Nil)
+
+      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+      val partitionPathField =
+        keyGeneratorClass match {
+          case "org.apache.hudi.keygen.CustomKeyGenerator" =>
+            partitionColumns.map(e => s"$e:SIMPLE").mkString(",")
```

Review comment: @wangxianghu Thank you for your review. My view is this: following the usual Spark convention, the partition field value corresponding to partitionBy is the original column value, so SIMPLE is used by default. If we automatically inferred whether to use TIMESTAMP from the field type, the rules would be hard to pin down. For example, if a field is a long, should it be converted to TIMESTAMP? If it is converted but the value is not a timestamp, an error will be reported, so SIMPLE is the default. Users who want TIMESTAMP can specify it directly via ```hoodie.datasource.write.partitionpath.field```.

Review comment: Yes, now if the parameters include ```TIMESTAMP_TYPE_FIELD_PROP``` and ```TIMESTAMP_OUTPUT_DATE_FORMAT_PROP```, TIMESTAMP is used by default; otherwise SIMPLE.
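The decision rule teeyog describes (partitionBy columns become `hoodie.datasource.write.partitionpath.field`, with a `:SIMPLE` type suffix for CustomKeyGenerator unless the timestamp properties are supplied) can be modeled outside of Spark. A minimal Python sketch of that decision; the option keys are the real Hudi/Spark keys, but the function itself is illustrative, not Hudi's implementation:

```python
# Toy model of the option translation discussed above; the real logic lives in
# DataSourceOptions.scala. The function is a sketch, not Hudi's API.

PARTITIONPATH_FIELD_KEY = "hoodie.datasource.write.partitionpath.field"
KEYGEN_CLASS_KEY = "hoodie.datasource.write.keygenerator.class"
CUSTOM_KEYGEN = "org.apache.hudi.keygen.CustomKeyGenerator"
TIMESTAMP_TYPE_KEY = "hoodie.deltastreamer.keygen.timebased.timestamp.type"

def translate_partition_by(opts, partition_columns):
    """Return opts with partitionpath.field derived from partitionBy columns.

    An explicitly set partitionpath.field always wins over partitionBy.
    """
    if PARTITIONPATH_FIELD_KEY in opts or not partition_columns:
        return opts
    if opts.get(KEYGEN_CLASS_KEY) == CUSTOM_KEYGEN:
        # CustomKeyGenerator expects "field:TYPE" pairs; default to SIMPLE,
        # switch to TIMESTAMP only when the timestamp props are supplied.
        ptype = "TIMESTAMP" if TIMESTAMP_TYPE_KEY in opts else "SIMPLE"
        value = ",".join(f"{col}:{ptype}" for col in partition_columns)
    else:
        value = ",".join(partition_columns)
    return {**opts, PARTITIONPATH_FIELD_KEY: value}

opts = translate_partition_by({KEYGEN_CLASS_KEY: CUSTOM_KEYGEN}, ["region", "ds"])
print(opts[PARTITIONPATH_FIELD_KEY])
```

With CustomKeyGenerator and no timestamp properties, the two columns above translate to `region:SIMPLE,ds:SIMPLE`, matching the default discussed in the review.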
[GitHub] [hudi] vinothchandar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.
vinothchandar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766590769
[GitHub] [hudi] satishkotha commented on a change in pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation
satishkotha commented on a change in pull request #2483: URL: https://github.com/apache/hudi/pull/2483#discussion_r563962124 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala

```diff
@@ -198,6 +198,31 @@ class TestCOWDataSource extends HoodieClientTestBase {
       .mode(SaveMode.Append)
       .save(basePath)

+    val records2 = recordsToStrings(dataGen.generateInserts("002", 5)).toList
+    val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+    inputDF2.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.INSERT_OVERWRITE_OPERATION_OPT_VAL)
+      .mode(SaveMode.Append)
+      .save(basePath)
+
+    val metaClient = new HoodieTableMetaClient(spark.sparkContext.hadoopConfiguration, basePath, true)
+    val commits = metaClient.getActiveTimeline.filterCompletedInstants().getInstants.toArray
+      .map(instant => (instant.asInstanceOf[HoodieInstant]).getAction)
+    assertEquals(2, commits.size)
+    assertEquals("commit", commits(0))
+    assertEquals("replacecommit", commits(1))
```

Review comment: Hi, can you also read back the records and verify that only records2 shows up (i.e., the data in records1 doesn't show up)?
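The review asks to verify that after an `insert_overwrite` only the second batch of records remains. As a hedged aside, the replace-commit semantics being tested here can be sketched with a toy in-memory model (this is not Hudi code, just an illustration of the expected behavior):

```python
# Toy model of COW insert_overwrite semantics: a replacecommit swaps the
# contents of the affected partition for the new batch. Illustrative only.

class ToyTable:
    def __init__(self):
        self.partitions = {}   # partition path -> list of records
        self.timeline = []     # completed instant actions, in order

    def insert(self, partition, records):
        # a plain commit appends new records to the partition
        self.partitions.setdefault(partition, []).extend(records)
        self.timeline.append("commit")

    def insert_overwrite(self, partition, records):
        # a replacecommit replaces the partition's old file slices entirely
        self.partitions[partition] = list(records)
        self.timeline.append("replacecommit")

t = ToyTable()
t.insert("2021/01/25", ["r1", "r2", "r3"])
t.insert_overwrite("2021/01/25", ["r4", "r5"])
print(t.timeline)                   # commit, then replacecommit
print(t.partitions["2021/01/25"])   # only the second batch survives
```

Reading back after the overwrite yields only the second batch, which is exactly the assertion the reviewer is requesting in the Scala test.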
[GitHub] [hudi] vinothchandar commented on pull request #2111: [HUDI-1234] Insert new records to data files without merging for "Insert" operation.
vinothchandar commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-767103157 @nsivabalan I thought we were going to get this into 0.7.0? Checked back again to see why this was missing.
[GitHub] [hudi] vburenin commented on pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name
vburenin commented on pull request #2476: URL: https://github.com/apache/hudi/pull/2476#issuecomment-766947415 Can anybody merge this PR, please?
[GitHub] [hudi] codecov-io edited a comment on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
codecov-io edited a comment on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-757929313
[GitHub] [hudi] cadl closed issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed
cadl closed issue #2063: URL: https://github.com/apache/hudi/issues/2063
[GitHub] [hudi] nsivabalan commented on issue #2204: [SUPPORT] Hive count(*) query on _rt table failing with exception
nsivabalan commented on issue #2204: URL: https://github.com/apache/hudi/issues/2204#issuecomment-766437535 @BalaMahesh : Would you mind updating the ticket? We will close this out in a week's time if there is no activity. But feel free to re-open or create a new ticket if you have more questions/issues.
[GitHub] [hudi] nsivabalan commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?
nsivabalan commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-766436747 @sanket-khedikar : can you please respond as to whether the suggested approaches work for you, or whether you still need more enhancements from Hudi? If it's solved, we would appreciate it if you could close this ticket.
[GitHub] [hudi] nsivabalan commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support
nsivabalan commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-766433970 @vinothchandar @borislitvak : since we have a tracking jira, do you think we can close this? Or is there anything pending to be resolved or discussed?
[GitHub] [hudi] nsivabalan closed issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on
nsivabalan closed issue #2429: URL: https://github.com/apache/hudi/issues/2429
[GitHub] [hudi] xushiyan merged pull request #2478: [HUDI-1476] Introduce unit test infra for java client
xushiyan merged pull request #2478: URL: https://github.com/apache/hudi/pull/2478
[GitHub] [hudi] vinothchandar merged pull request #2481: [MINOR] Removing spring repos from pom
vinothchandar merged pull request #2481: URL: https://github.com/apache/hudi/pull/2481
[GitHub] [hudi] git-raj commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?
git-raj commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-766523668 Using AWS Glue PySpark, Hudi, and S3 as the data store, I'm trying to do the traditional SCD Type 2: the old record gets updated with the insert datetime in the 'effective to' field and its 'isActive' field becomes 'false', and a new row is inserted with the insert datetime in the 'effective from' field and 'isActive' set to 'true'. Any solution post, or pointers to solving that, is highly appreciated.
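The SCD Type 2 pattern described above (close the currently active row, open a new one) is row-preparation logic that happens before the upsert. A hedged, engine-agnostic Python sketch of that transformation; the field names come from the comment, while the function and record layout are hypothetical, not a Hudi or Glue API:

```python
# Illustrative SCD Type 2 row transformation: close the active version of a
# key and append the new version. Field names follow the comment above.

def scd2_apply(current_rows, new_row, key, now):
    """Close the active row for `key` and append the new version."""
    out = []
    for row in current_rows:
        if row["id"] == key and row["is_active"]:
            # old record: stamp effective_to with the insert datetime, deactivate
            row = {**row, "effective_to": now, "is_active": False}
        out.append(row)
    # new record: open-ended validity starting at the insert datetime
    out.append({**new_row, "id": key, "effective_from": now,
                "effective_to": None, "is_active": True})
    return out

history = [{"id": 1, "value": "a", "effective_from": "2021-01-01",
            "effective_to": None, "is_active": True}]
history = scd2_apply(history, {"value": "b"}, key=1, now="2021-01-25")
```

In a Spark job the same shape would be produced with a join plus union of the closed old rows and the new rows, and the result written with Hudi's upsert; the sketch only shows the per-key bookkeeping.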
[GitHub] [hudi] nsivabalan commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time
nsivabalan commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-766435871 @Kirkuz: Do you have any updates in this regard? Can you please respond or let us know if you have more questions.
[GitHub] [hudi] nsivabalan closed issue #2480: [SUPPORT] The Docker demo document description is incorrect
nsivabalan closed issue #2480: URL: https://github.com/apache/hudi/issues/2480
[GitHub] [hudi] rubenssoto edited a comment on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.
rubenssoto edited a comment on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187
[GitHub] [hudi] vinothchandar commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support
vinothchandar commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-766441408 we can close this out
[GitHub] [hudi] vinothchandar commented on issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source
vinothchandar commented on issue #2479: URL: https://github.com/apache/hudi/issues/2479#issuecomment-766369989 Great. No, thank you for catching it :). Eventually, as m2 caches are lost, I think the build would have failed, maybe a month or so from now :). Will merge the fix.
[GitHub] [hudi] rubenssoto commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.
rubenssoto commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187
[GitHub] [hudi] zherenyu831 commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)
zherenyu831 commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766482729 @bvaradar Hi, it will be a little difficult to replicate the problem, since it only happens with a huge amount of data.
[GitHub] [hudi] codecov-io edited a comment on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client
codecov-io edited a comment on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-751367927
[GitHub] [hudi] nsivabalan edited a comment on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)
nsivabalan edited a comment on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364 @zherenyu831 : can you please respond with any updates on your end? @n3nash : can you please take a look when you have time? If you were able to narrow down the issue, please do file a jira and add the "user-support-issues" label.
[GitHub] [hudi] nsivabalan commented on issue #2135: [SUPPORT] GDPR safe deletes is complex
nsivabalan commented on issue #2135: URL: https://github.com/apache/hudi/issues/2135#issuecomment-766439085 @andaag : I have created a Hudi ticket for this. Feel free to update the description of the ticket with more details: https://issues.apache.org/jira/browse/HUDI-1549
[GitHub] [hudi] nsivabalan commented on issue #2123: Timestamp not parsed correctly on Athena
nsivabalan commented on issue #2123: URL: https://github.com/apache/hudi/issues/2123#issuecomment-766439219 @satishkotha : when you get a chance, can you please follow up on this.
[GitHub] [hudi] nsivabalan commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)
nsivabalan commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364 @zherenyu831 : can you please respond with any updates on your end? @n3nash : can you take a look when you have time?
[GitHub] [hudi] nsivabalan commented on issue #2467: [Travis issue] TestJsonStringToHoodieRecordMapFunction.testMapFunction failed
nsivabalan commented on issue #2467: URL: https://github.com/apache/hudi/issues/2467#issuecomment-766427684 Have created a tracking jira https://issues.apache.org/jira/browse/HUDI-1547
[GitHub] [hudi] nsivabalan commented on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer
nsivabalan commented on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932 @liujinhui1994 : We already have an [example in our HoodieTestDataGenerator](https://github.com/apache/hudi/blob/c4afd179c1983a382b8a5197d800b0f5dba254de/hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java#L101) using an array type. Let us know if this helps.
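For Deltastreamer, the schema is typically supplied as Avro, where an array field is declared with the `array` complex type. A minimal illustrative schema fragment in that spirit, built as a Python dict so it can be serialized to the JSON Avro notation; the record and field names here are examples in the style of the linked HoodieTestDataGenerator schema, not a copy of it:

```python
import json

# Hypothetical Avro-style schema with an array-of-records field, illustrating
# how jsonArray data can be described for Deltastreamer.
schema = {
    "type": "record",
    "name": "TripRecord",
    "fields": [
        {"name": "_row_key", "type": "string"},
        {"name": "timestamp", "type": "long"},
        # the array field: its "type" is itself a complex type declaration
        {"name": "tip_history", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "TipEntry",
                "fields": [{"name": "amount", "type": "double"}],
            },
        }},
    ],
}

print(json.dumps(schema, indent=2))
```

The serialized JSON is what would be handed to a schema provider; the key point is that the array field's `type` is the nested `{"type": "array", "items": ...}` declaration rather than a plain type name.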
[GitHub] [hudi] vinothchandar closed issue #2330: Concurrent writes from multiple Spark drivers to S3 support
vinothchandar closed issue #2330: URL: https://github.com/apache/hudi/issues/2330
[GitHub] [hudi] nsivabalan commented on issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on
nsivabalan commented on issue #2429: URL: https://github.com/apache/hudi/issues/2429#issuecomment-766428773 @vinothchandar : closing this for now. feel free to re-open if you see more issues.
[GitHub] [hudi] vinothchandar commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)
vinothchandar commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766450275 cc @garyli1019 as well
[GitHub] [hudi] nsivabalan commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited
nsivabalan commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-766431496 @afeldman1 : can you respond when you can.
[GitHub] [hudi] codecov-io commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table
codecov-io commented on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=h1) Report

> Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=desc) (91cf083) into [master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc) (e302c6b) will **decrease** coverage by `40.49%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2485/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##           master    #2485       +/-   ##
============================================
- Coverage   50.18%    9.68%    -40.50%
+ Complexity   3050       48      -3002
============================================
  Files         419       53       -366
  Lines       18931     1930     -17001
  Branches     1948      230      -1718
============================================
- Hits         9500      187      -9313
+ Misses       8656     1730      -6926
+ Partials      775       13       -762
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
| [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
| [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
| [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlc
[GitHub] [hudi] nsivabalan edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues
nsivabalan edited a comment on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534 @n3nash @bhasudha : sorry, the thread is a bit long, so I couldn't gauge it correctly. I see some workarounds have been proposed and they worked. But do we need fixes in Hudi in general? If yes, can you file a jira and close this out.
[GitHub] [hudi] nsivabalan commented on issue #2329: [SUPPORT] Time Travel (querying the historical versions of data) ability for Hudi Table
nsivabalan commented on issue #2329: URL: https://github.com/apache/hudi/issues/2329#issuecomment-766435383 https://issues.apache.org/jira/browse/HUDI-1460
[GitHub] [hudi] nsivabalan commented on issue #2480: [SUPPORT] The Docker demo document description is incorrect
nsivabalan commented on issue #2480: URL: https://github.com/apache/hudi/issues/2480#issuecomment-766427153 Sure, will take it up. Closing it as we have a tracking jira. https://issues.apache.org/jira/browse/HUDI-1546
[GitHub] [hudi] nsivabalan edited a comment on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer
nsivabalan edited a comment on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932 @liujinhui1994 : Sorry about the delay. We already have an [example in our HoodieTestDataGenerator](https://github.com/apache/hudi/blob/c4afd179c1983a382b8a5197d800b0f5dba254de/hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java#L101) using an array type. Let us know if this helps.
[GitHub] [hudi] nsivabalan commented on issue #2066: [SUPPORT] Hudi is increasing the storage size big time
nsivabalan commented on issue #2066: URL: https://github.com/apache/hudi/issues/2066#issuecomment-766449665 @KarthickAN : did you get a chance to try out the suggestion from Balaji? Please do update the issue with any updates. If the issue is resolved, feel free to close it out.
[GitHub] [hudi] nsivabalan commented on issue #2367: [SUPPORT] Seek error when querying MOR Tables in GCP
nsivabalan commented on issue #2367: URL: https://github.com/apache/hudi/issues/2367#issuecomment-766431687 Sure. sorry about the delay. will get to this in a day or two.
[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table
codecov-io edited a comment on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181
[GitHub] [hudi] nsivabalan commented on issue #2331: Why does Hudi not support field deletions?
nsivabalan commented on issue #2331: URL: https://github.com/apache/hudi/issues/2331#issuecomment-766432877 @prashantwason : In light of this ticket, do you think we can update our documentation on schema evolution? If you don't mind, can you take it up and fix our documentation? https://issues.apache.org/jira/browse/HUDI-1548
[GitHub] [hudi] nsivabalan commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced
nsivabalan commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-766438221 @KarthickAN : Hope you got a chance to go through our [blog on indexes in Hudi](https://hudi.apache.org/blog/hudi-indexing-mechanisms/). Regarding this GitHub issue, please let us know if you have any more specific questions. If not, we will close this out in a week's time.
[GitHub] [hudi] nsivabalan commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues
nsivabalan commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534 @n3nash @bhasudha : Sorry, the thread is a bit long. I see some workarounds have been proposed and they worked. But do we need any fixes in Hudi in general? If yes, can you file a JIRA and close this out?
[GitHub] [hudi] nsivabalan commented on issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed
nsivabalan commented on issue #2063: URL: https://github.com/apache/hudi/issues/2063#issuecomment-766449860 @cadl : Did you get a chance to try out the setting? We plan to close out this issue due to inactivity in a week's time, but feel free to reopen it or create a new ticket if you find any more issues.
[GitHub] [hudi] lshg opened a new issue #2490: spark read hudi data from hive
lshg opened a new issue #2490: URL: https://github.com/apache/hudi/issues/2490

```scala
package com.gjr.recommend

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

object DWDTenderLog {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName(this.getClass.getSimpleName)
      .setMaster("local[2]")
      .set("spark.executor.memory", "512m")
    val sc: SparkContext = new SparkContext(conf)
    val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()
    val hc = new HiveContext(sc)
    hc.setConf("spark.sql.crossJoin.enabled", "true")
    val tenderLog: Array[Row] = hc.sql(
      """
        |SELECT
        |  projectid,
        |  provinceid,
        |  typeId,
        |  tender_tag
        |FROM
        |(
        |  SELECT
        |    projectid,
        |    provinceid,
        |    typeId,
        |    antistop
        |  FROM app.dwd_recommend_tender_ds
        |  WHERE createTime >= 1608280608479 AND createTime <= 1611628847000
        |    AND antistop != ''
        |  GROUP BY projectid, provinceid, typeId, antistop
        |) AS a lateral VIEW explode(split(antistop, "#")) table_tmp AS tender_tag
      """.stripMargin).collect()
    println(tenderLog.toBuffer)
    sc.stop()
  }
}
```

```
0    [main] INFO org.apache.spark.SparkContext - Running Spark version 2.4.7
346  [main] INFO org.apache.spark.SparkContext - Submitted application: DWDTenderLog$
390  [main] INFO org.apache.spark.SecurityManager - Changing view acls to: lsh
390  [main] INFO org.apache.spark.SecurityManager - Changing modify acls to: lsh
390  [main] INFO org.apache.spark.SecurityManager - Changing view acls groups to:
390  [main] INFO org.apache.spark.SecurityManager - Changing modify acls groups to:
391  [main] INFO org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lsh); groups with view permissions: Set(); users with modify permissions: Set(lsh); groups with modify permissions: Set()
2533 [main] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 54347.
2575 [main] INFO org.apache.spark.SparkEnv - Registering MapOutputTracker
2588 [main] INFO org.apache.spark.SparkEnv - Registering BlockManagerMaster
2589 [main] INFO org.apache.spark.storage.BlockManagerMasterEndpoint - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2590 [main] INFO org.apache.spark.storage.BlockManagerMasterEndpoint - BlockManagerMasterEndpoint up
2596 [main] INFO org.apache.spark.storage.DiskBlockManager - Created local directory at C:\Users\lsh\AppData\Local\Temp\blockmgr-d134fb11-0552-4b4b-8f20-ea7e04fd086d
2609 [main] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore started with capacity 1979.1 MB
2619 [main] INFO org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
2675 [main] INFO org.spark_project.jetty.util.log - Logging initialized @23630ms
2720 [main] INFO org.spark_project.jetty.server.Server - jetty-9.3.z-SNAPSHOT, build timestamp: 2019-02-16T00:53:49+08:00, git hash: eb70b240169fcf1abbd86af36482d1c49826fa0b
2731 [main] INFO org.spark_project.jetty.server.Server - Started @23687ms
2747 [main] INFO org.spark_project.jetty.server.AbstractConnector - Started ServerConnector@4d63b624{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2747 [main] INFO org.apache.spark.util.Utils - Successfully started service 'SparkUI' on port 4040.
2767 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@27eb3298{/jobs,null,AVAILABLE,@Spark}
2768 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@1b58ff9e{/jobs/json,null,AVAILABLE,@Spark}
2768 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@2f66e802{/jobs/job,null,AVAILABLE,@Spark}
2769 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@76318a7d{/jobs/job/json,null,AVAILABLE,@Spark}
2770 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@2a492f2a{/stages,null,AVAILABLE,@Spark}
2770 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@3277e499{/stages/json,null,AVAILABLE,@Spark}
2771 [main] INFO org.spark_project.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@585811a4{/stages/stage,null,AVAILABLE,@Spark}
2772 [main] IN
```
[GitHub] [hudi] lshg opened a new issue #2489: [SUPPORT]
lshg opened a new issue #2489: URL: https://github.com/apache/hudi/issues/2489

```
hive (app)> SELECT
          >   projectid,
          >   provinceid,
          >   typeId,
          >   antistop
          > FROM
          >   app.dwd_recommend_tender_ds
          > WHERE
          >   createTime >= 1608280608479
          >   AND antistop != '' limit 2;
OK
projectid  provinceid  typeid  antistop
7876350    15          9       装修#扩建
7876350    15          9       装修#扩建
Time taken: 0.133 seconds, Fetched: 2 row(s)
```

It works! But:

```
hive (app)> select count(1) as total from app.dwd_recommend_tender_ds;
Query ID = root_20210126113448_acb65504-b31b-4309-9d65-39c48743326e
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening... Session re-established.
--------------------------------------------------------------------------------------------
VERTICES     MODE       STATUS        TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------------------
Map 1        container  INITIALIZING     -1          0        0       -1       0       0
Reducer 2    container  INITED            1          0        0        1       0       0
--------------------------------------------------------------------------------------------
VERTICES: 00/02  [>>--] 0%  ELAPSED TIME: 0.55 s
--------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1611208773582_0084_1_00, diagnostics=[Vertex vertex_1611208773582_0084_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: dwd_recommend_tender_ds initializer failed, vertex=vertex_1611208773582_0084_1_00 [Map 1],
java.lang.NoSuchMethodError: org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getCommitsTimeline()Lorg/apache/hudi/common/table/HoodieTimeline;
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:238)
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1611208773582_0084_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1611208773582_0084_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1611208773582_0084_1_00, diagnostics=[Vertex vertex_1611208773582_0084_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: dwd_recommend_tender_ds initializer failed, vertex=vertex_1611208773582_0084_1_00 [Map 1], java.lang.NoSuchMethodError: org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getCommitsTimeline()Lorg/apache/hudi/common/table/HoodieTimeline; at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:238) at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat
```
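A `NoSuchMethodError` on a Hudi class like the one above typically means an older Hudi jar is shadowing a newer one on Hive/Tez's classpath. As a minimal sketch (this is not Hudi or Hive API; the class name, directory layout, and jar file names below are all fabricated for the demo), a helper that lists every `hudi-*.jar` under a directory makes mixed bundle versions easy to spot:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HudiJarScan {

    // Collect the names of all hudi-*.jar files under `dir`, sorted,
    // so two different versions of the same bundle appear side by side.
    static List<String> scan(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.map(p -> p.getFileName().toString())
                        .filter(n -> n.startsWith("hudi-") && n.endsWith(".jar"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo against a temp directory with fabricated jar names; on a real
        // cluster you would point this at Hive's aux-jar / Tez lib directories.
        Path tmp = Files.createTempDirectory("auxlib");
        Files.createFile(tmp.resolve("hudi-hadoop-mr-bundle-0.5.3.jar"));
        Files.createFile(tmp.resolve("hudi-hadoop-mr-bundle-0.7.0.jar"));
        // Two versions of the same bundle in the listing => likely conflict.
        System.out.println(scan(tmp));
    }
}
```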
[GitHub] [hudi] vinothchandar commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys
vinothchandar commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767265499 This is now out in the 0.7.0 release. See [this test](https://github.com/apache/hudi/blame/release-0.7.0/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L183) for examples.
[GitHub] [hudi] vinothchandar closed issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys
vinothchandar closed issue #2013: URL: https://github.com/apache/hudi/issues/2013
[GitHub] [hudi] Karl-WangSK commented on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed
Karl-WangSK commented on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-767261660 cc @yanghua
[GitHub] [hudi] jingweiz2017 commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer
jingweiz2017 commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767242422 @nsivabalan @bvaradar : Thanks for the reply. The commit mentioned by bvaradar should work for my case.
[GitHub] [hudi] codecov-io edited a comment on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table
codecov-io edited a comment on pull request #2487: URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748
[GitHub] [hudi] codecov-io commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table
codecov-io commented on pull request #2487: URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=h1) Report
> Merging [#2487](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=desc) (8b07157) into [master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc) (e302c6b) will **increase** coverage by `19.24%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2487/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=tree)

```diff
@@              Coverage Diff              @@
##             master    #2487       +/-   ##
=============================================
+ Coverage     50.18%   69.43%   +19.24%
+ Complexity     3050      357     -2693
=============================================
  Files           419       53      -366
  Lines         18931     1930   -17001
  Branches       1948      230     -1718
=============================================
- Hits           9500     1340     -8160
+ Misses         8656      456     -8200
+ Partials        775      134      -641
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...e/hudi/common/engine/HoodieLocalEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVMb2NhbEVuZ2luZUNvbnRleHQuamF2YQ==) | | | |
| [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==) | | | |
| [.../org/apache/hudi/exception/HoodieKeyException.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUtleUV4Y2VwdGlvbi5qYXZh) | | | |
| [.../apache/hudi/common/bloom/BloomFilterTypeCode.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVHlwZUNvZGUuamF2YQ==) | | | |
| [...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==) | | | |
| [...rc/main/java/org/apache/hudi/cli/HoodiePrompt.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByb21wdC5qYXZh) | | | |
| [.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh) | | | |
| [.../scala/org/apache/hudi/Spark2RowDeserializer.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvaHVkaS9TcGFyazJSb3dEZXNlcmlhbGl6ZXIuc2NhbGE=) | | | |
| [...hudi/common/table/log/block/HoodieDeleteBlock.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9jay5qYXZh) | | | |
| [...cala/org/apache/hudi/HoodieBootstrapRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUJvb3RzdHJhcFJlbGF0aW9uLnNjYWxh) | | | |
| ... and [356 more](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree-more) | |
svn commit: r45595 - in /release/hudi: 0.7.0/ hudi-0.7.0/
Author: vinoth Date: Tue Jan 26 01:37:48 2021 New Revision: 45595 Log: Renaming for Hudi 0.7.0 Added: release/hudi/0.7.0/ - copied from r45594, release/hudi/hudi-0.7.0/ Removed: release/hudi/hudi-0.7.0/
[jira] [Commented] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction
[ https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271766#comment-17271766 ] wangxianghu commented on HUDI-1547: --- [~vinoth] I can take it > CI intermittent failure: > TestJsonStringToHoodieRecordMapFunction.testMapFunction > - > > Key: HUDI-1547 > URL: https://issues.apache.org/jira/browse/HUDI-1547 > Project: Apache Hudi > Issue Type: Bug > Components: Release & Administrative >Affects Versions: 0.8.0 >Reporter: sivabalan narayanan >Assignee: wangxianghu >Priority: Major > Labels: user-support-issues > > [https://github.com/apache/hudi/issues/2467] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction
[ https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu reassigned HUDI-1547: - Assignee: wangxianghu > CI intermittent failure: > TestJsonStringToHoodieRecordMapFunction.testMapFunction > - > > Key: HUDI-1547 > URL: https://issues.apache.org/jira/browse/HUDI-1547 > Project: Apache Hudi > Issue Type: Bug > Components: Release & Administrative >Affects Versions: 0.8.0 >Reporter: sivabalan narayanan >Assignee: wangxianghu >Priority: Major > Labels: user-support-issues > > [https://github.com/apache/hudi/issues/2467] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan closed issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables
nsivabalan closed issue #1958: URL: https://github.com/apache/hudi/issues/1958
[GitHub] [hudi] nsivabalan commented on issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables
nsivabalan commented on issue #1958: URL: https://github.com/apache/hudi/issues/1958#issuecomment-767210126 https://github.com/apache/hudi/pull/1978 has fixed it.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table
nsivabalan commented on a change in pull request #2487: URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151

## File path: hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java

@@ -0,0 +1,98 @@
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hudi.index;

import org.apache.hudi.avro.model.HoodieRecordLevelIndexRecord;
import org.apache.hudi.common.model.HoodieRecordPayload;
import org.apache.hudi.common.util.Option;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

import java.io.IOException;

/**
 * Payload used in index table for Hoodie Record level index.
 */
public class HoodieRecordLevelIndexPayload implements HoodieRecordPayload<HoodieRecordLevelIndexPayload> {

  private String key;
  private String partitionPath;
  private String instantTime;
  private String fileId;

  public HoodieRecordLevelIndexPayload(Option<GenericRecord> record) {
    if (record.isPresent()) {
      // This can be simplified using SpecificData.deepcopy once this bug is fixed
      // https://issues.apache.org/jira/browse/AVRO-1811
      key = record.get().get("key").toString();
      partitionPath = record.get().get("partitionPath").toString();
      instantTime = record.get().get("instantTime").toString();
      fileId = record.get().get("fileId").toString();
    }
  }

  private HoodieRecordLevelIndexPayload(String key, String partitionPath, String instantTime, String fileId) {
    this.key = key;
    this.partitionPath = partitionPath;
    this.instantTime = instantTime;
    this.fileId = fileId;
  }

  @Override
  public HoodieRecordLevelIndexPayload preCombine(HoodieRecordLevelIndexPayload another) {
    if (this.instantTime.compareTo(another.instantTime) >= 0) {
```

Review comment: Note: this needs some fixing. Can we just convert the string to long and compare?
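The review comment above suggests comparing instant times numerically rather than as strings. A minimal sketch of that idea (the `compareInstantTimes` helper is hypothetical, and it assumes instant times are purely numeric strings such as Hudi's `yyyyMMddHHmmss` commit timestamps; fixed-width numeric strings happen to compare the same lexicographically, but numeric comparison also stays correct if widths ever differ):

```java
public class InstantTimeCompare {

    // Hypothetical helper sketching the reviewer's suggestion: parse the
    // instant-time strings as longs and compare numerically instead of
    // relying on String.compareTo.
    static int compareInstantTimes(String a, String b) {
        return Long.compare(Long.parseLong(a), Long.parseLong(b));
    }

    public static void main(String[] args) {
        // Equal-width timestamps: lexicographic and numeric ordering agree.
        System.out.println(compareInstantTimes("20210126113448", "20210125100000") > 0); // true
        // Different widths: lexicographic order would be wrong ("9" > "10"),
        // numeric order is not.
        System.out.println("9".compareTo("10") > 0);            // true (lexicographic)
        System.out.println(compareInstantTimes("9", "10") > 0); // false (numeric)
    }
}
```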
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
nsivabalan commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175 @bvaradar : I guess you missed following up on this thread. Can you check it out and respond when you can?
[GitHub] [hudi] nsivabalan commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer
nsivabalan commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767208636 @jingweiz2017 : Can you please check the above response and let us know if you need anything more from the Hudi community?
[GitHub] [hudi] nsivabalan commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena
nsivabalan commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-767206596 @vinothchandar @umehrot2 : Can either of you respond here regarding metadata support (RFC-15) in Athena? When can we possibly expect it?
[GitHub] [hudi] nsivabalan commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.
nsivabalan commented on issue #1982: URL: https://github.com/apache/hudi/issues/1982#issuecomment-767205667 @Ac-Rush : Would you mind updating the ticket?
[GitHub] [hudi] nsivabalan commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys
nsivabalan commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767204986 @garyli1019 : Can you share any updates you have in this regard?
[jira] [Reopened] (HUDI-284) Need Tests for Hudi handling of schema evolution
[ https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reopened HUDI-284: - > Need Tests for Hudi handling of schema evolution > - > > Key: HUDI-284 > URL: https://issues.apache.org/jira/browse/HUDI-284 > Project: Apache Hudi > Issue Type: Test > Components: Common Core, newbie, Testing >Reporter: Balaji Varadarajan >Assignee: liwei >Priority: Major > Labels: help-wanted, pull-request-available, starter > Fix For: 0.7.0 > > > Context in : > https://github.com/apache/incubator-hudi/pull/927#pullrequestreview-293449514 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-575. - Resolution: Fixed > Support Async Compaction for spark streaming writes to hudi table > - > > Key: HUDI-575 > URL: https://issues.apache.org/jira/browse/HUDI-575 > Project: Apache Hudi > Issue Type: Improvement > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.7.0 > > > Currently, only inline compaction is supported for Structured streaming > writes. > > We need to > * Enable configuring async compaction for streaming writes > * Implement a parallel compaction process like we did for delta streamer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-284) Need Tests for Hudi handling of schema evolution
[ https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-284. - Resolution: Fixed > Need Tests for Hudi handling of schema evolution > - > > Key: HUDI-284 > URL: https://issues.apache.org/jira/browse/HUDI-284 > Project: Apache Hudi > Issue Type: Test > Components: Common Core, newbie, Testing >Reporter: Balaji Varadarajan >Assignee: liwei >Priority: Major > Labels: help-wanted, pull-request-available, starter > Fix For: 0.7.0 > > > Context in : > https://github.com/apache/incubator-hudi/pull/927#pullrequestreview-293449514 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reopened HUDI-575: - > Support Async Compaction for spark streaming writes to hudi table > - > > Key: HUDI-575 > URL: https://issues.apache.org/jira/browse/HUDI-575 > Project: Apache Hudi > Issue Type: Improvement > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.7.0 > > > Currently, only inline compaction is supported for Structured streaming > writes. > > We need to > * Enable configuring async compaction for streaming writes > * Implement a parallel compaction process like we did for delta streamer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-791) Replace null by Option in Delta Streamer
[ https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-791. - Resolution: Fixed > Replace null by Option in Delta Streamer > > > Key: HUDI-791 > URL: https://issues.apache.org/jira/browse/HUDI-791 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer, newbie >Reporter: Yanjia Gary Li >Assignee: liwei >Priority: Minor > Labels: pull-request-available > Fix For: 0.7.0 > > > There are a lot of nulls in Delta Streamer. It would be great if we could > replace those nulls with Option. -- This message was sent by Atlassian Jira (v8.3.4#803005)