[jira] [Created] (HUDI-7937) Fix handling of decimals in StreamSync and Clustering

2024-06-27 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7937:
---

 Summary: Fix handling of decimals in StreamSync and Clustering
 Key: HUDI-7937
 URL: https://issues.apache.org/jira/browse/HUDI-7937
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


When decimals use a small precision, we need to write them in the legacy 
format to ensure all Hudi components can read them back. 
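
For context, here is a minimal sketch of why precision matters. It assumes "legacy format" means the Parquet convention in which every decimal is stored as a fixed-length byte array (what Spark's `spark.sql.parquet.writeLegacyFormat` option enables); the ticket does not spell this out. The helper names are hypothetical and are not Hudi APIs.

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest signed byte width that can hold any decimal of this precision."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def physical_type(precision: int, legacy: bool) -> str:
    """Parquet physical type a writer would pick for a decimal column."""
    if legacy:
        # Legacy writers always use a fixed-length byte array.
        return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"
```

Under these assumptions the two modes differ only for precision 18 and below, which is where a reader that expects one encoding can trip over the other.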



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7937) Fix handling of decimals in StreamSync and Clustering

2024-06-27 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7937:
---

Assignee: Timothy Brown

> Fix handling of decimals in StreamSync and Clustering
> -
>
> Key: HUDI-7937
> URL: https://issues.apache.org/jira/browse/HUDI-7937
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> When decimals use a small precision, we need to write them in the legacy 
> format to ensure all Hudi components can read them back. 





[jira] [Assigned] (HUDI-7927) Secondary View should only initialize when required

2024-06-24 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7927:
---

Assignee: Timothy Brown

> Secondary View should only initialize when required
> ---
>
> Key: HUDI-7927
> URL: https://issues.apache.org/jira/browse/HUDI-7927
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> In the PriorityBasedFileSystemView, the secondary view is initialized 
> eagerly, causing extra overhead including file listing. We should avoid this 
> to reduce the cost for users.





[jira] [Created] (HUDI-7927) Secondary View should only initialize when required

2024-06-24 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7927:
---

 Summary: Secondary View should only initialize when required
 Key: HUDI-7927
 URL: https://issues.apache.org/jira/browse/HUDI-7927
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


In the PriorityBasedFileSystemView, the secondary view is initialized 
eagerly, causing extra overhead including file listing. We should avoid this to 
reduce the cost for users.
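
A minimal sketch of the lazy approach, assuming the secondary view is only needed when the primary view fails. `LazyPriorityView`, `secondary_factory` and `latest_files` are hypothetical names for illustration, not the Hudi classes or methods.

```python
from typing import Callable, List


class LazyPriorityView:
    """Tries the primary view first; builds the secondary only on failure."""

    def __init__(self, primary, secondary_factory: Callable[[], object]):
        self._primary = primary
        self._secondary_factory = secondary_factory
        self._secondary = None  # not built until a fallback is needed

    def _get_secondary(self):
        if self._secondary is None:
            # The expensive work (e.g. file listing) happens here, once.
            self._secondary = self._secondary_factory()
        return self._secondary

    def latest_files(self, partition: str) -> List[str]:
        try:
            return self._primary.latest_files(partition)
        except Exception:
            return self._get_secondary().latest_files(partition)
```

As long as the primary view keeps answering, the factory is never invoked, so the cost of building the secondary view is paid only by callers that actually fall back.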





[jira] [Created] (HUDI-7826) hoodie.write.set.null.for.missing.columns results in invalid objects

2024-06-02 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7826:
---

 Summary: hoodie.write.set.null.for.missing.columns results in 
invalid objects
 Key: HUDI-7826
 URL: https://issues.apache.org/jira/browse/HUDI-7826
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


When `hoodie.write.set.null.for.missing.columns` is set, a null value is 
written for fields that are missing from the incoming data set. If such a 
column is non-nullable, you will get an error at runtime. Instead, we should 
evolve the field to be nullable in the table's schema.
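
The proposed evolution can be sketched as follows, with Avro-style schemas modelled as plain dictionaries. `make_missing_fields_nullable` is a hypothetical helper written for this illustration; it is not part of Hudi or Avro.

```python
import copy


def make_missing_fields_nullable(table_schema: dict, incoming_names: set) -> dict:
    """Return a copy of an Avro-style record schema in which every field absent
    from the incoming data is widened to a nullable union with a null default."""
    evolved = copy.deepcopy(table_schema)
    for field in evolved["fields"]:
        if field["name"] in incoming_names:
            continue
        ftype = field["type"]
        already_nullable = isinstance(ftype, list) and "null" in ftype
        if not already_nullable:
            branches = ftype if isinstance(ftype, list) else [ftype]
            # "null" goes first so that a null default is valid for the union.
            field["type"] = ["null"] + branches
            field["default"] = None
    return evolved
```

Fields that are present in the incoming batch keep their original type, so the change is limited to columns that would otherwise receive an invalid null.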





[jira] [Created] (HUDI-7821) Handle schema evolution in proto to avro conversion

2024-05-31 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7821:
---

 Summary: Handle schema evolution in proto to avro conversion
 Key: HUDI-7821
 URL: https://issues.apache.org/jira/browse/HUDI-7821
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


Users can encounter errors when a batch of data was written with an older 
schema and the new schema has fields that are not present in the old data.





[jira] [Assigned] (HUDI-7758) MDT Initialization Parses Non-Hudi files

2024-05-14 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7758:
---

Assignee: Timothy Brown

> MDT Initialization Parses Non-Hudi files
> 
>
> Key: HUDI-7758
> URL: https://issues.apache.org/jira/browse/HUDI-7758
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Right now the MDT initialization will parse files that do not belong to the 
> Hudi table.





[jira] [Created] (HUDI-7758) MDT Initialization Parses Non-Hudi files

2024-05-14 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7758:
---

 Summary: MDT Initialization Parses Non-Hudi files
 Key: HUDI-7758
 URL: https://issues.apache.org/jira/browse/HUDI-7758
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


Right now the MDT initialization will parse files that do not belong to the 
Hudi table.





[jira] [Assigned] (HUDI-7713) Schema Reconciliation should also re-order fields

2024-05-05 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7713:
---

Assignee: Timothy Brown

> Schema Reconciliation should also re-order fields
> -
>
> Key: HUDI-7713
> URL: https://issues.apache.org/jira/browse/HUDI-7713
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> The schema reconciliation currently makes sure the incoming schema is 
> compatible with the target, but it can also be used to guarantee a consistent 
> ordering of fields in the schema between commits. 





[jira] [Created] (HUDI-7713) Schema Reconciliation should also re-order fields

2024-05-05 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7713:
---

 Summary: Schema Reconciliation should also re-order fields
 Key: HUDI-7713
 URL: https://issues.apache.org/jira/browse/HUDI-7713
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


The schema reconciliation currently makes sure the incoming schema is compatible 
with the target, but it can also be used to guarantee a consistent ordering of 
fields in the schema between commits. 
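
One possible ordering rule, sketched on field names only: fields already known to the target keep the target's order, and fields that are new in the incoming schema are appended in their incoming order. This rule and the `reorder_fields` helper are assumptions for illustration; the ticket does not specify the exact policy.

```python
def reorder_fields(incoming: list, target: list) -> list:
    """Order incoming field names like the target schema; new fields go last."""
    incoming_set = set(incoming)
    target_set = set(target)
    ordered = [name for name in target if name in incoming_set]
    ordered += [name for name in incoming if name not in target_set]
    return ordered
```

With such a rule, two batches that carry the same fields in different orders produce the same schema, so consecutive commits do not differ only by field position.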





[jira] [Created] (HUDI-7689) Allow users to leverage HoodieTable and Engine Context in Compaction Strategy

2024-04-29 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7689:
---

 Summary: Allow users to leverage HoodieTable and Engine Context in 
Compaction Strategy
 Key: HUDI-7689
 URL: https://issues.apache.org/jira/browse/HUDI-7689
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown








[jira] [Assigned] (HUDI-7689) Allow users to leverage HoodieTable and Engine Context in Compaction Strategy

2024-04-29 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7689:
---

Assignee: Timothy Brown

> Allow users to leverage HoodieTable and Engine Context in Compaction Strategy
> -
>
> Key: HUDI-7689
> URL: https://issues.apache.org/jira/browse/HUDI-7689
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>






[jira] [Assigned] (HUDI-4732) Leverage Schema Registry for reading proto messages from kafka

2024-04-22 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-4732:
---

Assignee: Timothy Brown

> Leverage Schema Registry for reading proto messages from kafka
> --
>
> Key: HUDI-4732
> URL: https://issues.apache.org/jira/browse/HUDI-4732
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> If you use the Confluent Schema Registry, they provide a way to deserialize 
> the kafka message value without providing the protobuf class name. The first 
> cut of ProtoKafkaSource requires users to specify a classname but we want to 
> allow users the flexibility to use this other method of deserializing the 
> message.
>  
> Docs: 
> https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-protobuf.html





[jira] [Updated] (HUDI-7576) Avoid recomputing partition path in AbstractFileSystemView

2024-04-11 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-7576:

Summary: Avoid recomputing partition path in AbstractFileSystemView  (was: 
Add partitionPath to the HoodieBaseFile and HoodieLogFile objects)

> Avoid recomputing partition path in AbstractFileSystemView
> --
>
> Key: HUDI-7576
> URL: https://issues.apache.org/jira/browse/HUDI-7576
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Adding this field to the classes will allow us to avoid repeatedly computing 
> the partition path per file in other parts of the code. This can cut down on 
> the CPU overhead associated with creating the FS View.





[jira] [Updated] (HUDI-7576) Avoid recomputing partition path in AbstractFileSystemView

2024-04-11 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-7576:

Description: We have observed a non-negligible amount of CPU spent simply 
computing the partition paths of base and log files when building a file system 
view. We should aim to improve the efficiency of these calls and reduce the 
number of them.  (was: Adding this field to the classes will allow us to avoid 
repeatedly computing the partition path per file in other parts of the code. 
This can cut down on the CPU overhead associated with creating the FS View.)

> Avoid recomputing partition path in AbstractFileSystemView
> --
>
> Key: HUDI-7576
> URL: https://issues.apache.org/jira/browse/HUDI-7576
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> We have observed a non-negligible amount of CPU spent simply computing the 
> partition paths of base and log files when building a file system view. We 
> should aim to improve the efficiency of these calls and reduce the number of 
> them.





[jira] [Created] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects

2024-04-07 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7576:
---

 Summary: Add partitionPath to the HoodieBaseFile and HoodieLogFile 
objects
 Key: HUDI-7576
 URL: https://issues.apache.org/jira/browse/HUDI-7576
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Adding this field to the classes will allow us to avoid repeatedly computing 
the partition path per file in other parts of the code. This can cut down on 
the CPU overhead associated with creating the FS View.





[jira] [Created] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code

2024-04-07 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7575:
---

 Summary: Avoid recomputing list of pending replacecommits in 
FSView code
 Key: HUDI-7575
 URL: https://issues.apache.org/jira/browse/HUDI-7575
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


When checking if a base file is part of a pending clustering, the code will 
construct the same list repeatedly leading to unnecessary overhead. The class 
should gather this list once and persist it.
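
A minimal sketch of "gather once and persist", assuming the pending clustering information can be reduced to a set of file IDs. `PendingClusteringIndex` and `load_pending_file_ids` are hypothetical names, not the Hudi implementation.

```python
class PendingClusteringIndex:
    """Loads the pending file IDs on first use and reuses them afterwards."""

    def __init__(self, load_pending_file_ids):
        self._load = load_pending_file_ids  # expensive callable
        self._cached = None

    def is_pending(self, file_id: str) -> bool:
        if self._cached is None:
            # Built once; a frozenset also makes each lookup O(1).
            self._cached = frozenset(self._load())
        return file_id in self._cached
```

A real implementation would also need to decide when the cached set becomes stale, for example when the timeline is reloaded; that is left out here.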





[jira] [Assigned] (HUDI-7576) Add partitionPath to the HoodieBaseFile and HoodieLogFile objects

2024-04-07 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7576:
---

Assignee: Timothy Brown

> Add partitionPath to the HoodieBaseFile and HoodieLogFile objects
> -
>
> Key: HUDI-7576
> URL: https://issues.apache.org/jira/browse/HUDI-7576
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Adding this field to the classes will allow us to avoid repeatedly computing 
> the partition path per file in other parts of the code. This can cut down on 
> the CPU overhead associated with creating the FS View.





[jira] [Assigned] (HUDI-7575) Avoid recomputing list of pending replacecommits in FSView code

2024-04-07 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7575:
---

Assignee: Timothy Brown

> Avoid recomputing list of pending replacecommits in FSView code
> ---
>
> Key: HUDI-7575
> URL: https://issues.apache.org/jira/browse/HUDI-7575
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> When checking if a base file is part of a pending clustering, the code will 
> construct the same list repeatedly leading to unnecessary overhead. The class 
> should gather this list once and persist it.





[jira] [Created] (HUDI-7551) Avoid loading all partitions into memory for cleaner planner

2024-03-26 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7551:
---

 Summary: Avoid loading all partitions into memory for cleaner 
planner
 Key: HUDI-7551
 URL: https://issues.apache.org/jira/browse/HUDI-7551
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


When MDT is enabled, the clean planner can end up loading all partitions into 
memory, which adds more memory pressure on the driver than is required.





[jira] [Created] (HUDI-7464) JsonKafkaSource Metadata Bug

2024-03-01 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7464:
---

 Summary: JsonKafkaSource Metadata Bug
 Key: HUDI-7464
 URL: https://issues.apache.org/jira/browse/HUDI-7464
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


There are 2 potential issues with the Kafka Json Source:
1. A null key can produce an NPE and result in the offset and other metadata 
not being added to the row
2. The schema post processor can attempt to add fields to a source schema that 
may already contain those metadata fields.
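
Both issues can be illustrated with a small sketch. The metadata field names below (`_kafka_offset`, `_kafka_partition`, `_kafka_key`) are placeholders chosen for this example and are not necessarily the names Hudi uses.

```python
def attach_kafka_metadata(row: dict, key, partition: int, offset: int) -> dict:
    """Add Kafka metadata to a row; a null key must not prevent the rest."""
    enriched = dict(row)
    enriched["_kafka_offset"] = offset
    enriched["_kafka_partition"] = partition
    # Guard the conversion instead of calling a method on a possibly-null key.
    enriched["_kafka_key"] = None if key is None else str(key)
    return enriched


def add_metadata_fields(field_names: list, metadata_fields: list) -> list:
    """Append metadata fields only if the source schema lacks them."""
    existing = set(field_names)
    return field_names + [f for f in metadata_fields if f not in existing]
```

The second helper is idempotent: running the post processor on a schema that already contains the metadata fields returns the schema unchanged.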





[jira] [Assigned] (HUDI-7464) JsonKafkaSource Metadata Bug

2024-03-01 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7464:
---

Assignee: Timothy Brown

> JsonKafkaSource Metadata Bug
> 
>
> Key: HUDI-7464
> URL: https://issues.apache.org/jira/browse/HUDI-7464
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> There are 2 potential issues with the Kafka Json Source:
> 1. A null key can produce an NPE and result in the offset and other metadata 
> not being added to the row
> 2. The schema post processor can attempt to add fields to a source schema 
> that may already contain those metadata fields.





[jira] [Created] (HUDI-7404) Bloom Filter Execution Improvements

2024-02-12 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7404:
---

 Summary: Bloom Filter Execution Improvements
 Key: HUDI-7404
 URL: https://issues.apache.org/jira/browse/HUDI-7404
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


1. Avoid executing a countByKey that is only used by a single flow
2. Avoid intermediate collection on driver
3. Early exit when possible to avoid overhead of reader instantiation 





[jira] [Assigned] (HUDI-7323) Transformer schema inference uses stale schema

2024-01-22 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7323:
---

Assignee: Timothy Brown

> Transformer schema inference uses stale schema
> --
>
> Key: HUDI-7323
> URL: https://issues.apache.org/jira/browse/HUDI-7323
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> The `transformedSchema` interface for the Transformer class should use an up 
> to date schema instead of the schema at the time of object creation.





[jira] [Created] (HUDI-7323) Transformer schema inference uses stale schema

2024-01-22 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7323:
---

 Summary: Transformer schema inference uses stale schema
 Key: HUDI-7323
 URL: https://issues.apache.org/jira/browse/HUDI-7323
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


The `transformedSchema` interface for the Transformer class should use an up to 
date schema instead of the schema at the time of object creation.
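
One way to avoid the stale value is to hold on to a callable that returns the current schema rather than a snapshot taken in the constructor. The sketch below models schemas as lists of field names; the `Transformer` class here is a simplified stand-in, not the Hudi interface.

```python
class Transformer:
    """Derives the output schema from whatever the source schema is *now*."""

    def __init__(self, source_schema_supplier, added_fields):
        # Keep the callable, not a snapshot taken at construction time.
        self._source_schema_supplier = source_schema_supplier
        self._added_fields = list(added_fields)

    def transformed_schema(self) -> list:
        return list(self._source_schema_supplier()) + self._added_fields
```

Because the supplier is evaluated on every call, a field added to the source schema after the transformer was created still shows up in the transformed schema.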





[jira] [Created] (HUDI-7238) Ensure ExternalSpillableMaps are properly closed

2023-12-17 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7238:
---

 Summary: Ensure ExternalSpillableMaps are properly closed 
 Key: HUDI-7238
 URL: https://issues.apache.org/jira/browse/HUDI-7238
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


There are a few places where ExternalSpillableMap is used but its close 
method is not called. There are also cases where we create the underlying 
BitMap even when we have no need for it yet.
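
Both points (deterministic close and lazy creation of the disk-backed part) can be sketched with a small stand-in class. `SpillFile` is hypothetical and far simpler than ExternalSpillableMap; it only shows the resource-handling pattern.

```python
import tempfile


class SpillFile:
    """Stand-in for a disk-backed store whose file is opened lazily."""

    def __init__(self):
        self._file = None  # no file handle until something is written

    @property
    def opened(self) -> bool:
        return self._file is not None

    def append(self, payload: bytes) -> int:
        if self._file is None:
            self._file = tempfile.TemporaryFile()
        offset = self._file.seek(0, 2)  # 2 = seek relative to end of file
        self._file.write(payload)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        if self._file is None:
            raise KeyError("nothing has been spilled yet")
        self._file.seek(offset)
        return self._file.read(length)

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Using the object in a `with` block guarantees that `close` runs even when the body raises, which is the Python counterpart of try-with-resources in Java.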





[jira] [Updated] (HUDI-7237) Minor Improvements to Schema Handling in Delta Sync

2023-12-17 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-7237:

Priority: Minor  (was: Major)

> Minor Improvements to Schema Handling in Delta Sync
> ---
>
> Key: HUDI-7237
> URL: https://issues.apache.org/jira/browse/HUDI-7237
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Priority: Minor
>  Labels: pull-request-available
>
> There are two minor items that we have run into running DeltaStreamer in 
> production.
> 1. The schema is fetched more often than necessary, which can put 
> unnecessary load on schema providers or increase file system reads.
> 2. SchemaProviders that return null target schemas on empty batches cause 
> null schema values in commits, leading to unexpected issues later.
>  





[jira] [Created] (HUDI-7237) Minor Improvements to Schema Handling in Delta Sync

2023-12-16 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7237:
---

 Summary: Minor Improvements to Schema Handling in Delta Sync
 Key: HUDI-7237
 URL: https://issues.apache.org/jira/browse/HUDI-7237
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


There are two minor items that we have run into running DeltaStreamer in 
production.
1. The schema is fetched more often than necessary, which can put 
unnecessary load on schema providers or increase file system reads.

2. SchemaProviders that return null target schemas on empty batches cause null 
schema values in commits, leading to unexpected issues later.
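
The two items could be addressed along these lines. This is a sketch under the assumption that a schema may be cached for the duration of one sync round and that the last committed schema is an acceptable fallback; `CachedSchemaProvider` and `schema_for_commit` are hypothetical names.

```python
_UNSET = object()  # distinguishes "not fetched yet" from a fetched None


class CachedSchemaProvider:
    """Fetches the schema at most once until it is explicitly invalidated."""

    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema
        self._cached = _UNSET

    def get(self):
        if self._cached is _UNSET:
            self._cached = self._fetch_schema()
        return self._cached

    def invalidate(self) -> None:
        self._cached = _UNSET


def schema_for_commit(target_schema, last_committed_schema):
    """Never write a null schema: fall back to the last committed one."""
    return last_committed_schema if target_schema is None else target_schema
```

Calling `invalidate` at the start of each round keeps the cache from hiding genuine schema changes between rounds.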

 





[jira] [Created] (HUDI-7223) Hudi Cleaner removing files still required for view N hours old

2023-12-11 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7223:
---

 Summary: Hudi Cleaner removing files still required for view N 
hours old
 Key: HUDI-7223
 URL: https://issues.apache.org/jira/browse/HUDI-7223
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


If a user is using a time-based cleaner policy, they will expect that they can 
query the table state as of N hours ago. This means the cleaner should not 
remove every file older than N hours, only files that were already irrelevant 
to the table state as of N hours ago. 
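
The distinction can be made concrete for a single file group. In this sketch each version is a `(commit_time, file_name)` pair and `cutoff` is "now minus N hours" on the same time scale; `files_to_clean` is a hypothetical helper, not the Hudi clean planner.

```python
def files_to_clean(versions: list, cutoff) -> list:
    """Return the file names of one file group that are safe to delete."""
    ordered = sorted(versions)
    at_or_before_cutoff = [v for v in ordered if v[0] <= cutoff]
    # The newest version at or before the cutoff is what a query "as of
    # cutoff" reads, so it must be retained even though it is older than
    # the cutoff. Everything before it is no longer needed.
    return [name for _, name in at_or_before_cutoff[:-1]]
```

For versions written at times 1, 5 and 12 with a cutoff of 10, only the version from time 1 is removable: the version from time 5 is older than the cutoff but is still the one visible to a query as of time 10.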





[jira] [Assigned] (HUDI-7160) Avro Schema Properties are dropped when adding Hoodie Metadata columns

2023-11-29 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7160:
---

Assignee: Timothy Brown

> Avro Schema Properties are dropped when adding Hoodie Metadata columns
> --
>
> Key: HUDI-7160
> URL: https://issues.apache.org/jira/browse/HUDI-7160
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> When we add the metadata columns to an existing avro schema, the properties 
> set on that schema are dropped. We should allow these properties to be 
> carried through.





[jira] [Created] (HUDI-7160) Avro Schema Properties are dropped when adding Hoodie Metadata columns

2023-11-29 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7160:
---

 Summary: Avro Schema Properties are dropped when adding Hoodie 
Metadata columns
 Key: HUDI-7160
 URL: https://issues.apache.org/jira/browse/HUDI-7160
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


When we add the metadata columns to an existing avro schema, the properties set 
on that schema are dropped. We should allow these properties to be carried 
through.
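
A sketch of carrying the properties through, with the Avro schema modelled as a plain dictionary. `add_metadata_columns` is a hypothetical helper; the point is only that the new schema is derived from all attributes of the old one rather than rebuilt from its name and fields.

```python
def add_metadata_columns(schema: dict, metadata_fields: list) -> dict:
    """Prepend metadata fields while keeping every other schema attribute."""
    # Copy all top-level attributes, including custom properties.
    result = {k: v for k, v in schema.items() if k != "fields"}
    result["fields"] = list(metadata_fields) + list(schema["fields"])
    return result
```

Any custom property set on the input, such as an owner tag, is then still present on the schema that includes the metadata columns.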





[jira] [Updated] (HUDI-7160) Avro Schema Properties are dropped when adding Hoodie Metadata columns

2023-11-29 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-7160:

Priority: Minor  (was: Major)

> Avro Schema Properties are dropped when adding Hoodie Metadata columns
> --
>
> Key: HUDI-7160
> URL: https://issues.apache.org/jira/browse/HUDI-7160
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> When we add the metadata columns to an existing avro schema, the properties 
> set on that schema are dropped. We should allow these properties to be 
> carried through.





[jira] [Created] (HUDI-7115) Add more options for BigQuery Sync

2023-11-16 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7115:
---

 Summary: Add more options for BigQuery Sync
 Key: HUDI-7115
 URL: https://issues.apache.org/jira/browse/HUDI-7115
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


There are options for requiring a partition filter and for adding a BigLake 
connection ID, which enable some new access control features that users may 
want to leverage in their environment.





[jira] [Assigned] (HUDI-7112) Allow reuse of timeline server across tables

2023-11-16 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-7112:
---

Assignee: Timothy Brown

> Allow reuse of timeline server across tables
> 
>
> Key: HUDI-7112
> URL: https://issues.apache.org/jira/browse/HUDI-7112
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> When a user is running multiple writers in the same JVM, there will currently 
> be a Javalin server created per table. This leads to unnecessary overhead 
> since the timeline server can support multiple basepaths.





[jira] [Created] (HUDI-7112) Allow reuse of timeline server across tables

2023-11-16 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7112:
---

 Summary: Allow reuse of timeline server across tables
 Key: HUDI-7112
 URL: https://issues.apache.org/jira/browse/HUDI-7112
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


When a user is running multiple writers in the same JVM, there will currently 
be a Javalin server created per table. This leads to unnecessary overhead since 
the timeline server can support multiple basepaths.
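
Sharing could look roughly like the sketch below: one server per process, started on first use and stopped when the last table releases it. `SharedServer`, `start` and `stop` are hypothetical and only illustrate the reference-counting idea, not the Hudi timeline service.

```python
import threading


class SharedServer:
    """Hands out one server instance to every table in the process."""

    def __init__(self, start, stop):
        self._start = start  # callable that boots the server
        self._stop = stop    # callable that shuts it down
        self._server = None
        self._base_paths = set()
        self._lock = threading.Lock()

    def acquire(self, base_path: str):
        with self._lock:
            if self._server is None:
                self._server = self._start()
            self._base_paths.add(base_path)
            return self._server

    def release(self, base_path: str) -> None:
        with self._lock:
            self._base_paths.discard(base_path)
            if not self._base_paths and self._server is not None:
                self._stop(self._server)
                self._server = None
```

The lock matters because the writers sharing a JVM typically run on different threads and may start or finish at the same time.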





[jira] [Created] (HUDI-6916) Fix excessive object creation in custom key generator

2023-10-04 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6916:
---

 Summary: Fix excessive object creation in custom key generator
 Key: HUDI-6916
 URL: https://issues.apache.org/jira/browse/HUDI-6916
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


The custom key generators are creating key generator objects per record/row 
instead of creating them once up front.
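
The fix amounts to moving construction out of the per-record path. In this sketch `make_generator` stands for whatever builds a delegate key generator from a field spec; both it and this simplified `CustomKeyGenerator` are assumptions for illustration.

```python
class CustomKeyGenerator:
    """Builds its delegate generators once and reuses them for every record."""

    def __init__(self, field_specs, make_generator):
        # One delegate per configured field, created up front.
        self._generators = [make_generator(spec) for spec in field_specs]

    def partition_path(self, record: dict) -> str:
        # The per-record path only calls the prebuilt delegates.
        return "/".join(gen(record) for gen in self._generators)
```

With this shape the number of delegate objects depends on the configuration, not on the number of records processed.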





[jira] [Assigned] (HUDI-6916) Fix excessive object creation in custom key generator

2023-10-04 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6916:
---

Assignee: Timothy Brown

> Fix excessive object creation in custom key generator
> -
>
> Key: HUDI-6916
> URL: https://issues.apache.org/jira/browse/HUDI-6916
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> The custom key generators are creating key generator objects per record/row 
> instead of creating them once up front.





[jira] [Assigned] (HUDI-6898) Improve test stability by closing metadata writers, update logging

2023-09-27 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6898:
---

Assignee: Timothy Brown

> Improve test stability by closing metadata writers, update logging
> --
>
> Key: HUDI-6898
> URL: https://issues.apache.org/jira/browse/HUDI-6898
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Trivial
>
> Improve the test stability and performance by closing all metadata writers 
> created in the tests.
> Also update logging to reduce the number of logs making it easier to find the 
> failures in the test output.





[jira] [Updated] (HUDI-6898) Improve test stability by closing metadata writers, update logging

2023-09-27 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-6898:

Priority: Trivial  (was: Major)

> Improve test stability by closing metadata writers, update logging
> --
>
> Key: HUDI-6898
> URL: https://issues.apache.org/jira/browse/HUDI-6898
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Priority: Trivial
>
> Improve the test stability and performance by closing all metadata writers 
> created in the tests.
> Also update logging to reduce the number of logs making it easier to find the 
> failures in the test output.





[jira] [Created] (HUDI-6898) Improve test stability by closing metadata writers, update logging

2023-09-27 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6898:
---

 Summary: Improve test stability by closing metadata writers, 
update logging
 Key: HUDI-6898
 URL: https://issues.apache.org/jira/browse/HUDI-6898
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Improve the test stability and performance by closing all metadata writers 
created in the tests.

Also update logging to reduce the number of logs making it easier to find the 
failures in the test output.





[jira] [Updated] (HUDI-6871) BigQuery Sync Improvements

2023-09-18 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-6871:

Priority: Minor  (was: Major)

> BigQuery Sync Improvements
> --
>
> Key: HUDI-6871
> URL: https://issues.apache.org/jira/browse/HUDI-6871
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> 1. The manifest file writer is slow due to the overhead incurred per iteration
> 2. Schemas with reserved keywords are failing in the create table statement





[jira] [Assigned] (HUDI-6871) BigQuery Sync Improvements

2023-09-18 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6871:
---

Assignee: Timothy Brown

> BigQuery Sync Improvements
> --
>
> Key: HUDI-6871
> URL: https://issues.apache.org/jira/browse/HUDI-6871
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> 1. The manifest file writer is slow due to the overhead incurred per iteration
> 2. Schemas with reserved keywords are failing in the create table statement





[jira] [Created] (HUDI-6871) BigQuery Sync Improvements

2023-09-18 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6871:
---

 Summary: BigQuery Sync Improvements
 Key: HUDI-6871
 URL: https://issues.apache.org/jira/browse/HUDI-6871
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


1. The manifest file writer is slow due to the overhead incurred per iteration
2. Schemas with reserved keywords are failing in the create table statement
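
For the second item, a common remedy is to quote every identifier with backticks, which is how BigQuery SQL allows reserved words to be used as names. The sketch below builds a simplified CREATE TABLE statement; the helper names are hypothetical and the statement is not the one the sync tool actually issues.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks so reserved words are accepted."""
    if "`" in name:
        raise ValueError("backticks are not supported in identifiers: " + name)
    return "`" + name + "`"


def create_table_sql(table: str, columns: list) -> str:
    """Build a simplified CREATE TABLE statement with quoted names."""
    cols = ", ".join(quote_identifier(n) + " " + t for n, t in columns)
    return "CREATE TABLE " + quote_identifier(table) + " (" + cols + ")"
```

Quoting unconditionally avoids having to maintain a list of reserved keywords in the tool.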





[jira] [Assigned] (HUDI-6857) Update Docs For BigQuerySyncTool

2023-09-13 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6857:
---

Assignee: Timothy Brown

> Update Docs For BigQuerySyncTool
> 
>
> Key: HUDI-6857
> URL: https://issues.apache.org/jira/browse/HUDI-6857
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Trivial
>
> Update the docs to include references to the new manifest based approach





[jira] [Created] (HUDI-6857) Update Docs For BigQuerySyncTool

2023-09-13 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6857:
---

 Summary: Update Docs For BigQuerySyncTool
 Key: HUDI-6857
 URL: https://issues.apache.org/jira/browse/HUDI-6857
 Project: Apache Hudi
  Issue Type: Task
Reporter: Timothy Brown


Update the docs to include references to the new manifest based approach





[jira] [Assigned] (HUDI-6839) Github Actions Workflow Improvements

2023-09-08 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6839:
---

Assignee: Timothy Brown

> Github Actions Workflow Improvements
> 
>
> Key: HUDI-6839
> URL: https://issues.apache.org/jira/browse/HUDI-6839
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> # Leverage maven cache option for build speed
>  # Use parallel build when packaging jars for tests
>  # Cancel inflight tests when updates to branches are pushed to save on costs





[jira] [Created] (HUDI-6839) Github Actions Workflow Improvements

2023-09-08 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6839:
---

 Summary: Github Actions Workflow Improvements
 Key: HUDI-6839
 URL: https://issues.apache.org/jira/browse/HUDI-6839
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


# Leverage maven cache option for build speed
 # Use parallel build when packaging jars for tests
 # Cancel inflight tests when updates to branches are pushed to save on costs





[jira] [Created] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer

2023-09-08 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6836:
---

 Summary: Shutdown metrics for metadata table writer in 
deltastreamer
 Key: HUDI-6836
 URL: https://issues.apache.org/jira/browse/HUDI-6836
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


When debugging some Deltastreamer tests, I noticed that there is still a 
running metrics instance for the metadata table path.





[jira] [Assigned] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer

2023-09-08 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6836:
---

Assignee: Timothy Brown

> Shutdown metrics for metadata table writer in deltastreamer
> ---
>
> Key: HUDI-6836
> URL: https://issues.apache.org/jira/browse/HUDI-6836
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> When debugging some Deltastreamer tests, I noticed that there is still a 
> running metrics instance for the metadata table path.





[jira] [Created] (HUDI-6807) MoR Incremental count queries trigger full scan of files in table

2023-08-30 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6807:
---

 Summary: MoR Incremental count queries trigger full scan of files 
in table
 Key: HUDI-6807
 URL: https://issues.apache.org/jira/browse/HUDI-6807
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


While running the `TestMORDataSource` datasource tests I saw that we eventually 
call `HoodiePruneFileSourcePartitions` which will list all of the files in the 
table instead of the files that are relevant to the incremental query. Ideally 
this would be limited to the files that were impacted by commits within the 
range specified.





[jira] [Assigned] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-28 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6763:
---

Assignee: Timothy Brown

> WriteStats are extracted twice in BaseSparkCommitActionExecutor
> ---
>
> Key: HUDI-6763
> URL: https://issues.apache.org/jira/browse/HUDI-6763
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> In BaseSparkCommitActionExecutor there are two places the same 
> `collectAsList` is called on an RDD. We can optimize this by only calling 
> this method once.





[jira] [Created] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-28 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6763:
---

 Summary: WriteStats are extracted twice in 
BaseSparkCommitActionExecutor
 Key: HUDI-6763
 URL: https://issues.apache.org/jira/browse/HUDI-6763
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


In BaseSparkCommitActionExecutor there are two places the same `collectAsList` 
is called on an RDD. We can optimize this by only calling this method once.
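
A minimal sketch of the idea, without any Spark dependency: the `collectAsList` method below is a hypothetical stand-in for the expensive RDD action, and the two "consumers" are placeholders for the places that currently trigger it. None of these names come from BaseSparkCommitActionExecutor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: materialize an expensive result once and reuse it.
class CollectOnceSketch {

  // Counts how often the expensive action runs, for demonstration only.
  static final AtomicInteger collectCalls = new AtomicInteger();

  // Stand-in for collecting write stats from an RDD to the driver.
  static List<String> collectAsList() {
    collectCalls.incrementAndGet();
    return Arrays.asList("stat-1", "stat-2");
  }

  // Both consumers read the same local list instead of collecting again.
  static int commit() {
    List<String> writeStats = collectAsList(); // single materialization
    int usedForMetadata = writeStats.size();   // first consumer
    int usedForCommit = writeStats.size();     // second consumer reuses the list
    return usedForMetadata + usedForCommit;
  }
}
```

After one call to `commit()`, `collectCalls` is 1 rather than 2, which is the saving the issue is after.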





[jira] [Assigned] (HUDI-6741) Timeline server cannot handle multiple base paths when metadata table is enabled

2023-08-23 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6741:
---

Assignee: Timothy Brown

> Timeline server cannot handle multiple base paths when metadata table is 
> enabled
> 
>
> Key: HUDI-6741
> URL: https://issues.apache.org/jira/browse/HUDI-6741
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> The Timeline Server takes in a view manager to gather information about the 
> tables. When the metadata table is enabled, a supplier is called to get the 
> HoodieTableMetadata. That supplier is configured for a single base path, but 
> the timeline server can be used for multiple tables.





[jira] [Created] (HUDI-6741) Timeline server cannot handle multiple base paths when metadata table is enabled

2023-08-23 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6741:
---

 Summary: Timeline server cannot handle multiple base paths when 
metadata table is enabled
 Key: HUDI-6741
 URL: https://issues.apache.org/jira/browse/HUDI-6741
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


The Timeline Server takes in a view manager to gather information about the 
tables. When the metadata table is enabled, a supplier is called to get the 
HoodieTableMetadata. That supplier is configured for a single base path, but the 
timeline server can be used for multiple tables.
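
One way to picture a fix is to replace the single supplier with a function keyed by base path. The sketch below is an assumption about the shape of such a change, not the actual patch: `TableMetadata` is a hypothetical stand-in for HoodieTableMetadata, and the class and method names are invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration: one metadata instance per table base path.
class PerTableMetadataSketch {

  // Stand-in for the real metadata object; it only remembers its base path.
  static final class TableMetadata {
    final String basePath;

    TableMetadata(String basePath) {
      this.basePath = basePath;
    }
  }

  private final Map<String, TableMetadata> byBasePath = new ConcurrentHashMap<>();
  private final Function<String, TableMetadata> factory;

  // The factory receives the base path, unlike a no-argument supplier.
  PerTableMetadataSketch(Function<String, TableMetadata> factory) {
    this.factory = factory;
  }

  // Creates the metadata for a base path on first use and caches it afterwards.
  TableMetadata forBasePath(String basePath) {
    return byBasePath.computeIfAbsent(basePath, factory);
  }
}
```

With this shape, requests for `/tables/a` and `/tables/b` each get metadata bound to their own path, and repeated requests for the same path reuse the cached instance.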





[jira] [Assigned] (HUDI-6731) Allow MoR Read-Optimized BigQuery Sync

2023-08-21 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6731:
---

Assignee: Timothy Brown

> Allow MoR Read-Optimized BigQuery Sync
> --
>
> Key: HUDI-6731
> URL: https://issues.apache.org/jira/browse/HUDI-6731
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>  Labels: pull-request-available
>
> Allow users to query their Hudi MoR tables with BigQuery in a read-optimized 
> manner by syncing the base files to BigQuery like we do for CoW tables today.





[jira] [Created] (HUDI-6731) Allow MoR Read-Optimized BigQuery Sync

2023-08-20 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6731:
---

 Summary: Allow MoR Read-Optimized BigQuery Sync
 Key: HUDI-6731
 URL: https://issues.apache.org/jira/browse/HUDI-6731
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Allow users to query their Hudi MoR tables with BigQuery in a read-optimized 
manner by syncing the base files to BigQuery like we do for CoW tables today.





[jira] [Assigned] (HUDI-6728) Add Schema Evolution Support to BigQuery Sync

2023-08-18 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6728:
---

Assignee: Timothy Brown

> Add Schema Evolution Support to BigQuery Sync
> -
>
> Key: HUDI-6728
> URL: https://issues.apache.org/jira/browse/HUDI-6728
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Right now the BigQuery sync is using schema auto detection which will rely on 
> a single file for the schema. This can cause issues when users evolve their 
> schema since the file may not have the latest schema.





[jira] [Created] (HUDI-6728) Add Schema Evolution Support to BigQuery Sync

2023-08-18 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6728:
---

 Summary: Add Schema Evolution Support to BigQuery Sync
 Key: HUDI-6728
 URL: https://issues.apache.org/jira/browse/HUDI-6728
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Right now the BigQuery sync is using schema auto detection which will rely on a 
single file for the schema. This can cause issues when users evolve their 
schema since the file may not have the latest schema.





[jira] [Commented] (HUDI-6672) BigQuery Sync updates while queries running cause failures

2023-08-18 Thread Timothy Brown (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756221#comment-17756221
 ] 

Timothy Brown commented on HUDI-6672:
-

Closing since there is a new manifest file based approach that does not have 
this issue.

> BigQuery Sync updates while queries running cause failures
> --
>
> Key: HUDI-6672
> URL: https://issues.apache.org/jira/browse/HUDI-6672
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Issue was reported by the user here: 
> [https://github.com/apache/hudi/issues/9355]
>  
> It looks like we are updating the underlying manifest file while a query is 
> executing, which causes failures.





[jira] [Closed] (HUDI-6672) BigQuery Sync updates while queries running cause failures

2023-08-18 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown closed HUDI-6672.
---
Resolution: Won't Fix

> BigQuery Sync updates while queries running cause failures
> --
>
> Key: HUDI-6672
> URL: https://issues.apache.org/jira/browse/HUDI-6672
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Issue was reported by the user here: 
> [https://github.com/apache/hudi/issues/9355]
>  
> It looks like we are updating the underlying manifest file while a query is 
> executing, which causes failures.





[jira] [Created] (HUDI-6672) BigQuery Sync updates while queries running cause failures

2023-08-09 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6672:
---

 Summary: BigQuery Sync updates while queries running cause failures
 Key: HUDI-6672
 URL: https://issues.apache.org/jira/browse/HUDI-6672
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Issue was reported by the user here: 
[https://github.com/apache/hudi/issues/9355]

 

It looks like we are updating the underlying manifest file while a query is 
executing, which causes failures.





[jira] [Assigned] (HUDI-6672) BigQuery Sync updates while queries running cause failures

2023-08-09 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6672:
---

Assignee: Timothy Brown

> BigQuery Sync updates while queries running cause failures
> --
>
> Key: HUDI-6672
> URL: https://issues.apache.org/jira/browse/HUDI-6672
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Issue was reported by the user here: 
> [https://github.com/apache/hudi/issues/9355]
>  
> It looks like we are updating the underlying manifest file while a query is 
> executing, which causes failures.





[jira] [Created] (HUDI-6664) Fix Java Bulk Insert partitioner for all metadata table partitions

2023-08-07 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6664:
---

 Summary: Fix Java Bulk Insert partitioner for all metadata table 
partitions
 Key: HUDI-6664
 URL: https://issues.apache.org/jira/browse/HUDI-6664
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


The Java bulk insert partitioner was updated to handle the metadata table, but 
this should be done in a cleaner way, and we should validate that it works when 
bootstrapping all of the metadata table partitions.





[jira] [Assigned] (HUDI-6648) Allow creation of table with existing files when metadata table is enabled

2023-08-04 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6648:
---

Assignee: Timothy Brown

> Allow creation of table with existing files when metadata table is enabled
> --
>
> Key: HUDI-6648
> URL: https://issues.apache.org/jira/browse/HUDI-6648
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> With the metadata table we can store the information about the table and 
> shift away from relying directly on file names for information like commit 
> and fileID. Adding support for creating tables with existing files will allow 
> us to initialize tables from existing datasets.





[jira] [Created] (HUDI-6648) Allow creation of table with existing files when metadata table is enabled

2023-08-04 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6648:
---

 Summary: Allow creation of table with existing files when metadata 
table is enabled
 Key: HUDI-6648
 URL: https://issues.apache.org/jira/browse/HUDI-6648
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


With the metadata table we can store the information about the table and shift 
away from relying directly on file names for information like commit and 
fileID. Adding support for creating tables with existing files will allow us to 
initialize tables from existing datasets.





[jira] [Created] (HUDI-6647) Expand Hudi Java Client Functionality

2023-08-04 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6647:
---

 Summary: Expand Hudi Java Client Functionality
 Key: HUDI-6647
 URL: https://issues.apache.org/jira/browse/HUDI-6647
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


With recent improvements to the abstractions in the Hudi codebase, we can expand 
the functionality in the Java client with less effort by moving common code 
into the base client and table services.





[jira] [Assigned] (HUDI-6647) Expand Hudi Java Client Functionality

2023-08-04 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6647:
---

Assignee: Timothy Brown

> Expand Hudi Java Client Functionality
> -
>
> Key: HUDI-6647
> URL: https://issues.apache.org/jira/browse/HUDI-6647
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> With recent improvements to the abstractions in the Hudi codebase, we can 
> expand the functionality in the Java client with less effort by moving 
> common code into the base client and table services.





[jira] [Created] (HUDI-6628) Rely on HoodieBaseFile and HoodieLogFile methods over FsUtils

2023-08-01 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6628:
---

 Summary: Rely on HoodieBaseFile and HoodieLogFile methods over 
FsUtils
 Key: HUDI-6628
 URL: https://issues.apache.org/jira/browse/HUDI-6628
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Update the code to rely on the methods exposed by the HoodieBaseFile and the 
HoodieLogFile instead of using FsUtils when possible to start removing our 
reliance on directly referencing file paths for information like commit time 
and file ID throughout the codebase.





[jira] [Assigned] (HUDI-6628) Rely on HoodieBaseFile and HoodieLogFile methods over FsUtils

2023-08-01 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6628:
---

Assignee: Timothy Brown

> Rely on HoodieBaseFile and HoodieLogFile methods over FsUtils
> -
>
> Key: HUDI-6628
> URL: https://issues.apache.org/jira/browse/HUDI-6628
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> Update the code to rely on the methods exposed by the HoodieBaseFile and the 
> HoodieLogFile instead of using FsUtils when possible to start removing our 
> reliance on directly referencing file paths for information like commit time 
> and file ID throughout the codebase.





[jira] [Assigned] (HUDI-6618) Add Java implementation of HoodieBackedTableMetadataWriter

2023-07-31 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6618:
---

Assignee: Timothy Brown

> Add Java implementation of HoodieBackedTableMetadataWriter
> --
>
> Key: HUDI-6618
> URL: https://issues.apache.org/jira/browse/HUDI-6618
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Add an implementation of HoodieBackedTableMetadataWriter to be used within 
> the Java write client.





[jira] [Created] (HUDI-6618) Add Java implementation of HoodieBackedTableMetadataWriter

2023-07-31 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6618:
---

 Summary: Add Java implementation of HoodieBackedTableMetadataWriter
 Key: HUDI-6618
 URL: https://issues.apache.org/jira/browse/HUDI-6618
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Add an implementation of HoodieBackedTableMetadataWriter to be used within the 
Java write client.





[jira] [Updated] (HUDI-6590) Improve BigQuery Sync Schema and Partition Handling

2023-07-25 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-6590:

Summary: Improve BigQuery Sync Schema and Partition Handling  (was: Improve 
BigQuery Sync Support)

> Improve BigQuery Sync Schema and Partition Handling
> ---
>
> Key: HUDI-6590
> URL: https://issues.apache.org/jira/browse/HUDI-6590
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> Add features for Schema evolution and listing only required base files while 
> querying the table to cut down on BigQuery usage costs.





[jira] [Assigned] (HUDI-6590) Improve BigQuery Sync Support

2023-07-25 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6590:
---

Assignee: Timothy Brown

> Improve BigQuery Sync Support
> -
>
> Key: HUDI-6590
> URL: https://issues.apache.org/jira/browse/HUDI-6590
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> Add features for Schema evolution and listing only required base files while 
> querying the table to cut down on BigQuery usage costs.





[jira] [Created] (HUDI-6590) Improve BigQuery Sync Support

2023-07-25 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6590:
---

 Summary: Improve BigQuery Sync Support
 Key: HUDI-6590
 URL: https://issues.apache.org/jira/browse/HUDI-6590
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Add features for Schema evolution and listing only required base files while 
querying the table to cut down on BigQuery usage costs.





[jira] [Created] (HUDI-6168) Add source partition columns to rows in S3/GCS Sources

2023-05-03 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6168:
---

 Summary: Add source partition columns to rows in S3/GCS Sources
 Key: HUDI-6168
 URL: https://issues.apache.org/jira/browse/HUDI-6168
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Timothy Brown


If the files read from an S3 or GCS source are themselves laid out with 
Hive-style partitioning, we should be able to parse the partition values out as 
columns in the dataset that is then fed into the delta streamer.
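
As a rough sketch of the parsing step, the helper below pulls `key=value` directory segments out of an object path. It is hypothetical and not the source implementation: it assumes the path points at a file (so the last segment is skipped) and it does not URL-decode values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: extracts Hive-style partition columns from a file path.
class HivePartitionPathSketch {

  // Returns partition columns in path order, e.g. year then month.
  // The final segment is treated as the file name and is never parsed.
  static Map<String, String> partitionColumns(String filePath) {
    Map<String, String> columns = new LinkedHashMap<>();
    String[] segments = filePath.split("/");
    for (int i = 0; i < segments.length - 1; i++) {
      String segment = segments[i];
      int eq = segment.indexOf('=');
      // Require a non-empty key and a non-empty value around the first '='.
      if (eq > 0 && eq < segment.length() - 1) {
        columns.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return columns;
  }
}
```

For a path such as `s3://bucket/data/year=2023/month=05/part-0.parquet`, this yields `year` -> `2023` and `month` -> `05`, which could then be attached to each row read from that file.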





[jira] [Assigned] (HUDI-6168) Add source partition columns to rows in S3/GCS Sources

2023-05-03 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6168:
---

Assignee: Timothy Brown

> Add source partition columns to rows in S3/GCS Sources
> --
>
> Key: HUDI-6168
> URL: https://issues.apache.org/jira/browse/HUDI-6168
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> If the files read from an S3 or GCS source are themselves laid out with 
> Hive-style partitioning, we should be able to parse the partition values out 
> as columns in the dataset that is then fed into the delta streamer.





[jira] [Assigned] (HUDI-5532) Add a KeyGenerator to support a Keyless workflow

2023-01-11 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-5532:
---

Assignee: Timothy Brown

> Add a KeyGenerator to support a Keyless workflow
> 
>
> Key: HUDI-5532
> URL: https://issues.apache.org/jira/browse/HUDI-5532
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> If a Hudi user wants to do append-only inserts, we should provide the ability 
> to auto-configure keys for them so they don't need to set fields for the 
> record key.





[jira] [Created] (HUDI-5532) Add a KeyGenerator to support a Keyless workflow

2023-01-11 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-5532:
---

 Summary: Add a KeyGenerator to support a Keyless workflow
 Key: HUDI-5532
 URL: https://issues.apache.org/jira/browse/HUDI-5532
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Timothy Brown


If a Hudi user wants to do append-only inserts, we should provide the ability 
to auto-configure keys for them so they don't need to set fields for the 
record key.





[jira] [Created] (HUDI-5370) Properly close file handles for Metadata writer

2022-12-11 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-5370:
---

 Summary: Properly close file handles for Metadata writer
 Key: HUDI-5370
 URL: https://issues.apache.org/jira/browse/HUDI-5370
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown








[jira] [Assigned] (HUDI-5370) Properly close file handles for Metadata writer

2022-12-11 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-5370:
---

Assignee: sivabalan narayanan

> Properly close file handles for Metadata writer
> ---
>
> Key: HUDI-5370
> URL: https://issues.apache.org/jira/browse/HUDI-5370
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: sivabalan narayanan
>Priority: Major
>






[jira] [Updated] (HUDI-4904) Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider

2022-11-17 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-4904:

Status: In Progress  (was: Open)

> Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider
> ---
>
> Key: HUDI-4904
> URL: https://issues.apache.org/jira/browse/HUDI-4904
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> In proto we can have a schema that is recursive. We should limit the 
> "unraveling" of a schema to N levels and let the user specify that number of 
> levels as a config. After hitting depth N in the recursion, we will create a 
> Record with a byte array and a string. The remaining data for that branch of 
> the recursion will be written out as a proto byte array, and we record the 
> descriptor string to give context for what is in the byte array.
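
The depth cut-off can be sketched on a toy schema printer. Everything here is hypothetical: the schema is rendered as a plain string rather than an Avro schema, the message type has a single self-referencing `child` field, and the fallback field names `proto_bytes` and `descriptor_name` are placeholders. How the real provider counts levels may differ.

```java
// Hypothetical illustration of unraveling a recursive type to a fixed depth.
class RecursionLimitSketch {

  // Placeholder for the "byte array plus descriptor string" record.
  static final String FALLBACK = "record{proto_bytes:bytes, descriptor_name:string}";

  // Entry point: expands the recursive type at most maxDepth times.
  static String schemaFor(String messageType, int maxDepth) {
    return schemaFor(messageType, 0, maxDepth);
  }

  // Expands one level per call and substitutes the fallback once the limit is hit.
  private static String schemaFor(String messageType, int depth, int maxDepth) {
    if (depth >= maxDepth) {
      return FALLBACK;
    }
    return "record " + messageType + "{child:" + schemaFor(messageType, depth + 1, maxDepth) + "}";
  }
}
```

With a limit of 2, the type is expanded twice and the third level becomes the fallback record, so schema generation terminates regardless of how deep the data nests.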





[jira] [Resolved] (HUDI-4904) Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider

2022-11-17 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown resolved HUDI-4904.
-

> Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider
> ---
>
> Key: HUDI-4904
> URL: https://issues.apache.org/jira/browse/HUDI-4904
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> In proto we can have a schema that is recursive. We should limit the 
> "unraveling" of a schema to N levels and let the user specify that number of 
> levels as a config. After hitting depth N in the recursion, we will create a 
> Record with a byte array and a string. The remaining data for that branch of 
> the recursion will be written out as a proto byte array, and we record the 
> descriptor string to give context for what is in the byte array.





[jira] [Resolved] (HUDI-4905) Protobuf type handling improvements

2022-11-17 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown resolved HUDI-4905.
-

> Protobuf type handling improvements
> ---
>
> Key: HUDI-4905
> URL: https://issues.apache.org/jira/browse/HUDI-4905
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Three improvements have come out of discussions with others trying to use 
> protobuf and Hudi.
>  
>  # We can support uint64 as a decimal without losing precision and 
> representing the value in the lake as a positive value
>  # Proto Timestamps can be converted to long with LogicalType timestamp-micros
>  # Treat elements within a `oneof` as nullable
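
The first two conversions can be illustrated with plain JDK calls. This is a sketch under assumptions, not the code in the schema provider: the class and method names are hypothetical, the uint64 value is assumed to arrive as the signed `long` that protobuf's Java API uses for it, and sub-microsecond nanos are truncated.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helpers for the uint64 and Timestamp mappings.
class ProtoTypeMappingSketch {

  // Reinterprets the 64 bits as unsigned, so values above Long.MAX_VALUE
  // stay positive instead of wrapping to a negative long.
  static BigDecimal uint64ToDecimal(long rawBits) {
    return new BigDecimal(new BigInteger(Long.toUnsignedString(rawBits)));
  }

  // Converts a Timestamp's seconds and nanos into microseconds since the epoch.
  // Throws ArithmeticException on overflow; nanos below one microsecond are dropped.
  static long timestampToMicros(long seconds, int nanos) {
    return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), nanos / 1_000);
  }
}
```

For example, the bit pattern of `-1L` maps to the decimal 18446744073709551615, and one second plus 500000 nanos maps to 1000500 microseconds.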





[jira] [Assigned] (HUDI-5198) add in minor perf wins in hudi-utilities and locking related tests

2022-11-11 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-5198:
---

Assignee: Timothy Brown

> add in minor perf wins in hudi-utilities and locking related tests
> --
>
> Key: HUDI-5198
> URL: https://issues.apache.org/jira/browse/HUDI-5198
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>






[jira] [Created] (HUDI-5198) add in minor perf wins in hudi-utilities and locking related tests

2022-11-11 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-5198:
---

 Summary: add in minor perf wins in hudi-utilities and locking 
related tests
 Key: HUDI-5198
 URL: https://issues.apache.org/jira/browse/HUDI-5198
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown








[jira] [Created] (HUDI-4926) Add documentation to the Hudi Site

2022-09-26 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-4926:
---

 Summary: Add documentation to the Hudi Site
 Key: HUDI-4926
 URL: https://issues.apache.org/jira/browse/HUDI-4926
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown








[jira] [Updated] (HUDI-4905) Protobuf type handling improvements

2022-09-23 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-4905:

Description: 
Three improvements have come out of discussions with others trying to use 
protobuf and Hudi.

 
 # We can support uint64 as a decimal without losing precision and representing 
the value in the lake as a positive value
 # Proto Timestamps can be converted to long with LogicalType timestamp-micros
 # Treat elements within a `oneof` as nullable

  was:
Two improvements have come out of discussions with others trying to use 
protobuf and Hudi.

 
 # We can support uint64 as a decimal without losing precision and representing 
the value in the lake as a positive value
 # Proto Timestamps can be converted to long with LogicalType timestamp-micros


> Protobuf type handling improvements
> ---
>
> Key: HUDI-4905
> URL: https://issues.apache.org/jira/browse/HUDI-4905
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Three improvements have come out of discussions with others trying to use 
> protobuf and Hudi.
>  
>  # We can support uint64 as a decimal without losing precision and 
> representing the value in the lake as a positive value
>  # Proto Timestamps can be converted to long with LogicalType timestamp-micros
>  # Treat elements within a `oneof` as nullable





[jira] [Updated] (HUDI-4905) Protobuf type handling improvements

2022-09-22 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-4905:

Summary: Protobuf type handling improvements  (was: Proto type handling 
improvements)

> Protobuf type handling improvements
> ---
>
> Key: HUDI-4905
> URL: https://issues.apache.org/jira/browse/HUDI-4905
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Two improvements have come out of discussions with others trying to use 
> protobuf and Hudi.
>  
>  # We can support uint64 as a decimal without losing precision and 
> representing the value in the lake as a positive value
>  # Proto Timestamps can be converted to long with LogicalType timestamp-micros





[jira] [Created] (HUDI-4905) Proto type handling improvements

2022-09-22 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-4905:
---

 Summary: Proto type handling improvements
 Key: HUDI-4905
 URL: https://issues.apache.org/jira/browse/HUDI-4905
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Two improvements have come out of discussions with others trying to use 
protobuf and Hudi.

 
 # We can support uint64 as a decimal without losing precision and representing 
the value in the lake as a positive value
 # Proto Timestamps can be converted to long with LogicalType timestamp-micros
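
A minimal sketch of the two conversions listed above, assuming plain Java inputs rather than real protobuf classes. The class and method names (ProtoTypeSketch, uint64ToDecimal, timestampToMicros) are hypothetical and are not Hudi APIs. A uint64 can hold up to 20 decimal digits, so a decimal with precision 20 and scale 0 is wide enough to store it.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class ProtoTypeSketch {
    // protobuf-java exposes uint64 fields as a signed long, so values above
    // Long.MAX_VALUE show up as negative. Reinterpret the bits as unsigned.
    static BigDecimal uint64ToDecimal(long rawBits) {
        return new BigDecimal(new BigInteger(Long.toUnsignedString(rawBits)));
    }

    // A proto Timestamp carries seconds and nanos since the epoch; the Avro
    // timestamp-micros logical type is one long of microseconds since the epoch.
    static long timestampToMicros(long seconds, int nanos) {
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), nanos / 1_000);
    }

    public static void main(String[] args) {
        System.out.println(uint64ToDecimal(-1L));           // 18446744073709551615
        System.out.println(timestampToMicros(1L, 500_000)); // 1000500
    }
}
```

Sub-microsecond nanos are truncated in this sketch; whether to truncate or round is a design choice the issue does not specify.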





[jira] [Assigned] (HUDI-4905) Proto type handling improvements

2022-09-22 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-4905:
---

Assignee: Timothy Brown

> Proto type handling improvements
> 
>
> Key: HUDI-4905
> URL: https://issues.apache.org/jira/browse/HUDI-4905
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> Two improvements have come out of discussions with others trying to use 
> protobuf and Hudi.
>  
>  # We can support uint64 as a decimal without losing precision and 
> representing the value in the lake as a positive value
>  # Proto Timestamps can be converted to long with LogicalType timestamp-micros





[jira] [Assigned] (HUDI-4904) Handle Recursive Proto Schemas

2022-09-22 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-4904:
---

Assignee: Timothy Brown

> Handle Recursive Proto Schemas
> --
>
> Key: HUDI-4904
> URL: https://issues.apache.org/jira/browse/HUDI-4904
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> In proto we can have a schema that is recursive. We should limit the 
> "unraveling" of a schema to N levels and let the user specify that amount of 
> levels as a config. After hitting depth N in the recursion, we will create a 
> Record with a byte array and string. The remaining data for that branch of 
> the recursion will be written out as a proto byte array and we record the 
> descriptor string for context of what is in the byte array.





[jira] [Updated] (HUDI-4904) Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider

2022-09-22 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-4904:

Summary: Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider  
(was: Handle Recursive Proto Schemas)

> Handle Recursive Proto Schemas in ProtoClassBasedSchemaProvider
> ---
>
> Key: HUDI-4904
> URL: https://issues.apache.org/jira/browse/HUDI-4904
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> In proto we can have a schema that is recursive. We should limit the 
> "unraveling" of a schema to N levels and let the user specify that amount of 
> levels as a config. After hitting depth N in the recursion, we will create a 
> Record with a byte array and string. The remaining data for that branch of 
> the recursion will be written out as a proto byte array and we record the 
> descriptor string for context of what is in the byte array.





[jira] [Created] (HUDI-4904) Handle Recursive Proto Schemas

2022-09-22 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-4904:
---

 Summary: Handle Recursive Proto Schemas
 Key: HUDI-4904
 URL: https://issues.apache.org/jira/browse/HUDI-4904
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


In proto we can have a schema that is recursive. We should limit the 
"unraveling" of a schema to N levels and let the user specify that amount of 
levels as a config. After hitting depth N in the recursion, we will create a 
Record with a byte array and string. The remaining data for that branch of the 
recursion will be written out as a proto byte array and we record the 
descriptor string for context of what is in the byte array.
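
The depth limit described above can be sketched roughly as follows. This is an illustration only: MessageType is a hypothetical stand-in for a proto descriptor, the schema is rendered as text instead of a real Avro schema, and the fallback field names (proto_bytes, descriptor) and the exact depth at which the cutoff applies are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RecursionLimitSketch {
    /** Hypothetical descriptor: field name -> nested type, or null for a scalar. */
    static final class MessageType {
        final String name;
        final Map<String, MessageType> fields = new LinkedHashMap<>();
        MessageType(String name) { this.name = name; }
    }

    /** Renders a schema as text, expanding nested messages at most maxDepth levels. */
    static String unravel(MessageType type, int depth, int maxDepth) {
        if (depth > maxDepth) {
            // Past the limit: stop expanding and fall back to an opaque record
            // holding the serialized bytes plus a descriptor string for context.
            return "record{proto_bytes:bytes,descriptor:string}";
        }
        StringBuilder sb = new StringBuilder("record ").append(type.name).append("{");
        String sep = "";
        for (Map.Entry<String, MessageType> f : type.fields.entrySet()) {
            sb.append(sep).append(f.getKey()).append(":");
            sb.append(f.getValue() == null
                ? "scalar"
                : unravel(f.getValue(), depth + 1, maxDepth));
            sep = ",";
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        MessageType node = new MessageType("Node");
        node.fields.put("value", null);
        node.fields.put("next", node); // self-reference makes the schema recursive
        System.out.println(unravel(node, 0, 1));
    }
}
```

Without the depth check the call on a self-referencing type would never terminate, which is why the limit has to be applied before any field of the nested message is visited.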





[jira] [Updated] (HUDI-4796) Properly release MetricsReporter resources

2022-09-06 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-4796:

Status: In Progress  (was: Open)

> Properly release MetricsReporter resources
> --
>
> Key: HUDI-4796
> URL: https://issues.apache.org/jira/browse/HUDI-4796
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> In 
> [Metrics.java|https://github.com/apache/hudi/blob/f5de4e434b33720d4846c6fe2450539a284ea14f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java#L63-L65]
>  we are calling the close method on a class instead of the Reporter's `stop` 
> method. The `stop` method according to the Java docs "Should be used to stop 
> channels, streams and release resources." 
> For most reporters these two actions are equivalent but the 
> [JmxReporterServer|https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/JmxReporterServer.java#L127]
>  has a more involved stop method that must be called. 
>  
> Relates to discussion 
> [here|https://github.com/apache/hudi/issues/5249#issuecomment-1235020970]





[jira] [Updated] (HUDI-4796) Properly release MetricsReporter resources

2022-09-06 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-4796:

Status: Patch Available  (was: In Progress)

> Properly release MetricsReporter resources
> --
>
> Key: HUDI-4796
> URL: https://issues.apache.org/jira/browse/HUDI-4796
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> In 
> [Metrics.java|https://github.com/apache/hudi/blob/f5de4e434b33720d4846c6fe2450539a284ea14f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java#L63-L65]
>  we are calling the close method on a class instead of the Reporter's `stop` 
> method. The `stop` method according to the Java docs "Should be used to stop 
> channels, streams and release resources." 
> For most reporters these two actions are equivalent but the 
> [JmxReporterServer|https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/JmxReporterServer.java#L127]
>  has a more involved stop method that must be called. 
>  
> Relates to discussion 
> [here|https://github.com/apache/hudi/issues/5249#issuecomment-1235020970]





[jira] [Assigned] (HUDI-4796) Properly release MetricsReporter resources

2022-09-06 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-4796:
---

Assignee: Timothy Brown

> Properly release MetricsReporter resources
> --
>
> Key: HUDI-4796
> URL: https://issues.apache.org/jira/browse/HUDI-4796
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> In 
> [Metrics.java|https://github.com/apache/hudi/blob/f5de4e434b33720d4846c6fe2450539a284ea14f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java#L63-L65]
>  we are calling the close method on a class instead of the Reporter's `stop` 
> method. The `stop` method according to the Java docs "Should be used to stop 
> channels, streams and release resources." 
> For most reporters these two actions are equivalent but the 
> [JmxReporterServer|https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/JmxReporterServer.java#L127]
>  has a more involved stop method that must be called. 
>  
> Relates to discussion 
> [here|https://github.com/apache/hudi/issues/5249#issuecomment-1235020970]





[jira] [Created] (HUDI-4796) Properly release MetricsReporter resources

2022-09-06 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-4796:
---

 Summary: Properly release MetricsReporter resources
 Key: HUDI-4796
 URL: https://issues.apache.org/jira/browse/HUDI-4796
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


In 
[Metrics.java|https://github.com/apache/hudi/blob/f5de4e434b33720d4846c6fe2450539a284ea14f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java#L63-L65]
 we are calling the close method on a class instead of the Reporter's `stop` 
method. The `stop` method according to the Java docs "Should be used to stop 
channels, streams and release resources." 

For most reporters these two actions are equivalent but the 
[JmxReporterServer|https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/JmxReporterServer.java#L127]
 has a more involved stop method that must be called. 

 

Relates to discussion 
[here|https://github.com/apache/hudi/issues/5249#issuecomment-1235020970]
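
To illustrate the difference between the two shutdown paths described above, here is a small sketch. All types in it (MetricsReporterLike, JmxLikeReporter) are hypothetical stand-ins and not the actual Hudi classes; the "connector server" is an assumed example of a resource that only stop() releases.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class ReporterShutdownSketch {
    /** Hypothetical wrapper around an underlying closeable reporter. */
    interface MetricsReporterLike {
        Closeable getReporter();
        void stop();
    }

    /** A reporter whose stop() does more than close the underlying reporter. */
    static final class JmxLikeReporter implements MetricsReporterLike {
        final List<String> events = new ArrayList<>();

        public Closeable getReporter() {
            return () -> events.add("reporter closed");
        }

        public void stop() {
            events.add("reporter closed");
            events.add("connector server stopped"); // only stop() releases this
        }
    }

    public static void main(String[] args) throws Exception {
        JmxLikeReporter viaClose = new JmxLikeReporter();
        viaClose.getReporter().close(); // the connector server is left running
        JmxLikeReporter viaStop = new JmxLikeReporter();
        viaStop.stop();                 // everything is released
        System.out.println(viaClose.events);
        System.out.println(viaStop.events);
    }
}
```

For a reporter whose stop() only closes the underlying reporter, both paths end in the same state, which matches the observation that the two actions are equivalent for most reporters.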





[jira] [Created] (HUDI-4732) Leverage Schema Registry for reading proto messages from kafka

2022-08-28 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-4732:
---

 Summary: Leverage Schema Registry for reading proto messages from 
kafka
 Key: HUDI-4732
 URL: https://issues.apache.org/jira/browse/HUDI-4732
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Timothy Brown


If you use the Confluent Schema Registry, they provide a way to deserialize the 
kafka message value without providing the protobuf class name. The first cut of 
ProtoKafkaSource requires users to specify a classname but we want to allow 
users the flexibility to use this other method of deserializing the message.

 

Docs: 
https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-protobuf.html
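
As a rough sketch of what the consumer side could look like, the following builds Kafka consumer properties that point the value deserializer at Confluent's KafkaProtobufDeserializer. The class and method names (SchemaRegistryConsumerSketch, consumerProps) and the example addresses are assumptions for illustration; how ProtoKafkaSource would wire this in is not specified here. With no specific message type configured, the Confluent deserializer is documented to return a DynamicMessage, so no protobuf class name is needed.

```java
import java.util.Properties;

final class SchemaRegistryConsumerSketch {
    // Builds plain Kafka consumer properties; only string settings are used,
    // so this compiles without the Kafka or Confluent jars on the classpath.
    static Properties consumerProps(String bootstrapServers, String registryUrl) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        // Looks up the writer schema in the registry from the id in each message.
        props.put("value.deserializer",
            "io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer");
        props.put("schema.registry.url", registryUrl);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical local addresses, for illustration only.
        Properties props = consumerProps("localhost:9092", "http://localhost:8081");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```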





[jira] [Created] (HUDI-4727) Direct conversion from Proto Message to Row

2022-08-26 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-4727:
---

 Summary: Direct conversion from Proto Message to Row
 Key: HUDI-4727
 URL: https://issues.apache.org/jira/browse/HUDI-4727
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Timothy Brown


The initial implementation for the Proto source converts from Message to Avro 
to Row in the SourceFormatAdapter when the source needs to be read as a 
Dataset. Let's remove the intermediate Avro representation and convert 
directly from Message to Row.





[jira] [Assigned] (HUDI-4441) Disable INFO level logs from tests

2022-07-24 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-4441:
---

Assignee: Timothy Brown

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the min level being set to WARN in all 
> log4j-sure.properties. To reproduce the issue, just run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this. 




