[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1896:
--
Epic Name: Implement DeltaStreamer Source for cloud object stores

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 1.0.0
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a new 
> `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.
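
The timestamp-based selection described above can be sketched roughly as follows. This is a minimal illustration and not Hudi's `DFSPathSelector`: the class `TimestampPathSelectorSketch`, the `FileEntry` type, and `selectNewFiles` are hypothetical names, and the listing is assumed to be already available in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: pick files modified after a checkpoint timestamp. */
final class TimestampPathSelectorSketch {

    /** Illustrative stand-in for a file listing entry (not a Hudi or Hadoop class). */
    static final class FileEntry {
        final String path;
        final long modificationTime; // epoch millis

        FileEntry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /**
     * Returns the paths of entries whose modification time is strictly greater
     * than lastCheckpoint, ordered by modification time (oldest first).
     */
    static List<String> selectNewFiles(List<FileEntry> listing, long lastCheckpoint) {
        List<FileEntry> eligible = new ArrayList<>();
        for (FileEntry e : listing) {
            if (e.modificationTime > lastCheckpoint) {
                eligible.add(e);
            }
        }
        eligible.sort(Comparator.comparingLong((FileEntry e) -> e.modificationTime));
        List<String> paths = new ArrayList<>();
        for (FileEntry e : eligible) {
            paths.add(e.path);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<FileEntry> listing = new ArrayList<>();
        listing.add(new FileEntry("s3://bucket/in/a.json", 100L));
        listing.add(new FileEntry("s3://bucket/in/c.json", 300L));
        listing.add(new FileEntry("s3://bucket/in/b.json", 200L));
        System.out.println(selectNewFiles(listing, 100L));
    }
}
```

With a strict comparison like this, a file that arrives later but carries the same modification time as the checkpoint would be skipped. That is one generic weakness of timestamp checkpoints; the specific corner cases for Hudi are the ones described in HUDI-1723.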



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1628) [Umbrella] Improve data locality during ingestion

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1628:
--
  Epic Name: Improve data locality during ingestion
Description: 
Today the upsert partitioner does the file sizing/bin-packing etc for
inserts and then sends some inserts over to existing file groups to
maintain file size.
We can abstract all of this into strategies and some kind of pipeline
abstractions and have it also consider "affinity" to an existing file group
based
on say information stored in the metadata table?

See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser
 for more details

  was:

Today the upsert partitioner does the file sizing/bin-packing etc for
inserts and then sends some inserts over to existing file groups to
maintain file size.
We can abstract all of this into strategies and some kind of pipeline
abstractions and have it also consider "affinity" to an existing file group
based
on say information stored in the metadata table?

See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser
 for more details


> [Umbrella] Improve data locality during ingestion
> -
>
> Key: HUDI-1628
> URL: https://issues.apache.org/jira/browse/HUDI-1628
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Writer Core
>Reporter: satish
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hudi-umbrellas
> Fix For: 0.11.0
>
>
> Today the upsert partitioner does the file sizing/bin-packing etc. for
> inserts and then sends some inserts over to existing file groups to
> maintain file size.
> We can abstract all of this into strategies and some kind of pipeline
> abstractions, and have them also consider "affinity" to an existing file group
> based on, say, information stored in the metadata table.
> See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser
>  for more details
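
The file sizing step described above can be pictured as a simple bin-packing pass. The sketch below is only an illustration under stated assumptions, not Hudi's upsert partitioner: `assignInserts`, the `"new-file-group"` bucket name, and the fixed average record size are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: top up small file groups with inserts before opening new ones. */
final class InsertBinPackingSketch {

    /**
     * fileGroupSizes maps a file group id to its current size in bytes.
     * Returns how many insert records go to each file group; records that do
     * not fit anywhere are assigned to the placeholder key "new-file-group".
     */
    static Map<String, Long> assignInserts(Map<String, Long> fileGroupSizes,
                                           long maxFileSize,
                                           long avgRecordSize,
                                           long numInserts) {
        if (avgRecordSize <= 0) {
            throw new IllegalArgumentException("avgRecordSize must be positive");
        }
        Map<String, Long> assignment = new LinkedHashMap<>();
        long remaining = numInserts;
        for (Map.Entry<String, Long> fg : fileGroupSizes.entrySet()) {
            if (remaining == 0) {
                break;
            }
            long freeBytes = maxFileSize - fg.getValue();
            long capacity = freeBytes > 0 ? freeBytes / avgRecordSize : 0;
            long take = Math.min(capacity, remaining);
            if (take > 0) {
                assignment.put(fg.getKey(), take);
                remaining -= take;
            }
        }
        if (remaining > 0) {
            assignment.put("new-file-group", remaining);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("fg1", 90L);
        sizes.put("fg2", 50L);
        System.out.println(assignInserts(sizes, 100L, 10L, 10L));
    }
}
```

An "affinity" strategy, as proposed in the description, would presumably change the order or choice of file groups in the loop. That part is left out here because the ticket does not specify it.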





[jira] [Updated] (HUDI-1387) [UMBRELLA] Support Apache Calcite for writing/querying Hudi datasets

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1387:
--
Epic Name: Support Apache Calcite for writing/querying Hudi datasets

> [UMBRELLA] Support Apache Calcite for writing/querying Hudi datasets
> 
>
> Key: HUDI-1387
> URL: https://issues.apache.org/jira/browse/HUDI-1387
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core, Writer Core
>Reporter: Raymond Xu
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> (More details to be added)





[jira] [Updated] (HUDI-1390) [UMBRELLA] Support schema inference for unstructured data

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1390:
--
Epic Name: Support schema inference for unstructured data

> [UMBRELLA] Support schema inference for unstructured data
> -
>
> Key: HUDI-1390
> URL: https://issues.apache.org/jira/browse/HUDI-1390
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: bootstrap
>Reporter: Raymond Xu
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> (More details to be added)





[jira] [Updated] (HUDI-1385) [UMBRELLA] Improve source ingestion support in DeltaStreamer

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1385:
--
Epic Name: Improve source ingestion support in DeltaStreamer

> [UMBRELLA] Improve source ingestion support in DeltaStreamer
> 
>
> Key: HUDI-1385
> URL: https://issues.apache.org/jira/browse/HUDI-1385
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> (More details to be added)





[jira] [Updated] (HUDI-1250) [UMBRELLA] Test coverage

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1250:
--
Epic Name: Test coverage

> [UMBRELLA] Test coverage
> 
>
> Key: HUDI-1250
> URL: https://issues.apache.org/jira/browse/HUDI-1250
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Testing
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
>
> I found a handful of tickets related to adding more tests. Creating this 
> umbrella ticket to track all of them together. 
>  https://issues.apache.org/jira/browse/HUDI-987 : integration tests for MOR 
> table of decimal type
> https://issues.apache.org/jira/browse/HUDI-778 : adding code cov badge
> https://issues.apache.org/jira/browse/HUDI-699 : Add unit test for 
> CompactionCommand
> https://issues.apache.org/jira/browse/HUDI-693: Add unit test for hudi-cli 
> module
>  





[jira] [Updated] (HUDI-1249) [UMBRELLA] refactor tests for ease of development

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1249:
--
Epic Name: refactor tests for ease of development

> [UMBRELLA] refactor tests for ease of development
> -
>
> Key: HUDI-1249
> URL: https://issues.apache.org/jira/browse/HUDI-1249
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Testing
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
>
> Creating an umbrella ticket to track efforts to refactor tests and test utils 
> for ease of development.
>  
> https://issues.apache.org/jira/browse/HUDI-996: shared spark session 
> provider. 
> https://issues.apache.org/jira/browse/HUDI-995 Organize test utils methods 
> and classes
> https://issues.apache.org/jira/browse/HUDI-994 : Identify functional tests 
> that are convertible to unit tests with mocks
> https://issues.apache.org/jira/browse/HUDI-736 : Simplify 
> ReflectionUtils#getTopLevelClasses
> https://issues.apache.org/jira/browse/HUDI-488 : Refactor Source classes in 
> hudi-utilities
>  
>  





[jira] [Updated] (HUDI-1248) [UMBRELLA] Tests cleanup and fixes

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1248:
--
Epic Name: Tests cleanup and fixes

> [UMBRELLA] Tests cleanup and fixes
> --
>
> Key: HUDI-1248
> URL: https://issues.apache.org/jira/browse/HUDI-1248
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
>
> There are quite a few tickets that require fixes to tests. Creating this 
> umbrella ticket to track all efforts.
>  
> https://issues.apache.org/jira/browse/HUDI-1055 remove .parquet from tests.
>  https://issues.apache.org/jira/browse/HUDI-1033 ITTestRepairsCommand and 
> TestRepairsCommand
>  https://issues.apache.org/jira/browse/HUDI-1010 memory leak.
>  https://issues.apache.org/jira/browse/HUDI-997 memory leak
>  https://issues.apache.org/jira/browse/HUDI-664 : Adjust Logging levels to 
> reduce verbose log msgs in hudi-client
>  https://issues.apache.org/jira/browse/HUDI-623: Remove 
> UpgradePayloadFromUberToApache
>  https://issues.apache.org/jira/browse/HUDI-541: Replace variables/comments 
> named "data files" to "base file"
>  https://issues.apache.org/jira/browse/HUDI-347: Fix 
> TestHoodieClientOnCopyOnWriteStorage Tests with modular private methods
>  https://issues.apache.org/jira/browse/HUDI-323: Docker demo/integ-test 
> stdout/stderr output only available on process exit
>  https://issues.apache.org/jira/browse/HUDI-284: Need Tests for Hudi handling 
> of schema evolution
>  https://issues.apache.org/jira/browse/HUDI-154: Enable Rollback case in 
> HoodieRealtimeRecordReaderTest.testReader
> https://issues.apache.org/jira/browse/HUDI-1143 timestamp micros. 
> https://issues.apache.org/jira/browse/HUDI-1989: flaky tests in 
> TestHoodieMergeOnReadTable





[jira] [Updated] (HUDI-1237) [UMBRELLA] Checkstyle, formatting, warnings, spotless

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1237:
--
Epic Name: Checkstyle, formatting, warnings, spotless

> [UMBRELLA] Checkstyle, formatting, warnings, spotless
> -
>
> Key: HUDI-1237
> URL: https://issues.apache.org/jira/browse/HUDI-1237
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Code Cleanup
>Reporter: sivabalan narayanan
>Assignee: leesf
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> Umbrella ticket to track all tickets related to checkstyle, spotless, 
> warnings etc.





[jira] [Updated] (HUDI-1239) [UMBRELLA] Config clean up

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1239:
--
Epic Name: Config clean up

> [UMBRELLA] Config clean up
> --
>
> Key: HUDI-1239
> URL: https://issues.apache.org/jira/browse/HUDI-1239
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Code Cleanup
>Reporter: sivabalan narayanan
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-umbrellas
>
> Tracks all efforts to clean up configs.





[jira] [Updated] (HUDI-1238) [UMBRELLA] Perf test env

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1238:
--
Epic Name: Perf test env

> [UMBRELLA] Perf test env
> 
>
> Key: HUDI-1238
> URL: https://issues.apache.org/jira/browse/HUDI-1238
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: hudi-umbrellas
>
> We need to build a perf test environment that monitors metrics from a 
> long-running test suite and displays them via dashboards. 





[jira] [Updated] (HUDI-1236) [UMBRELLA] Integ Test suite infra

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1236:
--
Epic Name: Integ Test suite infra 

> [UMBRELLA] Integ Test suite infra 
> --
>
> Key: HUDI-1236
> URL: https://issues.apache.org/jira/browse/HUDI-1236
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Testing
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: hudi-umbrellas
>
> Long running test suite that checks for correctness across all deployment 
> modes (batch/streaming) and writers (deltastreamer/spark) and readers (hive, 
> presto, spark)





[jira] [Updated] (HUDI-868) [UMBRELLA] Insert Overwrite API

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-868:
-
Epic Name: Insert Overwrite API

> [UMBRELLA] Insert Overwrite API
> ---
>
> Key: HUDI-868
> URL: https://issues.apache.org/jira/browse/HUDI-868
> Project: Apache Hudi
>  Issue Type: Epic
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: hudi-umbrellas
>
> Use cases:
> - Tables where the majority of records change every cycle, so it is likely 
> more efficient to write new data instead of doing upserts.
> - Operational tasks to fix a specific corrupted partition. We can do 'insert 
> overwrite' on that partition with records from the source. This can be much 
> faster than restore and replay for some data sources.
> The functionality will be similar to the Hive definition of 'insert overwrite'. 
> But doing this in Hoodie will provide better isolation between writers and 
> readers. I can share possible implementation choices and some nuances if the 
> community thinks this is a useful feature to add. 
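
The partition-level behaviour described above can be sketched as follows. This is a toy model under stated assumptions, not Hudi's implementation: a table is represented as a map from partition path to records, and `insertOverwrite` is a hypothetical name. Only partitions present in the incoming batch are replaced, and the input table is left untouched so that a reader holding the old map still sees the old data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of 'insert overwrite' semantics at partition granularity. */
final class InsertOverwriteSketch {

    /**
     * Returns a new table view in which every partition present in incoming
     * is replaced wholesale; all other partitions are carried over unchanged.
     */
    static Map<String, List<String>> insertOverwrite(Map<String, List<String>> table,
                                                     Map<String, List<String>> incoming) {
        Map<String, List<String>> result = new LinkedHashMap<>(table);
        for (Map.Entry<String, List<String>> e : incoming.entrySet()) {
            // Replace, rather than merge with, the existing records of this partition.
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("2021/01/01", Arrays.asList("r1", "r2"));
        table.put("2021/01/02", Arrays.asList("r3"));
        Map<String, List<String>> incoming = new LinkedHashMap<>();
        incoming.put("2021/01/02", Arrays.asList("r4"));
        System.out.println(insertOverwrite(table, incoming));
    }
}
```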





[jira] [Updated] (HUDI-538) [UMBRELLA] Restructuring hudi client module for multi engine support

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-538:
-
Epic Name: Restructuring hudi client module for multi engine support

> [UMBRELLA] Restructuring hudi client module for multi engine support
> 
>
> Key: HUDI-538
> URL: https://issues.apache.org/jira/browse/HUDI-538
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: hudi-umbrellas
>
> Hudi is currently tightly coupled with the Spark framework, which makes 
> integration with other computing engines more difficult. We plan to decouple 
> it from Spark. This umbrella issue is used to track this work.





[jira] [Updated] (HUDI-270) [UMBRELLA] Improve Hudi website UI and documentation

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-270:
-
Epic Name: Improve Hudi website UI and documentation

> [UMBRELLA] Improve Hudi website UI and documentation
> 
>
> Key: HUDI-270
> URL: https://issues.apache.org/jira/browse/HUDI-270
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Docs
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Kyle Weller
>Priority: Minor
>  Labels: hudi-umbrellas, pull-request-available
>
> This is an umbrella task of multiple tasks that aim to improve the website





[jira] [Updated] (HUDI-466) [Umbrella] Record level, global low-latency index implementation

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-466:
-
Epic Name: Record level, global low-latency index implementation

> [Umbrella] Record level, global low-latency index implementation
> 
>
> Key: HUDI-466
> URL: https://issues.apache.org/jira/browse/HUDI-466
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
>
> Improve record indexing using a record -> (partitionpath, fileId) lookup. 
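
The lookup named above can be pictured as a key-value mapping from record key to location. The sketch below is an in-memory illustration only: `RecordLevelIndexSketch` and `Location` are hypothetical names, and how a real index would be stored to stay global and low-latency is not specified in the ticket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: global mapping from record key to (partitionPath, fileId). */
final class RecordLevelIndexSketch {

    /** Where a record currently lives. */
    static final class Location {
        final String partitionPath;
        final String fileId;

        Location(String partitionPath, String fileId) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
        }
    }

    private final Map<String, Location> index = new HashMap<>();

    /** Records (or updates) the location of a record key. */
    void put(String recordKey, String partitionPath, String fileId) {
        index.put(recordKey, new Location(partitionPath, fileId));
    }

    /** Looks up a record key across all partitions; empty if the key is unknown. */
    Optional<Location> lookup(String recordKey) {
        return Optional.ofNullable(index.get(recordKey));
    }

    public static void main(String[] args) {
        RecordLevelIndexSketch idx = new RecordLevelIndexSketch();
        idx.put("event-104", "type1", "file-0001");
        idx.lookup("event-104")
           .ifPresent(loc -> System.out.println(loc.partitionPath + "/" + loc.fileId));
    }
}
```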





[jira] [Updated] (HUDI-60) [UMBRELLA] Support Apache Beam for incremental tailing

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-60:

Epic Name: Support Apache Beam for incremental tailing

> [UMBRELLA] Support Apache Beam for incremental tailing
> --
>
> Key: HUDI-60
> URL: https://issues.apache.org/jira/browse/HUDI-60
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Spark Integration, Utilities
>Reporter: Vinoth Chandar
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> (More details to be added)





[jira] [Updated] (HUDI-57) [UMBRELLA] Support ORC Storage

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-57:

Epic Name: Support ORC Storage

> [UMBRELLA] Support ORC Storage
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Hive Integration, Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Teresa Kang
>Priority: Major
>  Labels: hudi-umbrellas, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155





[jira] [Updated] (HUDI-2235) [UMBRELLA] Keys support in Hudi

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-2235:
--
Epic Name: Keys support in Hudi

> [UMBRELLA] Keys support in Hudi
> ---
>
> Key: HUDI-2235
> URL: https://issues.apache.org/jira/browse/HUDI-2235
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: hudi-umbrellas
> Fix For: 0.11.0
>
>
> * Add virtual key support to Hudi: meta fields should not be persisted and 
> existing columns should be leveraged. 
> * Auto-generate a row_id equivalent (_hoodie_seq_no) based key and allow 
> updates/deletes from SQL to work based off that. 





[jira] [Updated] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-2429:
--
Epic Name: Comprehensive Schema evolution in Hudi

> [UMBRELLA] Comprehensive Schema evolution in Hudi
> -
>
> Key: HUDI-2429
> URL: https://issues.apache.org/jira/browse/HUDI-2429
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.11.0
>
>
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution]
>  
> Support comprehensive schema evolution in Hudi
>  * rename cols
>  * drop cols
>  * reorder cols
>  * re-add cols





[jira] [Updated] (HUDI-2261) [UMBRELLA] Dev Hygiene Issues

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-2261:
--
Epic Name: Dev Hygiene Issues

> [UMBRELLA] Dev Hygiene Issues
> -
>
> Key: HUDI-2261
> URL: https://issues.apache.org/jira/browse/HUDI-2261
> Project: Apache Hudi
>  Issue Type: Epic
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: Umbrella
>






[jira] [Updated] (HUDI-2531) [UMBRELLA] Support Dataset APIs in writer paths

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-2531:
--
Epic Name: Support Dataset APIs in writer paths

> [UMBRELLA] Support Dataset APIs in writer paths
> ---
>
> Key: HUDI-2531
> URL: https://issues.apache.org/jira/browse/HUDI-2531
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Spark Integration
>Reporter: Raymond Xu
>Priority: Major
>  Labels: hudi-umbrellas
>
> Make use of Dataset APIs in writer paths instead of RDDs.





[jira] [Updated] (HUDI-2668) [UMBRELLA] upgrade libraries or dependency versions

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-2668:
--
Epic Name: upgrade libraries or dependency versions

> [UMBRELLA] upgrade libraries or dependency versions
> ---
>
> Key: HUDI-2668
> URL: https://issues.apache.org/jira/browse/HUDI-2668
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Code Cleanup
>Reporter: sivabalan narayanan
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: hudi-umbrellas
> Fix For: 0.11.0
>
>
> Creating an umbrella ticket to track any upgrade asks and tickets 





[jira] [Updated] (HUDI-2832) [Umbrella] [RFC-40] Implement SnowflakeSyncTool to support Hudi to Snowflake Integration

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-2832:
--
Epic Name: [RFC-40] Implement SnowflakeSyncTool to support Hudi to 
Snowflake Integration

> [Umbrella] [RFC-40] Implement SnowflakeSyncTool to support Hudi to Snowflake 
> Integration
> 
>
> Key: HUDI-2832
> URL: https://issues.apache.org/jira/browse/HUDI-2832
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: BigQuery, Integration, pull-request-available
> Fix For: 0.11.0
>
>
> Snowflake is a fully managed service that’s simple to use but can power a 
> near-unlimited number of concurrent workloads. Snowflake is a solution for 
> data warehousing, data lakes, data engineering, data science, data 
> application development, and securely sharing and consuming shared data. 
> Snowflake [doesn’t 
> support|https://docs.snowflake.com/en/sql-reference/sql/alter-file-format.html]
>  the Apache Hudi file format yet, but it supports the Parquet, ORC, and Delta 
> file formats. This proposal is to implement a SnowflakeSync, similar to 
> HiveSync, that syncs the Hudi table as a Snowflake external Parquet table so 
> that users can query Hudi tables using Snowflake. Many users have asked, in 
> Hudi and other support channels, for Hudi to be integrated with Snowflake; 
> this will unlock new use cases for Hudi.





[jira] [Updated] (HUDI-3000) [UMBRELLA] Consistent Hashing Index

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-3000:
--
Epic Name: Consistent Hashing Index  (was: Consistent Hashing Index Epic)

> [UMBRELLA] Consistent Hashing Index
> ---
>
> Key: HUDI-3000
> URL: https://issues.apache.org/jira/browse/HUDI-3000
> Project: Apache Hudi
>  Issue Type: Epic
>Reporter: Yuwei Xiao
>Priority: Major
>






[jira] [Updated] (HUDI-3000) [UMBRELLA] Consistent Hashing Index

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-3000:
--
Epic Name: Consistent Hashing Index Epic

> [UMBRELLA] Consistent Hashing Index
> ---
>
> Key: HUDI-3000
> URL: https://issues.apache.org/jira/browse/HUDI-3000
> Project: Apache Hudi
>  Issue Type: Epic
>Reporter: Yuwei Xiao
>Priority: Major
>






[jira] [Updated] (HUDI-3000) [UMBRELLA] Consistent Hashing Index

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-3000:
--
Labels:   (was: hudi-epic-migration)

> [UMBRELLA] Consistent Hashing Index
> ---
>
> Key: HUDI-3000
> URL: https://issues.apache.org/jira/browse/HUDI-3000
> Project: Apache Hudi
>  Issue Type: Epic
>Reporter: Yuwei Xiao
>Priority: Major
>






[jira] [Updated] (HUDI-3000) [UMBRELLA] Consistent Hashing Index

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-3000:
--
Labels: hudi-epic-migration  (was: )

> [UMBRELLA] Consistent Hashing Index
> ---
>
> Key: HUDI-3000
> URL: https://issues.apache.org/jira/browse/HUDI-3000
> Project: Apache Hudi
>  Issue Type: Epic
>Reporter: Yuwei Xiao
>Priority: Major
>  Labels: hudi-epic-migration
>






[jira] [Updated] (HUDI-270) Improve Hudi website UI and documentation

2019-11-15 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-270:
-
Summary: Improve Hudi website UI and documentation  (was: Improve Hudi 
website documentation)

> Improve Hudi website UI and documentation
> -
>
> Key: HUDI-270
> URL: https://issues.apache.org/jira/browse/HUDI-270
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Minor
>
> This is an umbrella task of multiple tasks that aim to improve the website



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-245) Refactor code references that call HoodieTimeline.getInstants() and reverse to directly use method HoodieTimeline.getReverseOrderedInstants

2019-11-06 Thread Bhavani Sudha Saktheeswaran (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968328#comment-16968328
 ] 

Bhavani Sudha Saktheeswaran commented on HUDI-245:
--

Absolutely. All yours now :)

> Refactor code references that call HoodieTimeline.getInstants() and reverse 
> to directly use method HoodieTimeline.getReverseOrderedInstants 
> 
>
> Key: HUDI-245
> URL: https://issues.apache.org/jira/browse/HUDI-245
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Pratyaksh Sharma
>Priority: Minor
>
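
The refactoring named in the issue title can be illustrated with a small before/after sketch. `TimelineSketch` below is a hypothetical stand-in that only mimics the two method names from the title; instants are plain timestamp strings, and nothing here is taken from the actual `HoodieTimeline` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of replacing "get instants, then reverse" call sites. */
final class ReverseInstantsSketch {

    /** Stand-in timeline (assumption): instants are kept in ascending order. */
    static final class TimelineSketch {
        private final List<String> instants;

        TimelineSketch(List<String> instants) {
            this.instants = new ArrayList<>(instants);
        }

        Stream<String> getInstants() {
            return instants.stream();
        }

        Stream<String> getReverseOrderedInstants() {
            List<String> reversed = new ArrayList<>(instants);
            Collections.reverse(reversed);
            return reversed.stream();
        }
    }

    /** Before: each call site collects the instants and reverses them itself. */
    static List<String> latestFirstBefore(TimelineSketch timeline) {
        List<String> all = timeline.getInstants()
                .collect(Collectors.toCollection(ArrayList::new));
        Collections.reverse(all);
        return all;
    }

    /** After: the call site delegates to the dedicated method. */
    static List<String> latestFirstAfter(TimelineSketch timeline) {
        return timeline.getReverseOrderedInstants().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TimelineSketch timeline = new TimelineSketch(Arrays.asList("001", "002", "003"));
        System.out.println(latestFirstBefore(timeline));
        System.out.println(latestFirstAfter(timeline));
    }
}
```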






[jira] [Assigned] (HUDI-245) Refactor code references that call HoodieTimeline.getInstants() and reverse to directly use method HoodieTimeline.getReverseOrderedInstants

2019-11-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran reassigned HUDI-245:


Assignee: Pratyaksh Sharma  (was: Bhavani Sudha Saktheeswaran)

> Refactor code references that call HoodieTimeline.getInstants() and reverse 
> to directly use method HoodieTimeline.getReverseOrderedInstants 
> 
>
> Key: HUDI-245
> URL: https://issues.apache.org/jira/browse/HUDI-245
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Pratyaksh Sharma
>Priority: Minor
>






[jira] [Created] (HUDI-317) Edit Quickstart page to point to Hudi maven artifact in spark shell command

2019-10-29 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-317:


 Summary: Edit Quickstart page to point to Hudi maven artifact in 
spark shell command
 Key: HUDI-317
 URL: https://issues.apache.org/jira/browse/HUDI-317
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Docs
Reporter: Bhavani Sudha Saktheeswaran
Assignee: Bhavani Sudha Saktheeswaran


Modify the spark-shell command to use --packages instead of --jars. This will help 
users of the Quickstart page access built jars from Maven directly in the 
spark-shell command instead of having to build Hudi from source and 
point to that.





[jira] [Commented] (HUDI-305) Presto MOR "_rt" queries only reads base parquet file

2019-10-29 Thread Bhavani Sudha Saktheeswaran (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961983#comment-16961983
 ] 

Bhavani Sudha Saktheeswaran commented on HUDI-305:
--

Thanks [~bdscheller] for the detailed context. I am starting to look into 
this. I believe it should be possible to send patches to Presto open source. 
I'll take a look at the file split and record reader changes and get back to you. 
Meanwhile, let me know if you think of any other ideas.

> Presto MOR "_rt" queries only reads base parquet file 
> --
>
> Key: HUDI-305
> URL: https://issues.apache.org/jira/browse/HUDI-305
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Presto Integration
> Environment: On AWS EMR
>Reporter: Brandon Scheller
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
> Fix For: 0.5.1
>
>
> Code example to reproduce.
> {code:java}
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.spark.sql.SaveMode
> val df = Seq(
>   ("100", "event_name_900", "2015-01-01T13:51:39.340396Z", "type1"),
>   ("101", "event_name_546", "2015-01-01T12:14:58.597216Z", "type2"),
>   ("104", "event_name_123", "2015-01-01T12:15:00.512679Z", "type1"),
>   ("105", "event_name_678", "2015-01-01T13:51:42.248818Z", "type2")
>   ).toDF("event_id", "event_name", "event_ts", "event_type")
> var tableName = "hudi_events_mor_1"
> var tablePath = "s3://emr-users/wenningd/hudi/tables/events/" + tableName
> // write hudi dataset
> df.write.format("org.apache.hudi")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "event_id")
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "event_type") 
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "event_ts")
>   .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
>   .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
>   .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "event_type")
>   .option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY, "false")
>   .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor")
>   .mode(SaveMode.Overwrite)
>   .save(tablePath)
> // update a record with event_name "event_name_123" => "event_name_changed"
> val df1 = spark.read.format("org.apache.hudi").load(tablePath + "/*/*")
> val df2 = df1.filter($"event_id" === "104")
> val df3 = df2.withColumn("event_name", lit("event_name_changed"))
> // update hudi dataset
> df3.write.format("org.apache.hudi")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "event_id")
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "event_type") 
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "event_ts")
>   .option("hoodie.compact.inline", "false")
>   .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
>   .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
>   .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "event_type")
>   .option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY, "false")
>   .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor")
>   .mode(SaveMode.Append)
>   .save(tablePath)
> {code}
> Now when querying the real-time table from Hive, we have no issue seeing the 
> updated value:
> {code:java}
> hive> select event_name from hudi_events_mor_1_rt;
> OK
> event_name_900
> event_name_changed
> event_name_546
> event_name_678
> Time taken: 0.103 seconds, Fetched: 4 row(s)
> {code}
> But when querying the real-time table from Presto, we only read the base 
> parquet file and do not see the update that should be merged in from the log 
> file.
> {code:java}
> presto:default> select event_name from hudi_events_mor_1_rt;
>    event_name
> ----------------
>  event_name_900
>  event_name_123
>  event_name_546
>  event_name_678
> (4 rows)
> {code}
> Our current understanding of this issue is that while the 
> HoodieParquetRealtimeInputFormat correctly generates the splits, the 
> RealtimeCompactedRecordReader record reader is not used so it is not 

[jira] [Assigned] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531

2019-10-28 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran reassigned HUDI-15:
---

Assignee: sivabalan narayanan  (was: Bhavani Sudha Saktheeswaran)

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> 
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource, Write Client
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via 
> DeltaStreamer, WriteClient and datasources. Currently there are two ways to 
> delete: soft deletes and hard deletes - 
> https://hudi.apache.org/writing_data.html#deletes. 
> For hard deletes, we need to ensure that we can leverage 
> EmptyHoodieRecordPayload with just the HoodieKey and an empty record value.
> [https://github.com/uber/hudi/issues/531]





[jira] [Updated] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531

2019-10-28 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-15:

Description: 
The delete API needs to be supported as a first-class citizen via 
DeltaStreamer, WriteClient and datasources. Currently there are two ways to 
delete: soft deletes and hard deletes - 
https://hudi.apache.org/writing_data.html#deletes. 
For hard deletes, we need to ensure that we can leverage 
EmptyHoodieRecordPayload with just the HoodieKey and an empty record value.

[https://github.com/uber/hudi/issues/531]

  was:https://github.com/uber/hudi/issues/531
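
As a rough illustration of the hard-delete idea described above, the sketch 
below shows how an "empty" payload can signal that a key should be removed 
during an upsert. It does not use Hudi classes: Payload, ValuePayload, 
EmptyPayload and the map-backed table are hypothetical stand-ins chosen for 
this example, and the real EmptyHoodieRecordPayload and write path may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class HardDeleteSketch {

    // Hypothetical payload contract: the value to store, or empty to delete the key.
    interface Payload {
        Optional<String> valueToStore();
    }

    // A normal payload that carries a record value.
    static class ValuePayload implements Payload {
        private final String value;

        ValuePayload(String value) {
            this.value = value;
        }

        @Override
        public Optional<String> valueToStore() {
            return Optional.of(value);
        }
    }

    // An empty payload: only the key is known, and there is no record value.
    static class EmptyPayload implements Payload {
        @Override
        public Optional<String> valueToStore() {
            return Optional.empty();
        }
    }

    // Upsert into a map-backed "table": an empty value removes the key (hard delete).
    static void upsert(Map<String, String> table, String key, Payload payload) {
        Optional<String> value = payload.valueToStore();
        if (value.isPresent()) {
            table.put(key, value.get());
        } else {
            table.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        upsert(table, "104", new ValuePayload("event_name_123"));
        upsert(table, "104", new EmptyPayload());
        System.out.println(table.containsKey("104"));
    }
}
```

The point of the design is that a delete needs no separate code path: the 
writer sends a key with an empty payload through the same upsert call.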


> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> 
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource, Write Client
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
> Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via 
> DeltaStreamer, WriteClient and datasources. Currently there are two ways to 
> delete: soft deletes and hard deletes - 
> https://hudi.apache.org/writing_data.html#deletes. 
> For hard deletes, we need to ensure that we can leverage 
> EmptyHoodieRecordPayload with just the HoodieKey and an empty record value.
> [https://github.com/uber/hudi/issues/531]





[jira] [Updated] (HUDI-25) Faster Incremental queries on Hoodie #492

2019-10-20 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-25:

Fix Version/s: 0.5.1

> Faster Incremental queries on Hoodie #492
> -
>
> Key: HUDI-25
> URL: https://issues.apache.org/jira/browse/HUDI-25
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive incremental queries on Hoodie currently have a limitation: when a 
> datestr is not present, they list all partitions (.hoodie and the 
> partitions) and end up throwing away many of the files, since the 
> `_hoodie_commit_time` column values filter those files out. This can be 
> very expensive, can impact query planning time, and sometimes causes 
> timeouts as well if the table is large. The original issue is tracked here - 
> [https://github.com/uber/hudi/issues/492]





[jira] [Created] (HUDI-297) [Presto] Reuse table metadata listing and reduce namenode RPCs for Presto RO View queries

2019-10-07 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-297:


 Summary: [Presto] Reuse table metadata listing and reduce namenode 
RPCs for Presto RO View queries
 Key: HUDI-297
 URL: https://issues.apache.org/jira/browse/HUDI-297
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Presto Integration
Reporter: Bhavani Sudha Saktheeswaran
Assignee: Bhavani Sudha Saktheeswaran


This is described in detail in this Presto issue - 
[https://github.com/prestodb/presto/issues/13511]

 

 





[jira] [Created] (HUDI-291) Simplify quickstart

2019-10-03 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-291:


 Summary: Simplify quickstart
 Key: HUDI-291
 URL: https://issues.apache.org/jira/browse/HUDI-291
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Docs, docs-chinese, Usability
Reporter: Bhavani Sudha Saktheeswaran
Assignee: Bhavani Sudha Saktheeswaran


Make the quickstart really simple by using only Spark examples and default 
configs, so that it is easier to play around with Hudi APIs. The intent is to 
introduce what Hudi offers to end users as quickly as possible, without having 
to deal with setting up Hive or other external systems. 

 





[jira] [Resolved] (HUDI-262) Update Hudi website to reflect change in InputFormat Class name

2019-09-28 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran resolved HUDI-262.
--
Resolution: Fixed

> Update Hudi website to reflect change in InputFormat Class name
> ---
>
> Key: HUDI-262
> URL: https://issues.apache.org/jira/browse/HUDI-262
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: asf-migration
>Reporter: Balaji Varadarajan
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (HUDI-272) Move Prestodb Hudi integration to org.apache.hudi 's inputformat

2019-09-24 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-272:


 Summary: Move Prestodb Hudi integration to org.apache.hudi 's 
inputformat
 Key: HUDI-272
 URL: https://issues.apache.org/jira/browse/HUDI-272
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Presto Integration
Reporter: Bhavani Sudha Saktheeswaran
Assignee: Bhavani Sudha Saktheeswaran








[jira] [Created] (HUDI-271) Simplify quickstart documentation

2019-09-24 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-271:


 Summary: Simplify quickstart documentation
 Key: HUDI-271
 URL: https://issues.apache.org/jira/browse/HUDI-271
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Docs
Reporter: Bhavani Sudha Saktheeswaran
Assignee: Bhavani Sudha Saktheeswaran


Make the quickstart really simple by using only Spark examples and default 
configs, so that it is easier to play around with Hudi APIs. The intent is to 
introduce what Hudi offers to end users as quickly as possible, without having 
to deal with setting up Hive or other external systems. 

 

Help for setting up Hive sync, the Hive metastore, etc. will be moved to other 
pages for users who want to explore how to set these up after initially 
playing around with the Hudi APIs.





[jira] [Updated] (HUDI-219) Tabify hudi docker demo page

2019-09-24 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-219:
-
Parent: HUDI-270
Issue Type: Sub-task  (was: Improvement)

> Tabify hudi docker demo page
> 
>
> Key: HUDI-219
> URL: https://issues.apache.org/jira/browse/HUDI-219
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>






[jira] [Created] (HUDI-270) Improve Hudi website documentation

2019-09-24 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-270:


 Summary: Improve Hudi website documentation
 Key: HUDI-270
 URL: https://issues.apache.org/jira/browse/HUDI-270
 Project: Apache Hudi (incubating)
  Issue Type: Task
  Components: Docs
Reporter: Bhavani Sudha Saktheeswaran
Assignee: Bhavani Sudha Saktheeswaran


This is an umbrella task for multiple tasks that aim to improve the website.





[jira] [Created] (HUDI-264) Ensure website for both english and chinese is rendered properly with styling scripts and images

2019-09-18 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-264:


 Summary: Ensure website for both english and chinese is rendered 
properly with styling scripts and images
 Key: HUDI-264
 URL: https://issues.apache.org/jira/browse/HUDI-264
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Docs, docs-chinese, Usability
Reporter: Bhavani Sudha Saktheeswaran
Assignee: vinoyang


Currently, we are seeing an issue where the paths to styling scripts behave 
differently for the English and Chinese sites. This is mostly due to the 
folder structure for the multi-language setup and the shared header, footer 
and topnav scripts. 

 

This can be seen by opening the generated content locally, e.g. 
content/admin_guide.html or content/cn/admin_guide.html, in a browser.





[jira] [Assigned] (HUDI-245) Refactor code references that call HoodieTimeline.getInstants() and reverse to directly use method HoodieTimeline.getReverseOrderedInstants

2019-09-11 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran reassigned HUDI-245:


Assignee: Bhavani Sudha Saktheeswaran

> Refactor code references that call HoodieTimeline.getInstants() and reverse 
> to directly use method HoodieTimeline.getReverseOrderedInstants 
> 
>
> Key: HUDI-245
> URL: https://issues.apache.org/jira/browse/HUDI-245
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HUDI-245) Refactor code references that call HoodieTimeline.getInstants() and reverse to directly use method HoodieTimeline.getReverseOrderedInstants

2019-09-11 Thread Bhavani Sudha Saktheeswaran (Jira)
Bhavani Sudha Saktheeswaran created HUDI-245:


 Summary: Refactor code references that call 
HoodieTimeline.getInstants() and reverse to directly use method 
HoodieTimeline.getReverseOrderedInstants 
 Key: HUDI-245
 URL: https://issues.apache.org/jira/browse/HUDI-245
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: newbie
Reporter: Bhavani Sudha Saktheeswaran
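
The refactor named in the summary can be sketched as follows. This is a 
minimal, self-contained illustration and not Hudi code: the Timeline class 
below is a hypothetical stand-in that only borrows the method names 
getInstants() and getReverseOrderedInstants(), and instants are modelled as 
plain timestamp strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TimelineSketch {

    // Hypothetical stand-in for a timeline; instants are plain timestamp strings.
    static class Timeline {
        private final List<String> instants;

        Timeline(List<String> instants) {
            this.instants = new ArrayList<>(instants);
            Collections.sort(this.instants); // keep instants in ascending order
        }

        // Instants in ascending (oldest-first) order.
        Stream<String> getInstants() {
            return instants.stream();
        }

        // Instants in descending (newest-first) order.
        Stream<String> getReverseOrderedInstants() {
            return instants.stream().sorted(Comparator.reverseOrder());
        }
    }

    // Before: collect the ascending stream, then reverse the list by hand.
    static List<String> newestFirstBefore(Timeline timeline) {
        List<String> result =
            new ArrayList<>(timeline.getInstants().collect(Collectors.toList()));
        Collections.reverse(result);
        return result;
    }

    // After: ask the timeline for the reverse-ordered stream directly.
    static List<String> newestFirstAfter(Timeline timeline) {
        return timeline.getReverseOrderedInstants().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Timeline timeline =
            new Timeline(Arrays.asList("20190911120000", "20190910120000"));
        System.out.println(newestFirstBefore(timeline));
        System.out.println(newestFirstAfter(timeline));
    }
}
```

Both helpers return the same newest-first list; the second avoids the 
intermediate copy and the manual reversal at each call site.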








[jira] [Updated] (HUDI-245) Refactor code references that call HoodieTimeline.getInstants() and reverse to directly use method HoodieTimeline.getReverseOrderedInstants

2019-09-11 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-245:
-
Priority: Minor  (was: Major)

> Refactor code references that call HoodieTimeline.getInstants() and reverse 
> to directly use method HoodieTimeline.getReverseOrderedInstants 
> 
>
> Key: HUDI-245
> URL: https://issues.apache.org/jira/browse/HUDI-245
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: Bhavani Sudha Saktheeswaran
>Priority: Minor
>






[jira] [Commented] (HUDI-143) GCS: Jackson Databind Issue seen in query side - Presto/Hive

2019-08-31 Thread Bhavani Sudha Saktheeswaran (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920113#comment-16920113
 ] 

Bhavani Sudha Saktheeswaran commented on HUDI-143:
--

[~vbalaji] seems like this issue is related to - 
[https://github.com/apache/incubator-hudi/pull/818] 

> GCS: Jackson Databind Issue seen in query side - Presto/Hive
> 
>
> Key: HUDI-143
> URL: https://issues.apache.org/jira/browse/HUDI-143
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: hackathon, Hive Integration, Presto Integration
>Reporter: BALAJI VARADARAJAN
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: gcs-parity
>
> “””
> com.fasterxml.jackson.databind.ObjectMapper.setDefaultPropertyInclusion(Lcom/fasterxml/jackson/annotation/JsonInclude$Value;)Lcom/fasterxml/jackson/databind/ObjectMapper;
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.databind.ObjectMapper.setDefaultPropertyInclusion(Lcom/fasterxml/jackson/annotation/JsonInclude$Value;)Lcom/fasterxml/jackson/databind/ObjectMapper;
> “””
> _Status_ : Fixed by adding an exclusion filter to exclude 
> META-INF/services/javax.*





[jira] [Updated] (HUDI-164) Incorrect averageBytesPerRecord Causes OOM

2019-08-30 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-164:
-
Status: Patch Available  (was: In Progress)

> Incorrect averageBytesPerRecord Causes OOM
> --
>
> Key: HUDI-164
> URL: https://issues.apache.org/jira/browse/HUDI-164
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/776] 
>  
>  





[jira] [Assigned] (HUDI-92) Include custom names for spark HUDI spark DAG stages for easier understanding

2019-08-27 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran reassigned HUDI-92:
---

Assignee: Bhavani Sudha Saktheeswaran  (was: Vinoth Chandar)

> Include custom names for spark HUDI spark DAG stages for easier understanding
> -
>
> Key: HUDI-92
> URL: https://issues.apache.org/jira/browse/HUDI-92
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability
>Reporter: Nishith Agarwal
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HUDI-82) Add Presto 0.217 docker to quickstart and pre integration tests

2019-08-23 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran resolved HUDI-82.
-
Resolution: Fixed

> Add Presto 0.217 docker to quickstart and pre integration tests
> ---
>
> Key: HUDI-82
> URL: https://issues.apache.org/jira/browse/HUDI-82
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs, Presto Integration, Testing
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add Presto docker containers to the demo and pre integration tests alongside 
> existing Hive and Spark dockers. Update quickstart documentation for sample 
> end-to-end querying of Hudi tables via Presto, similar to the Hive and Spark 
> examples already described.


