[jira] [Commented] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables

2020-05-25 Thread Nishith Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116433#comment-17116433
 ] 

Nishith Agarwal commented on HUDI-258:
--

[~shivnarayan] I didn't get a chance to look into this after my last comment. 
Let me try this on the docker cluster to see whether the issue still persists; 
I'll update here soon.

> Hive Query engine not supporting join queries between RT and RO tables
> --
>
> Key: HUDI-258
> URL: https://issues.apache.org/jira/browse/HUDI-258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: bug-bash-0.6.0, help-requested
>
> Description : 
> [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
>  
> Root Cause: Hive tracks getSplits() calls by dataset base path and does not 
> take the InputFormat class into account, so getSplits() is called only once. 
> The RO and RT tables share the same dataset base path but differ in their 
> InputFormat class. As a result, the Hive join query returns incorrect 
> results.
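The root cause above can be modeled with a small Python sketch. This is not Hive or Hudi code: `SplitCache`, `ro_splits`, `rt_splits`, the `"RO"`/`"RT"` labels, and the table path are hypothetical stand-ins. It only illustrates that a split cache keyed by base path alone hands the second table the first table's splits, while keying on (base path, input format) keeps the two views apart.

```python
from typing import Dict, List, Tuple


# Hypothetical split computations, one per input format (not real Hive/Hudi APIs).
def ro_splits(base_path: str) -> List[str]:
    # Read-optimized view: scan compacted base files only.
    return [f"{base_path}/base.parquet"]


def rt_splits(base_path: str) -> List[str]:
    # Realtime view: scan base files plus the delta logs merged on read.
    return [f"{base_path}/base.parquet", f"{base_path}/.log"]


SPLIT_FNS = {"RO": ro_splits, "RT": rt_splits}


class SplitCache:
    """Memoizes getSplits() results; key_by_format=False mimics the reported bug."""

    def __init__(self, key_by_format: bool) -> None:
        self.key_by_format = key_by_format
        self.calls = 0  # how many times splits were actually computed
        self._cache: Dict[Tuple[str, ...], List[str]] = {}

    def get_splits(self, base_path: str, input_format: str) -> List[str]:
        # The bug: the key ignores the input format, so RO and RT collide.
        key = (base_path, input_format) if self.key_by_format else (base_path,)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = SPLIT_FNS[input_format](base_path)
        return self._cache[key]


path = "/hypothetical/stock_ticks_mor"  # both tables share this base path

buggy = SplitCache(key_by_format=False)
ro, rt = buggy.get_splits(path, "RO"), buggy.get_splits(path, "RT")
print(buggy.calls, ro == rt)  # 1 True: the RT lookup silently reuses RO splits

fixed = SplitCache(key_by_format=True)
ro, rt = fixed.get_splits(path, "RO"), fixed.get_splits(path, "RT")
print(fixed.calls, ro == rt)  # 2 False
```

Including the input format in the cache key is one plausible direction for a fix; whether Hive's actual caching can be changed that way is not established by this thread.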
>  
> =
> The results of the demo are very strange (Step 6(a)):
>  
> {{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor_rt where  symbol = 'GOOG';
>  select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor where  symbol = 'GOOG';}}
> return the results documented in the demo.
> BUT!
>  
> {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  
> on a.key=b.key where a.ts != b.ts
> ...
> +--------+-------+-------+
> | a.key  | a.ts  | b.ts  |
> +--------+-------+-------+
> +--------+-------+-------+}}
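Why the `a.ts != b.ts` join above can come back empty is sketched below, assuming the usual merge-on-read semantics: the RO view reads only base files, while the RT view merges delta-log records over them. All names and the second timestamp are hypothetical, illustrative values, not the demo's data. The point is only that if both sides of the join read the same splits, every matching row has identical `ts` values and the filter removes them all.

```python
from typing import Dict, List, Tuple

# Illustrative data only: key -> ts. The delta-log value is hypothetical.
BASE_FILE = {"GOOG_2018-08-31 10": "2018-08-31 10:29:00"}
DELTA_LOG = {"GOOG_2018-08-31 10": "2018-08-31 10:59:00"}  # update not yet compacted


def read_ro() -> Dict[str, str]:
    # Read-optimized view: base file only.
    return dict(BASE_FILE)


def read_rt() -> Dict[str, str]:
    # Realtime view: base file merged with delta log; log records win per key.
    merged = dict(BASE_FILE)
    merged.update(DELTA_LOG)
    return merged


def join_where_ts_differs(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    # Mirrors: SELECT a.key, a.ts, b.ts FROM a JOIN b ON a.key = b.key WHERE a.ts != b.ts
    return [(k, a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]]


expected = join_where_ts_differs(read_ro(), read_rt())  # each side uses its own view
buggy = join_where_ts_differs(read_ro(), read_ro())     # both sides read the same splits
print(expected)  # [('GOOG_2018-08-31 10', '2018-08-31 10:29:00', '2018-08-31 10:59:00')]
print(buggy)     # []
```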
>  
> {{0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from 
> stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 
> 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
> 2019-07-18 09:13:20 Starting to launch local task to process map join;  
> maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> 2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into 
> file: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
> 2019-07-18 09:13:21 Uploaded 1 File to: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
>  (317 bytes)
> 2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
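> +---------------------+----------------------+----------------------+
> |        a.key        |         a.ts         |         b.ts         |
> +---------------------+----------------------+----------------------+
> | GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
> +---------------------+----------------------+----------------------+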
> 1 row selected (7.207 seconds)
> 0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from stock_ticks_mor 
> a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 
> 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apach

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #289

2020-05-25 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.34 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
or

[jira] [Updated] (HUDI-804) Add Azure Support to Hudi Doc

2020-05-25 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-804:

Status: In Progress  (was: Open)

> Add Azure Support to Hudi Doc
> -
>
> Key: HUDI-804
> URL: https://issues.apache.org/jira/browse/HUDI-804
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: bug-bash-0.6.0
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-804) Add Azure Support to Hudi Doc

2020-05-25 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-804:

Status: Open  (was: New)

> Add Azure Support to Hudi Doc
> -
>
> Key: HUDI-804
> URL: https://issues.apache.org/jira/browse/HUDI-804
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: bug-bash-0.6.0
>






[jira] [Closed] (HUDI-946) Metadata Bootstrap Query Testing Master Ticket

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan closed HUDI-946.
---
Resolution: Duplicate

> Metadata Bootstrap Query Testing Master Ticket
> ---
>
> Key: HUDI-946
> URL: https://issues.apache.org/jira/browse/HUDI-946
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration, Presto Integration, Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
>  
> Query patterns used for testing:
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit?usp=sharing]
>  
>  
>  





[jira] [Updated] (HUDI-953) Test COW : Spark Data Source Read Optimized Queries

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-953:

Status: Open  (was: New)

> Test COW : Spark Data Source Read Optimized Queries
> ---
>
> Key: HUDI-953
> URL: https://issues.apache.org/jira/browse/HUDI-953
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Commented] (HUDI-953) Test COW : Spark Data Source Read Optimized Queries

2020-05-25 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116272#comment-17116272
 ] 

Balaji Varadarajan commented on HUDI-953:
-

cc [~wenningd] who is also testing Spark DataSource

> Test COW : Spark Data Source Read Optimized Queries
> ---
>
> Key: HUDI-953
> URL: https://issues.apache.org/jira/browse/HUDI-953
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Assigned] (HUDI-946) Metadata Bootstrap Query Testing Master Ticket

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-946:
---

Assignee: Balaji Varadarajan

> Metadata Bootstrap Query Testing Master Ticket
> ---
>
> Key: HUDI-946
> URL: https://issues.apache.org/jira/browse/HUDI-946
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration, Presto Integration, Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
>  
> Query patterns used for testing:
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit?usp=sharing]
>  
>  
>  





[jira] [Commented] (HUDI-950) Test COW : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116271#comment-17116271
 ] 

Balaji Varadarajan commented on HUDI-950:
-

[~vbalaji] completed testing Spark SQL COW queries with metadata bootstrap. 
Waiting for [~wenningd] to finish up testing before closing this ticket.

> Test COW : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-950
> URL: https://issues.apache.org/jira/browse/HUDI-950
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
>






[jira] [Commented] (HUDI-949) Test MOR : Hive Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116270#comment-17116270
 ] 

Balaji Varadarajan commented on HUDI-949:
-

[~vbalaji] completed testing realtime queries through Hive. Waiting for 
[~wenningd] to finish remaining queries before resolving this ticket.

> Test MOR : Hive Realtime Query with metadata bootstrap
> --
>
> Key: HUDI-949
> URL: https://issues.apache.org/jira/browse/HUDI-949
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
>






[jira] [Updated] (HUDI-949) Test MOR : Hive Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-949:

Status: Open  (was: New)

> Test MOR : Hive Realtime Query with metadata bootstrap
> --
>
> Key: HUDI-949
> URL: https://issues.apache.org/jira/browse/HUDI-949
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
>






[jira] [Resolved] (HUDI-951) Test MOR : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-951.
-
Resolution: Fixed

> Test MOR : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-951
> URL: https://issues.apache.org/jira/browse/HUDI-951
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-951) Test MOR : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-951:

Status: Open  (was: New)

> Test MOR : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-951
> URL: https://issues.apache.org/jira/browse/HUDI-951
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-950) Test COW : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-950:

Status: Open  (was: New)

> Test COW : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-950
> URL: https://issues.apache.org/jira/browse/HUDI-950
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
>






[jira] [Resolved] (HUDI-952) Test MOR : Spark SQL Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-952.
-
Resolution: Fixed

> Test MOR : Spark SQL Realtime Query with metadata bootstrap
> ---
>
> Key: HUDI-952
> URL: https://issues.apache.org/jira/browse/HUDI-952
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-948) Test MOR : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-948:

Status: Open  (was: New)

> Test MOR : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-948
> URL: https://issues.apache.org/jira/browse/HUDI-948
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-947) Test COW : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-947:

Status: Open  (was: New)

> Test COW : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-947
> URL: https://issues.apache.org/jira/browse/HUDI-947
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> Test Hive Queries as described in 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Resolved] (HUDI-947) Test COW : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-947.
-
Resolution: Fixed

> Test COW : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-947
> URL: https://issues.apache.org/jira/browse/HUDI-947
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> Test Hive Queries as described in 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Updated] (HUDI-947) Test COW : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-947:

Status: In Progress  (was: Open)

> Test COW : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-947
> URL: https://issues.apache.org/jira/browse/HUDI-947
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> Test Hive Queries as described in 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Updated] (HUDI-952) Test MOR : Spark SQL Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-952:

Status: Open  (was: New)

> Test MOR : Spark SQL Realtime Query with metadata bootstrap
> ---
>
> Key: HUDI-952
> URL: https://issues.apache.org/jira/browse/HUDI-952
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Resolved] (HUDI-948) Test MOR : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-948.
-
Resolution: Fixed

> Test MOR : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-948
> URL: https://issues.apache.org/jira/browse/HUDI-948
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Assigned] (HUDI-949) Test MOR : Hive Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-949:
---

Assignee: Wenning Ding

> Test MOR : Hive Realtime Query with metadata bootstrap
> --
>
> Key: HUDI-949
> URL: https://issues.apache.org/jira/browse/HUDI-949
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
>






[jira] [Assigned] (HUDI-947) Test COW : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-947:
---

Assignee: Balaji Varadarajan

> Test COW : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-947
> URL: https://issues.apache.org/jira/browse/HUDI-947
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> Test Hive Queries as described in 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Assigned] (HUDI-948) Test MOR : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-948:
---

Assignee: Balaji Varadarajan

> Test MOR : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-948
> URL: https://issues.apache.org/jira/browse/HUDI-948
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Assigned] (HUDI-952) Test MOR : Spark SQL Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-952:
---

Assignee: Balaji Varadarajan

> Test MOR : Spark SQL Realtime Query with metadata bootstrap
> ---
>
> Key: HUDI-952
> URL: https://issues.apache.org/jira/browse/HUDI-952
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Assigned] (HUDI-953) Test COW : Spark Data Source Read Optimized Queries

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-953:
---

Assignee: Udit Mehrotra

> Test COW : Spark Data Source Read Optimized Queries
> ---
>
> Key: HUDI-953
> URL: https://issues.apache.org/jira/browse/HUDI-953
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Assigned] (HUDI-950) Test COW : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-950:
---

Assignee: Wenning Ding

> Test COW : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-950
> URL: https://issues.apache.org/jira/browse/HUDI-950
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
>






[jira] [Assigned] (HUDI-951) Test MOR : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-951:
---

Assignee: Balaji Varadarajan

> Test MOR : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-951
> URL: https://issues.apache.org/jira/browse/HUDI-951
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-947) Test COW : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-947:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test COW : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-947
> URL: https://issues.apache.org/jira/browse/HUDI-947
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Test Hive Queries as described in 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-954:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test COW : Presto Read Optimized Query with metadata bootstrap
> --
>
> Key: HUDI-954
> URL: https://issues.apache.org/jira/browse/HUDI-954
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Presto Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-953) Test COW : Spark Data Source Read Optimized Queries

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-953:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test COW : Spark Data Source Read Optimized Queries
> ---
>
> Key: HUDI-953
> URL: https://issues.apache.org/jira/browse/HUDI-953
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-952) Test MOR : Spark SQL Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-952:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test MOR : Spark SQL Realtime Query with metadata bootstrap
> ---
>
> Key: HUDI-952
> URL: https://issues.apache.org/jira/browse/HUDI-952
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-951) Test MOR : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-951:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test MOR : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-951
> URL: https://issues.apache.org/jira/browse/HUDI-951
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-949) Test MOR : Hive Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-949:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test MOR : Hive Realtime Query with metadata bootstrap
> --
>
> Key: HUDI-949
> URL: https://issues.apache.org/jira/browse/HUDI-949
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-950) Test COW : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-950:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test COW : Spark SQL Read Optimized Query with metadata bootstrap
> -
>
> Key: HUDI-950
> URL: https://issues.apache.org/jira/browse/HUDI-950
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-955:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test MOR : Presto Read Optimized Query with metadata bootstrap
> --
>
> Key: HUDI-955
> URL: https://issues.apache.org/jira/browse/HUDI-955
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Presto Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-948) Test MOR : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-948:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test MOR : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-948
> URL: https://issues.apache.org/jira/browse/HUDI-948
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-956:


Test Plan present in 
https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684

> Test COW : Presto Realtime Query with metadata bootstrap
> 
>
> Key: HUDI-956
> URL: https://issues.apache.org/jira/browse/HUDI-956
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Presto Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>






[jira] [Created] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-955:
---

 Summary: Test MOR : Presto Read Optimized Query with metadata 
bootstrap
 Key: HUDI-955
 URL: https://issues.apache.org/jira/browse/HUDI-955
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Presto Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-956:
---

 Summary: Test COW : Presto Realtime Query with metadata bootstrap
 Key: HUDI-956
 URL: https://issues.apache.org/jira/browse/HUDI-956
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Presto Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-954:
---

 Summary: Test COW : Presto Read Optimized Query with metadata 
bootstrap
 Key: HUDI-954
 URL: https://issues.apache.org/jira/browse/HUDI-954
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Presto Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-953) Test COW : Spark Data Source Read Optimized Queries

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-953:
---

 Summary: Test COW : Spark Data Source Read Optimized Queries
 Key: HUDI-953
 URL: https://issues.apache.org/jira/browse/HUDI-953
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Spark Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-952) Test MOR : Spark SQL Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-952:
---

 Summary: Test MOR : Spark SQL Realtime Query with metadata 
bootstrap
 Key: HUDI-952
 URL: https://issues.apache.org/jira/browse/HUDI-952
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Spark Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-950) Test COW : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-950:
---

 Summary: Test COW : Spark SQL Read Optimized Query with metadata 
bootstrap
 Key: HUDI-950
 URL: https://issues.apache.org/jira/browse/HUDI-950
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Spark Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-951) Test MOR : Spark SQL Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-951:
---

 Summary: Test MOR : Spark SQL Read Optimized Query with metadata 
bootstrap
 Key: HUDI-951
 URL: https://issues.apache.org/jira/browse/HUDI-951
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Spark Integration
Reporter: Balaji Varadarajan








[jira] [Updated] (HUDI-947) Test COW : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-947:

Summary: Test COW : Hive Read Optimized Query with metadata bootstrap  
(was: COW : Hive Read Optimized Table with metadata bootstrap)

> Test COW : Hive Read Optimized Query with metadata bootstrap
> 
>
> Key: HUDI-947
> URL: https://issues.apache.org/jira/browse/HUDI-947
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Test Hive Queries as described in 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Created] (HUDI-948) Test MOR : Hive Read Optimized Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-948:
---

 Summary: Test MOR : Hive Read Optimized Query with metadata 
bootstrap
 Key: HUDI-948
 URL: https://issues.apache.org/jira/browse/HUDI-948
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Hive Integration
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-949) Test MOR : Hive Realtime Query with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-949:
---

 Summary: Test MOR : Hive Realtime Query with metadata bootstrap
 Key: HUDI-949
 URL: https://issues.apache.org/jira/browse/HUDI-949
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Balaji Varadarajan








[jira] [Created] (HUDI-947) COW : Hive Read Optimized Table with metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-947:
---

 Summary: COW : Hive Read Optimized Table with metadata bootstrap
 Key: HUDI-947
 URL: https://issues.apache.org/jira/browse/HUDI-947
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Hive Integration
Reporter: Balaji Varadarajan


Test Hive Queries as described in 
[https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684]





[jira] [Updated] (HUDI-946) Metadata Bootstrap Query Testing Master Ticket

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-946:

Status: In Progress  (was: Open)

> Metadata Bootstrap Query Testing Master Ticket
> ---
>
> Key: HUDI-946
> URL: https://issues.apache.org/jira/browse/HUDI-946
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration, Presto Integration, Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
>  
> Query Pattern used for testing : 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit?usp=sharing]
>  
>  
>  





[jira] [Updated] (HUDI-946) Metadata Bootstrap Query Testing Master Ticket

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-946:

Status: Open  (was: New)

> Metadata Bootstrap Query Testing Master Ticket
> ---
>
> Key: HUDI-946
> URL: https://issues.apache.org/jira/browse/HUDI-946
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration, Presto Integration, Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
>  
> Query Pattern used for testing : 
> [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit?usp=sharing]
>  
>  
>  





[jira] [Created] (HUDI-946) Metadata Bootstrap Query Testing Master Ticket

2020-05-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-946:
---

 Summary: Metadata Bootstrap Query Testing Master Ticket
 Key: HUDI-946
 URL: https://issues.apache.org/jira/browse/HUDI-946
 Project: Apache Hudi
  Issue Type: Task
  Components: Hive Integration, Presto Integration, Spark Integration
Reporter: Balaji Varadarajan
 Fix For: 0.6.0


 

Query Pattern used for testing : 
[https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit?usp=sharing]

 

 

 





[jira] [Comment Edited] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-05-25 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116251#comment-17116251
 ] 

Raymond Xu edited comment on HUDI-690 at 5/25/20, 8:41 PM:
---

It is `getCommitsTimeline()` that does not include compaction instants. 
Changing it to `getCommitsAndCompactionTimeline()` should do the job.

[~jomeke] would you be able to verify the change with the problematic dataset 
that you had? You may simply apply the 2-line change to

hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

as shown in 
[https://github.com/apache/hudi/pull/1667/files#diff-5ba8b968112135426aa10a7660d9b248]


was (Author: rxu):
It is `getCommitsTimeline()` that does not include compaction instants. 
Changing it to `getCommitsAndCompactionTimeline()` should do the job.

[~jomeke] would you be able to verify the change with the problematic dataset 
that you had? You may simply apply the 2-line change to 
`HudiSnapshotCopier.java` as shown in 
[https://github.com/apache/hudi/pull/1667/files#diff-5ba8b968112135426aa10a7660d9b248]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Assignee: Raymond Xu
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.0
>
>
> Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
> backup of merge-on-read tables: 
> [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
>  
> The error:
>  
> {code:java}
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from 
> /.hoodie/hoodie.properties
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type 
> MERGE_ON_READ from 
> 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants 
> java.util.stream.ReferencePipeline$Head@77f7352a
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) 
> with ID 2
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has 
> registered (new total is 1)
> 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, 
> BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283
> 1, None)
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) 
> with ID 4
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has 
> registered (new total is 2)Exception in thread "main" 
> java.lang.IllegalStateException: Hudi File Id 
> (HoodieFileGroupId{partitionPath='created_at_month=2020-03', 
> fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending
> compactions. Instants: (20200309011643,{"baseInstantTime": "20200308213934", 
> "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496",
>  ".7104bb0b-20f6-4dec-981b-c11
> bf20ade4a-0_20200308213934.log.2_3-761601-172985464", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-
> 177872977", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"],
>  "dataFilePath": 
> "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet",
>  "fileId": "7
> 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": 
> "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, 
> "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, 
> "TOTAL_IO_WRITE_MB": 512.0,
>  "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
> (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ad
> e4a-0_20200308180755.log.4_3-727192-165430450", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"],
>  "dat

[jira] [Comment Edited] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-05-25 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116251#comment-17116251
 ] 

Raymond Xu edited comment on HUDI-690 at 5/25/20, 8:39 PM:
---

It is `getCommitsTimeline()` that does not include compaction instants. 
Changing it to `getCommitsAndCompactionTimeline()` should do the job.

[~jomeke] would you be able to verify the change with the problematic dataset 
that you had? You may simply apply the 2-line change to 
`HudiSnapshotCopier.java` as shown in 
[https://github.com/apache/hudi/pull/1667/files#diff-5ba8b968112135426aa10a7660d9b248]


was (Author: rxu):
It is `getCommitsTimeline()` that does not include compaction instants. 
Changing it to `getCommitsAndCompactionTimeline()` should do the job.

[~jomeke] would you be able to verify the change with the problematic dataset 
that you had? You may simply apply the changes as shown in 
[https://github.com/apache/hudi/pull/1667/files#diff-5ba8b968112135426aa10a7660d9b248]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Assignee: Raymond Xu
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.0
>
>
> Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
> backup of merge-on-read tables: 
> [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
>  
> The error:
>  
> {code:java}
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from 
> /.hoodie/hoodie.properties
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type 
> MERGE_ON_READ from 
> 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants 
> java.util.stream.ReferencePipeline$Head@77f7352a
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) 
> with ID 2
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has 
> registered (new total is 1)
> 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, 
> BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283
> 1, None)
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) 
> with ID 4
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has 
> registered (new total is 2)Exception in thread "main" 
> java.lang.IllegalStateException: Hudi File Id 
> (HoodieFileGroupId{partitionPath='created_at_month=2020-03', 
> fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending
> compactions. Instants: (20200309011643,{"baseInstantTime": "20200308213934", 
> "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496",
>  ".7104bb0b-20f6-4dec-981b-c11
> bf20ade4a-0_20200308213934.log.2_3-761601-172985464", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-
> 177872977", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"],
>  "dataFilePath": 
> "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet",
>  "fileId": "7
> 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": 
> "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, 
> "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, 
> "TOTAL_IO_WRITE_MB": 512.0,
>  "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
> (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ad
> e4a-0_20200308180755.log.4_3-727192-165430450", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"],
>  "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2
> 0200308180755.parqu

[jira] [Commented] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-05-25 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116251#comment-17116251
 ] 

Raymond Xu commented on HUDI-690:
-

It is `getCommitsTimeline()` that does not include compaction instants. 
Changing it to `getCommitsAndCompactionTimeline()` should do the job.

[~jomeke] would you be able to verify the change with the problematic dataset 
that you had? You may simply apply the changes as shown in 
[https://github.com/apache/hudi/pull/1667/files#diff-5ba8b968112135426aa10a7660d9b248]
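The root cause above can be pictured with a small, self-contained model of timeline filtering. This is a hypothetical sketch, not Hudi's `HoodieTimeline` API: the `Instant` record, the action strings and the two filter sets are assumptions chosen to mirror the difference between `getCommitsTimeline()` and `getCommitsAndCompactionTimeline()` on a MOR table, where pending compaction instants are assumed to carry a `compaction` action.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TimelineFilterSketch {
    // Hypothetical stand-in for a timeline instant: timestamp plus action name.
    record Instant(String timestamp, String action) {}

    // Assumed action sets, modeled loosely on the two meta-client methods.
    static final Set<String> COMMIT_ACTIONS = Set.of("commit", "deltacommit");
    static final Set<String> COMMIT_AND_COMPACTION_ACTIONS =
            Set.of("commit", "deltacommit", "compaction");

    // Keeps only instants whose action is in the given set, preserving order.
    static List<String> filter(List<Instant> timeline, Set<String> actions) {
        return timeline.stream()
                .filter(i -> actions.contains(i.action()))
                .map(Instant::timestamp)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative timeline: one delta commit followed by two compactions.
        List<Instant> timeline = List.of(
                new Instant("001", "deltacommit"),
                new Instant("002", "compaction"),
                new Instant("003", "compaction"));
        System.out.println(filter(timeline, COMMIT_ACTIONS));                // prints [001]
        System.out.println(filter(timeline, COMMIT_AND_COMPACTION_ACTIONS)); // prints [001, 002, 003]
    }
}
```

In this model the commits-only filter silently drops both compaction instants, while the broader filter keeps them; any code that later reasons about compactions from the narrower view works from an incomplete picture.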

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Assignee: Raymond Xu
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.0
>
>
> Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
> backup of merge-on-read tables: 
> [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
>  
> The error:
>  
> {code:java}
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from 
> /.hoodie/hoodie.properties
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type 
> MERGE_ON_READ from 
> 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants 
> java.util.stream.ReferencePipeline$Head@77f7352a
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) 
> with ID 2
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has 
> registered (new total is 1)
> 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, 
> BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283
> 1, None)
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) 
> with ID 4
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has 
> registered (new total is 2)Exception in thread "main" 
> java.lang.IllegalStateException: Hudi File Id 
> (HoodieFileGroupId{partitionPath='created_at_month=2020-03', 
> fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending
> compactions. Instants: (20200309011643,{"baseInstantTime": "20200308213934", 
> "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496",
>  ".7104bb0b-20f6-4dec-981b-c11
> bf20ade4a-0_20200308213934.log.2_3-761601-172985464", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-
> 177872977", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"],
>  "dataFilePath": 
> "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet",
>  "fileId": "7
> 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": 
> "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, 
> "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, 
> "TOTAL_IO_WRITE_MB": 512.0,
>  "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
> (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ad
> e4a-0_20200308180755.log.4_3-727192-165430450", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"],
>  "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2
> 0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", 
> "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 
> 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE":
> 44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, 
> "TOTAL_LOG_FILE_SIZE": 44197.0}})at 
> org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRe

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-807:

Status: Patch Available  (was: In Progress)

> Spark DS Support for incremental queries for bootstrapped tables
> 
>
> Key: HUDI-807
> URL: https://issues.apache.org/jira/browse/HUDI-807
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> Investigate and figure out the changes required in Spark integration code to 
> make incremental queries work seamlessly for bootstrapped tables.





[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-807:

Status: In Progress  (was: Open)

> Spark DS Support for incremental queries for bootstrapped tables
> 
>
> Key: HUDI-807
> URL: https://issues.apache.org/jira/browse/HUDI-807
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> Investigate and figure out the changes required in Spark integration code to 
> make incremental queries work seamlessly for bootstrapped tables.





[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-807:

Status: Open  (was: New)

> Spark DS Support for incremental queries for bootstrapped tables
> 
>
> Key: HUDI-807
> URL: https://issues.apache.org/jira/browse/HUDI-807
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> Investigate and figure out the changes required in Spark integration code to 
> make incremental queries work seamlessly for bootstrapped tables.





[jira] [Updated] (HUDI-426) Implement Spark DataSource Support for querying bootstrapped tables

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-426:

Status: Patch Available  (was: In Progress)

> Implement Spark DataSource Support for querying bootstrapped tables
> ---
>
> Key: HUDI-426
> URL: https://issues.apache.org/jira/browse/HUDI-426
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need the ability in the Spark DataSource to query COW tables that are 
> bootstrapped as per 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+:+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC-12:EfficientMigrationofLargeParquetTablestoApacheHudi-BootstrapIndex:]
>  
> The current implementation delegates to the Parquet DataSource, but this won't 
> work because we need the ability to stitch the columns externally.
>  
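The "stitching" mentioned above can be pictured as a positional merge of two column sets. A minimal sketch, assuming (in the spirit of the RFC-12 design linked above) that a metadata-only bootstrap keeps Hudi's metadata columns in a separate skeleton file whose rows line up one-to-one, in order, with the rows of the original Parquet file. The row representation and the `stitch` helper are hypothetical and not part of Hudi's or Spark's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BootstrapStitchSketch {
    // Hypothetical row model: column name -> value.
    // Merges row i of the skeleton (Hudi metadata columns) with row i of the
    // original source file (data columns); assumes both share the same row order.
    static List<Map<String, Object>> stitch(List<Map<String, Object>> skeletonRows,
                                            List<Map<String, Object>> sourceRows) {
        if (skeletonRows.size() != sourceRows.size()) {
            throw new IllegalArgumentException(
                    "skeleton and source must have the same number of rows");
        }
        List<Map<String, Object>> stitched = new ArrayList<>(skeletonRows.size());
        for (int i = 0; i < skeletonRows.size(); i++) {
            // Metadata columns first, then the original data columns.
            Map<String, Object> row = new LinkedHashMap<>(skeletonRows.get(i));
            row.putAll(sourceRows.get(i));
            stitched.add(row);
        }
        return stitched;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> skeleton = List.of(
                Map.of("_hoodie_record_key", "k1"),
                Map.of("_hoodie_record_key", "k2"));
        List<Map<String, Object>> source = List.of(
                Map.of("symbol", "GOOG"),
                Map.of("symbol", "AAPL"));
        // prints [{_hoodie_record_key=k1, symbol=GOOG}, {_hoodie_record_key=k2, symbol=AAPL}]
        System.out.println(stitch(skeleton, source));
    }
}
```

A plain Parquet read of the source files would only ever see the data columns, which is why a reader that performs this kind of merge is needed.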





[jira] [Updated] (HUDI-426) Implement Spark DataSource Support for querying bootstrapped tables

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-426:

Status: In Progress  (was: Open)

> Implement Spark DataSource Support for querying bootstrapped tables
> ---
>
> Key: HUDI-426
> URL: https://issues.apache.org/jira/browse/HUDI-426
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need the ability in the Spark DataSource to query COW tables that are 
> bootstrapped as per 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+:+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC-12:EfficientMigrationofLargeParquetTablestoApacheHudi-BootstrapIndex:]
>  
> The current implementation delegates to the Parquet DataSource, but this won't 
> work because we need the ability to stitch the columns externally.
>  





[jira] [Assigned] (HUDI-808) Support for cleaning source data

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-808:
---

Assignee: Udit Mehrotra

> Support for cleaning source data
> 
>
> Key: HUDI-808
> URL: https://issues.apache.org/jira/browse/HUDI-808
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> This is an important requirement from a GDPR perspective. When performing 
> deletions on a metadata-only bootstrapped partition, users should be able to 
> request that the original data be cleaned up from the source location, 
> because under this new bootstrapping mechanism the original data serves as 
> the data of the original commit in Hudi.





[jira] [Assigned] (HUDI-828) Open Questions before merging Bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-828:
---

Assignee: Balaji Varadarajan

> Open Questions before merging Bootstrap 
> 
>
> Key: HUDI-828
> URL: https://issues.apache.org/jira/browse/HUDI-828
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> This ticket tracks open questions that need to be resolved before we check in 
> bootstrap.





[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-807:

Summary: Spark DS Support for incremental queries for bootstrapped tables  
(was: Support for incremental queries for bootstrapped tables)

> Spark DS Support for incremental queries for bootstrapped tables
> 
>
> Key: HUDI-807
> URL: https://issues.apache.org/jira/browse/HUDI-807
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> Investigate and figure out the changes required in Spark integration code to 
> make incremental queries work seamlessly for bootstrapped tables.





[jira] [Assigned] (HUDI-899) Add a knob to change partition-path style while performing metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-899:
---

Assignee: Balaji Varadarajan

> Add a knob to change partition-path style while performing metadata bootstrap
> -
>
> Key: HUDI-899
> URL: https://issues.apache.org/jira/browse/HUDI-899
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 24h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HUDI-915) Partition Columns missing in files upserted after Metadata Bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-915:
---

Assignee: Balaji Varadarajan

> Partition Columns missing in files upserted after Metadata Bootstrap
> 
>
> Key: HUDI-915
> URL: https://issues.apache.org/jira/browse/HUDI-915
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: Udit Mehrotra
>Assignee: Balaji Varadarajan
>Priority: Major
>
> This issue happens when the source data is partitioned using _*hive-style 
> partitioning*_, which is also Spark's default behavior when it writes 
> data. With this partitioning, the partition column/schema is never stored in 
> the files; instead it is derived on the fly from the file paths, which contain 
> partition folders of the form *_partition_key=partition_value_*.
> During metadata bootstrap we store only the metadata columns in the Hudi 
> table folder. In addition, the *bootstrap schema* we compute is read directly 
> from the source data file, which does not contain the *partition column 
> schema*. The bootstrap schema is therefore incomplete.
> This manifests as issues when we eventually perform *upserts* on these 
> bootstrapped files and they become fully bootstrapped. At upsert time the 
> schema evolves, because the upsert dataframe must contain the partition column 
> in order to perform upserts. As a result, the *upserted rows* have the 
> correct partition column value stored, while the other records, which are 
> simply copied over from the metadata bootstrap file, are missing the partition 
> column. We therefore observe different behavior for 
> *bootstrapped* vs *non-bootstrapped* tables.
> This does not currently cause issues with *Hive*, because Hive can determine 
> the partition columns from the metadata it stores. However, it does cause a 
> problem with other engines like *Spark*, where the partition columns show up 
> as *null* when the upserted files are read.
> The proposal is therefore to fix the following issues:
>  * When performing bootstrap, determine the partition schema and store it in 
> the *bootstrap schema* in the commit metadata file. This provides the 
> following benefits:
>  ** Completeness: there is no behavioral difference between bootstrapped and 
> non-bootstrapped tables.
>  ** In the Spark bootstrap relation and incremental query relation, where we 
> need to determine the latest schema, one can simply get the accurate schema 
> from the commit metadata file, instead of checking whether the partition 
> column is present in the schema obtained from the metadata file and, if not, 
> determining the partition schema every time and merging it (which can be 
> expensive).
>  * When upserting files that were metadata bootstrapped, the partition 
> column values should be correctly determined and copied into the upserted file 
> to avoid missing and null values.
>  ** Again, this keeps behavior consistent with non-bootstrapped tables. Even 
> though Hive seems to handle this somehow, we should consider other engines 
> like *Spark* where it cannot be handled automatically.
>  ** Without this, providing the partition value on the read side in Spark 
> becomes significantly more complicated: we would have to check every time 
> whether the partition value is null and somehow fill it in.
>  ** Once the table is fully bootstrapped at some point in the future, and the 
> bootstrap commit is, say, cleaned up and Spark queries go through the 
> *parquet* datasource instead of the *new bootstrapped datasource*, the *parquet 
> datasource* will return null values wherever it finds missing partition 
> values. In that case we have no control over the *parquet* datasource, as it 
> simply reads from the file. 
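The hive-style layout described above can be illustrated with a short Java sketch: it recovers partition values from a relative path such as year=2020/month=05, and appends any partition columns that the data file's own column list lacks, which is roughly what completing the bootstrap schema means. HiveStylePartitions is a hypothetical helper, not Hudi's or Spark's API; it works on column names only (no types) and ignores the value unescaping and type inference that Spark's partition discovery performs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for hive-style partition paths ("key=value/key=value").
final class HiveStylePartitions {
    private HiveStylePartitions() {}

    // Parses a relative partition path like "year=2020/month=05" into an ordered
    // map {year=2020, month=05}. Segments without "key=" are rejected because they
    // are not hive-style. Escaped characters in values are not decoded here.
    static Map<String, String> parse(String partitionPath) {
        Map<String, String> values = new LinkedHashMap<>();
        if (partitionPath.isEmpty()) {
            return values;
        }
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("not a hive-style segment: " + segment);
            }
            values.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return values;
    }

    // Appends partition column names missing from the file's columns, so the
    // result matches what a non-bootstrapped table would store in its files.
    static List<String> completeColumns(List<String> fileColumns,
                                        Map<String, String> partitionValues) {
        List<String> columns = new ArrayList<>(fileColumns);
        for (String key : partitionValues.keySet()) {
            if (!columns.contains(key)) {
                columns.add(key);
            }
        }
        return columns;
    }
}
```

For example, a file under year=2020/month=05 whose own columns are symbol and ts would be described by the completed column list symbol, ts, year, month.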





[jira] [Updated] (HUDI-915) Partition Columns missing in files upserted after Metadata Bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-915:

Status: Open  (was: New)

> Partition Columns missing in files upserted after Metadata Bootstrap
> 
>
> Key: HUDI-915
> URL: https://issues.apache.org/jira/browse/HUDI-915
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: Udit Mehrotra
>Priority: Major
>
> This issue happens when the source data is partitioned using _*hive-style 
> partitioning*_, which is also Spark's default behavior when it writes 
> data. With this partitioning, the partition column/schema is never stored in 
> the files; instead it is derived on the fly from the file paths, which contain 
> partition folders of the form *_partition_key=partition_value_*.
> During metadata bootstrap we store only the metadata columns in the Hudi 
> table folder. In addition, the *bootstrap schema* we compute is read directly 
> from the source data file, which does not contain the *partition column 
> schema*. The bootstrap schema is therefore incomplete.
> This manifests as issues when we eventually perform *upserts* on these 
> bootstrapped files and they become fully bootstrapped. At upsert time the 
> schema evolves, because the upsert dataframe must contain the partition column 
> in order to perform upserts. As a result, the *upserted rows* have the 
> correct partition column value stored, while the other records, which are 
> simply copied over from the metadata bootstrap file, are missing the partition 
> column. We therefore observe different behavior for 
> *bootstrapped* vs *non-bootstrapped* tables.
> This does not currently cause issues with *Hive*, because Hive can determine 
> the partition columns from the metadata it stores. However, it does cause a 
> problem with other engines like *Spark*, where the partition columns show up 
> as *null* when the upserted files are read.
> The proposal is therefore to fix the following issues:
>  * When performing bootstrap, determine the partition schema and store it in 
> the *bootstrap schema* in the commit metadata file. This provides the 
> following benefits:
>  ** Completeness: there is no behavioral difference between bootstrapped and 
> non-bootstrapped tables.
>  ** In the Spark bootstrap relation and incremental query relation, where we 
> need to determine the latest schema, one can simply get the accurate schema 
> from the commit metadata file, instead of checking whether the partition 
> column is present in the schema obtained from the metadata file and, if not, 
> determining the partition schema every time and merging it (which can be 
> expensive).
>  * When upserting files that were metadata bootstrapped, the partition 
> column values should be correctly determined and copied into the upserted file 
> to avoid missing and null values.
>  ** Again, this keeps behavior consistent with non-bootstrapped tables. Even 
> though Hive seems to handle this somehow, we should consider other engines 
> like *Spark* where it cannot be handled automatically.
>  ** Without this, providing the partition value on the read side in Spark 
> becomes significantly more complicated: we would have to check every time 
> whether the partition value is null and somehow fill it in.
>  ** Once the table is fully bootstrapped at some point in the future, and the 
> bootstrap commit is, say, cleaned up and Spark queries go through the 
> *parquet* datasource instead of the *new bootstrapped datasource*, the *parquet 
> datasource* will return null values wherever it finds missing partition 
> values. In that case we have no control over the *parquet* datasource, as it 
> simply reads from the file. 





[jira] [Updated] (HUDI-899) Add a knob to change partition-path style while performing metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-899:

Status: Open  (was: New)

> Add a knob to change partition-path style while performing metadata bootstrap
> -
>
> Key: HUDI-899
> URL: https://issues.apache.org/jira/browse/HUDI-899
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-899) Add a knob to change partition-path style while performing metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-899:

Status: In Progress  (was: Open)

> Add a knob to change partition-path style while performing metadata bootstrap
> -
>
> Key: HUDI-899
> URL: https://issues.apache.org/jira/browse/HUDI-899
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-899) Add a knob to change partition-path style while performing metadata bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-899:

Status: Patch Available  (was: In Progress)

> Add a knob to change partition-path style while performing metadata bootstrap
> -
>
> Key: HUDI-899
> URL: https://issues.apache.org/jira/browse/HUDI-899
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Assigned] (HUDI-429) Long Running Testing to certify Bootstrapping

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-429:
---

Assignee: Balaji Varadarajan

> Long Running Testing to certify Bootstrapping
> -
>
> Key: HUDI-429
> URL: https://issues.apache.org/jira/browse/HUDI-429
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> It would be great to run long-running tests that perform bootstrapping.





[jira] [Updated] (HUDI-429) Long Running Testing to certify Bootstrapping

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-429:

Status: In Progress  (was: Open)

> Long Running Testing to certify Bootstrapping
> -
>
> Key: HUDI-429
> URL: https://issues.apache.org/jira/browse/HUDI-429
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> It would be great to run long-running tests that perform bootstrapping.





[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-806:

Status: Open  (was: New)

> Implement support for bootstrapping via Spark datasource API
> 
>
> Key: HUDI-806
> URL: https://issues.apache.org/jira/browse/HUDI-806
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> This Jira tracks the work required to perform bootstrapping through Spark 
> data source API.





[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-806:

Status: Patch Available  (was: In Progress)

> Implement support for bootstrapping via Spark datasource API
> 
>
> Key: HUDI-806
> URL: https://issues.apache.org/jira/browse/HUDI-806
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Time Spent: 336h
>  Remaining Estimate: 0h
>
> This Jira tracks the work required to perform bootstrapping through Spark 
> data source API.





[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-806:

Status: In Progress  (was: Open)

> Implement support for bootstrapping via Spark datasource API
> 
>
> Key: HUDI-806
> URL: https://issues.apache.org/jira/browse/HUDI-806
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Time Spent: 336h
>  Remaining Estimate: 0h
>
> This Jira tracks the work required to perform bootstrapping through Spark 
> data source API.





[jira] [Updated] (HUDI-621) Presto Integration for supporting Bootstrapped table

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-621:

Status: Open  (was: New)

> Presto Integration for supporting Bootstrapped table
> 
>
> Key: HUDI-621
> URL: https://issues.apache.org/jira/browse/HUDI-621
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Presto Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Assigned] (HUDI-621) Presto Integration for supporting Bootstrapped table

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-621:
---

Assignee: Udit Mehrotra

> Presto Integration for supporting Bootstrapped table
> 
>
> Key: HUDI-621
> URL: https://issues.apache.org/jira/browse/HUDI-621
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Presto Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Updated] (HUDI-620) Hive Sync Integration of bootstrapped table

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-620:

Status: In Progress  (was: Open)

> Hive Sync Integration of bootstrapped table
> ---
>
> Key: HUDI-620
> URL: https://issues.apache.org/jira/browse/HUDI-620
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Updated] (HUDI-620) Hive Sync Integration of bootstrapped table

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-620:

Status: Patch Available  (was: In Progress)

> Hive Sync Integration of bootstrapped table
> ---
>
> Key: HUDI-620
> URL: https://issues.apache.org/jira/browse/HUDI-620
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Updated] (HUDI-620) Hive Sync Integration of bootstrapped table

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-620:

Status: Open  (was: New)

> Hive Sync Integration of bootstrapped table
> ---
>
> Key: HUDI-620
> URL: https://issues.apache.org/jira/browse/HUDI-620
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>






[jira] [Updated] (HUDI-900) Metadata Bootstrap Key Generator needs to handle complex keys correctly

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-900:

Status: In Progress  (was: Open)

> Metadata Bootstrap Key Generator needs to handle complex keys correctly
> ---
>
> Key: HUDI-900
> URL: https://issues.apache.org/jira/browse/HUDI-900
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Look at ComplexKeyGenerator and make sure MetadataBootstrap generates keys in the same format.
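For reference, a Java sketch of the composite record-key format that, to our understanding, ComplexKeyGenerator produces in the 0.5.x line: field:value pairs joined by commas, with placeholder strings substituted for null and empty values. The placeholder values and the all-null/empty check reflect our reading of that class and should be verified against the Hudi source; ComplexRecordKeys itself is hypothetical. Presumably a metadata bootstrap key generator needs to emit keys of this exact shape so that later upserts match the bootstrapped records.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper sketching the "field:value,field:value" record-key format.
final class ComplexRecordKeys {
    // Placeholders assumed to match the ones ComplexKeyGenerator uses.
    static final String NULL_PLACEHOLDER = "__null__";
    static final String EMPTY_PLACEHOLDER = "__empty__";

    private ComplexRecordKeys() {}

    // Builds a composite key from an ordered field -> value map
    // (pass a LinkedHashMap so the field order is deterministic).
    static String recordKey(Map<String, String> fieldValues) {
        StringJoiner key = new StringJoiner(",");
        boolean allNullOrEmpty = true;
        for (Map.Entry<String, String> field : fieldValues.entrySet()) {
            String value = field.getValue();
            if (value == null) {
                value = NULL_PLACEHOLDER;
            } else if (value.isEmpty()) {
                value = EMPTY_PLACEHOLDER;
            } else {
                allNullOrEmpty = false;
            }
            key.add(field.getKey() + ":" + value);
        }
        // A key made only of placeholders (or of no fields) cannot identify a record.
        if (allNullOrEmpty) {
            throw new IllegalArgumentException("all record key fields are null or empty");
        }
        return key.toString();
    }
}
```

With fields symbol=GOOG and ts=2018-08-31 10:29:00 (in that order), the sketch yields the key symbol:GOOG,ts:2018-08-31 10:29:00.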





[jira] [Updated] (HUDI-900) Metadata Bootstrap Key Generator needs to handle complex keys correctly

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-900:

Status: Patch Available  (was: In Progress)

> Metadata Bootstrap Key Generator needs to handle complex keys correctly
> ---
>
> Key: HUDI-900
> URL: https://issues.apache.org/jira/browse/HUDI-900
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Look at ComplexKeyGenerator and make sure MetadataBootstrap generates keys in the same format.





[jira] [Updated] (HUDI-427) Implement CLI support for performing bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-427:

Status: In Progress  (was: Open)

> Implement CLI support for performing bootstrap
> --
>
> Key: HUDI-427
> URL: https://issues.apache.org/jira/browse/HUDI-427
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
> Fix For: 0.6.0
>
>
> We need a CLI to perform bootstrap as described in 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi]





[jira] [Updated] (HUDI-427) Implement CLI support for performing bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-427:

Status: Patch Available  (was: In Progress)

> Implement CLI support for performing bootstrap
> --
>
> Key: HUDI-427
> URL: https://issues.apache.org/jira/browse/HUDI-427
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Major
> Fix For: 0.6.0
>
>
> We need a CLI to perform bootstrap as described in 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi]





[jira] [Updated] (HUDI-423) Implement upsert functionality for handling updates to these bootstrap file slices

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-423:

Status: Patch Available  (was: In Progress)

> Implement upsert functionality for handling updates to these bootstrap file 
> slices
> --
>
> Key: HUDI-423
> URL: https://issues.apache.org/jira/browse/HUDI-423
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core, Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> We need support for handling upserts to these file slices. For MOR tables, we 
> also need compaction support. 
>  





[jira] [Updated] (HUDI-423) Implement upsert functionality for handling updates to these bootstrap file slices

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-423:

Status: In Progress  (was: Open)

> Implement upsert functionality for handling updates to these bootstrap file 
> slices
> --
>
> Key: HUDI-423
> URL: https://issues.apache.org/jira/browse/HUDI-423
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core, Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> We need support for handling upserts to these file slices. For MOR tables, we 
> also need compaction support. 
>  





[jira] [Updated] (HUDI-420) Automated end to end Integration Test

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-420:

Status: Patch Available  (was: In Progress)

> Automated end to end Integration Test
> -
>
> Key: HUDI-420
> URL: https://issues.apache.org/jira/browse/HUDI-420
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> We need an end-to-end test as part of ITTestHoodieDemo that also covers bootstrap 
> table cases.
> We can have a new table bootstrapped from the Hoodie table built in the demo 
> and ensure that queries work and return the same responses.





[jira] [Updated] (HUDI-420) Automated end to end Integration Test

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-420:

Status: In Progress  (was: Open)

> Automated end to end Integration Test
> -
>
> Key: HUDI-420
> URL: https://issues.apache.org/jira/browse/HUDI-420
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> We need an end-to-end test as part of ITTestHoodieDemo that also covers bootstrap 
> table cases.
> We can have a new table bootstrapped from the Hoodie table built in the demo 
> and ensure that queries work and return the same responses.





[jira] [Assigned] (HUDI-420) Automated end to end Integration Test

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-420:
---

Assignee: Balaji Varadarajan

> Automated end to end Integration Test
> -
>
> Key: HUDI-420
> URL: https://issues.apache.org/jira/browse/HUDI-420
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> We need an end-to-end test as part of ITTestHoodieDemo that also covers bootstrap 
> table cases.
> We can have a new table bootstrapped from the Hoodie table built in the demo 
> and ensure that queries work and return the same responses.





[jira] [Updated] (HUDI-425) Implement support for bootstrapping in HoodieDeltaStreamer

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-425:

Status: In Progress  (was: Open)

> Implement support for bootstrapping in HoodieDeltaStreamer
> --
>
> Key: HUDI-425
> URL: https://issues.apache.org/jira/browse/HUDI-425
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: help-wanted
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-425) Implement support for bootstrapping in HoodieDeltaStreamer

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-425:

Status: Patch Available  (was: In Progress)

> Implement support for bootstrapping in HoodieDeltaStreamer
> --
>
> Key: HUDI-425
> URL: https://issues.apache.org/jira/browse/HUDI-425
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: help-wanted
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-424) Implement Hive Query Side Integration for querying tables containing bootstrap file slices

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-424:

Description: 
Support for Hive read-optimized and realtime queries 

 

 

  was:
Includes 

(1) Hive Sync integration as part of bootstrap

(2) Hive read-optimized and realtime queries 

 

 


> Implement Hive Query Side Integration for querying tables containing 
> bootstrap file slices
> --
>
> Key: HUDI-424
> URL: https://issues.apache.org/jira/browse/HUDI-424
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Support for Hive read-optimized and realtime queries 
>  
>  





[jira] [Updated] (HUDI-424) Implement Hive Query Side Integration for querying tables containing bootstrap file slices

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-424:

Status: Patch Available  (was: In Progress)

> Implement Hive Query Side Integration for querying tables containing 
> bootstrap file slices
> --
>
> Key: HUDI-424
> URL: https://issues.apache.org/jira/browse/HUDI-424
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Support for Hive read-optimized and realtime queries 
>  
>  





[jira] [Updated] (HUDI-422) Cleanup bootstrap code and create write APIs for supporting bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-422:

Status: In Progress  (was: Open)

> Cleanup bootstrap code and create write APIs for supporting bootstrap 
> --
>
> Key: HUDI-422
> URL: https://issues.apache.org/jira/browse/HUDI-422
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Once the refactor of HoodieWriteClient is done, we can clean up and introduce 
> HoodieBootstrapClient in a separate PR.





[jira] [Updated] (HUDI-422) Cleanup bootstrap code and create write APIs for supporting bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-422:

Status: Patch Available  (was: In Progress)

> Cleanup bootstrap code and create write APIs for supporting bootstrap 
> --
>
> Key: HUDI-422
> URL: https://issues.apache.org/jira/browse/HUDI-422
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Once the refactor of HoodieWriteClient is done, we can clean up and introduce 
> HoodieBootstrapClient in a separate PR.





[jira] [Updated] (HUDI-424) Implement Hive Query Side Integration for querying tables containing bootstrap file slices

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-424:

Summary: Implement Hive Query Side Integration for querying tables 
containing bootstrap file slices  (was: Implement Query Side Integration for 
querying tables containing bootstrap file slices)

> Implement Hive Query Side Integration for querying tables containing 
> bootstrap file slices
> --
>
> Key: HUDI-424
> URL: https://issues.apache.org/jira/browse/HUDI-424
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Includes 
> (1) Hive Sync integration as part of bootstrap
> (2) Hive read-optimized and realtime queries 
>  
>  





[jira] [Updated] (HUDI-422) Cleanup bootstrap code and create write APIs for supporting bootstrap

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-422:

Summary: Cleanup bootstrap code and create write APIs for supporting 
bootstrap  (was: Cleanup bootstrap code and create HoodieBootstrapClient for 
supporting bootstrap)

> Cleanup bootstrap code and create write APIs for supporting bootstrap 
> --
>
> Key: HUDI-422
> URL: https://issues.apache.org/jira/browse/HUDI-422
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Once the refactor of HoodieWriteClient is done, we can clean up and introduce 
> HoodieBootstrapClient in a separate PR.





[jira] [Updated] (HUDI-424) Implement Query Side Integration for querying tables containing bootstrap file slices

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-424:

Status: In Progress  (was: Open)

> Implement Query Side Integration for querying tables containing bootstrap 
> file slices
> -
>
> Key: HUDI-424
> URL: https://issues.apache.org/jira/browse/HUDI-424
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Includes 
> (1) Hive Sync integration as part of bootstrap
> (2) Hive read-optimized and realtime queries 
>  
>  





[jira] [Updated] (HUDI-421) Cleanup bootstrap code and create PR for FileSystemView changes

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-421:

Status: Patch Available  (was: In Progress)

> Cleanup bootstrap code and create PR for FileSystemView changes
> -
>
> Key: HUDI-421
> URL: https://issues.apache.org/jira/browse/HUDI-421
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> FileSystemView needs changes to identify and handle bootstrap file slices. 
> Code changes are present in 
> [https://github.com/bvaradar/hudi/tree/vb_bootstrap]. They need cleanup before 
> they are ready to become a PR.





[jira] [Updated] (HUDI-421) Cleanup bootstrap code and create PR for FileSystemView changes

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-421:

Status: In Progress  (was: Open)

> Cleanup bootstrap code and create PR for FileSystemView changes
> -
>
> Key: HUDI-421
> URL: https://issues.apache.org/jira/browse/HUDI-421
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> FileSystemView needs changes to identify and handle bootstrap file slices. 
> Code changes are present in 
> [https://github.com/bvaradar/hudi/tree/vb_bootstrap]. They need cleanup before 
> they are ready to become a PR.





[jira] [Closed] (HUDI-419) Basic Implementation for verifying if bootstrapping works end to end

2020-05-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan closed HUDI-419.
---

> Basic Implementation for verifying if bootstrapping works end to end
> 
>
> Key: HUDI-419
> URL: https://issues.apache.org/jira/browse/HUDI-419
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core, Hive Integration, Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of prototyping, I have most of the core functionality in 
> [https://github.com/bvaradar/hudi/tree/vb_bootstrap]
>  
> This includes:
>  # Timeline and FileSystem View changes
>  # New Bootstrap Client to perform Bootstrap
>  # DeltaStreamer Integration
>  # Hive Parquet Read Optimized reader integration
>  
> Needs to be done:
>  # Merge Handle changes to support upsert over bootstrap file slice (read 
> part functionally similar to (4) and write part the same as the 
> current Hoodie MergeHandle).
>  # Unit Testing 
>  # Code cleanup as the current implementation has duplicated code.
>  # Automated integration test
>  # Hoodie CLI and Spark DataSource Write integration




