[jira] [Created] (GOBBLIN-1223) Change the criteria for re-compaction, limit the time for re-compaction

2020-07-28 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1223:
-

 Summary: Change the criteria for re-compaction, limit the time for 
re-compaction
 Key: GOBBLIN-1223
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1223
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GOBBLIN-1210) Force AM to read from token file to update token when start up

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1210.
---
Resolution: Fixed

> Force AM to read from token file to update token when start up
> --
>
> Key: GOBBLIN-1210
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1210
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1185) Enable dataset cleaner to emit kafka events

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1185.
---
Resolution: Fixed

> Enable dataset cleaner to emit kafka events
> ---
>
> Key: GOBBLIN-1185
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1185
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1165) Add config to enable user to set additional yarn classpathes

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1165.
---
Resolution: Fixed

> Add config to enable user to set additional yarn classpathes
> 
>
> Key: GOBBLIN-1165
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1165
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1183) Enable additional yarn class path set for app master

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1183.
---
Resolution: Fixed

> Enable additional yarn class path set for app master
> 
>
> Key: GOBBLIN-1183
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1183
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1158) Use input dir to document old files instead of file pathes to reduce memory cost in Compaction configurator

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1158.
---
Resolution: Fixed

> Use input dir to document old files instead of file pathes to reduce memory 
> cost in Compaction configurator
> ---
>
> Key: GOBBLIN-1158
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1158
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1147) Use one dfsClient in FsDataWriter to to rename and exists check to avoid inconsistency

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1147.
---
Resolution: Fixed

> Use one dfsClient in FsDataWriter to to rename and exists check to avoid 
> inconsistency
> --
>
> Key: GOBBLIN-1147
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1147
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1136) Make LogCopier be able to refresh FileSystem for long running job use cases

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1136.
---
Resolution: Fixed

> Make LogCopier be able to refresh FileSystem for long running job use cases
> ---
>
> Key: GOBBLIN-1136
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1136
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Make LogCopier be able to refresh FileSystem for long running job use cases





[jira] [Resolved] (GOBBLIN-1143) Add a generic wrapper producer client to communicate with Kafka

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1143.
---
Resolution: Fixed

> Add a generic wrapper producer client to communicate with Kafka
> ---
>
> Key: GOBBLIN-1143
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1143
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a generic wrapper producer client to communicate with Kafka, along with 
> its implementations for the kafka08 producer and the kafka09 producer





[jira] [Resolved] (GOBBLIN-1133) Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action configurable

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1133.
---
Resolution: Fixed

> Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action 
> configurable
> --
>
> Key: GOBBLIN-1133
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1133
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 1. Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete 
>    action configurable
> 2. Include dstNewFiles and oldFiles in CompactionJobConfigurator





[jira] [Resolved] (GOBBLIN-1121) Fix Issue that YarnService use the old token to acquire new container

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1121.
---
Resolution: Fixed

> Fix Issue that YarnService use the old token to acquire new container
> -
>
> Key: GOBBLIN-1121
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1121
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1080) Add configuration to preserve schema creation time in converter

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1080.
---
Resolution: Fixed

> Add configuration to preserve schema creation time in converter
> ---
>
> Key: GOBBLIN-1080
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1080
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1069) Add NPE check in handleContainerCompletion method

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1069.
---
Resolution: Fixed

> Add NPE check in handleContainerCompletion method
> -
>
> Key: GOBBLIN-1069
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1069
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add a null check in the handleContainerCompletion method so that calling the 
> method twice for the same container does not fail the job
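
The idempotent handling described above could look roughly like the following. This is a minimal sketch, not Gobblin's actual code: the class ContainerCompletionSketch, its map, and the plain String container ids are hypothetical stand-ins chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a service that tracks running containers.
class ContainerCompletionSketch {
    // Assumed bookkeeping: container id -> name of the instance running in it.
    private final Map<String, String> containerToInstance = new ConcurrentHashMap<>();

    void onContainerStarted(String containerId, String instanceName) {
        containerToInstance.put(containerId, instanceName);
    }

    // Returns true if this call did the cleanup, false if the container was
    // already handled (or never known), in which case nothing is done.
    boolean handleContainerCompletion(String containerId) {
        String instanceName = containerToInstance.remove(containerId);
        if (instanceName == null) {
            // Null check: a repeated completion event must not fail the job.
            return false;
        }
        // Real cleanup for instanceName would go here.
        return true;
    }
}
```

With this shape, the first completion event for a container removes its entry and the second one finds nothing and returns quietly instead of dereferencing a missing value.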





[jira] [Resolved] (GOBBLIN-1064) Make KafkaAvroSchemaRegistry extendable

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1064.
---
Resolution: Fixed

> Make KafkaAvroSchemaRegistry extendable
> ---
>
> Key: GOBBLIN-1064
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1064
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1077) Fix bug in HiveDataset.resolveConfig

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1077.
---
Resolution: Fixed

> Fix bug in HiveDataset.resolveConfig
> 
>
> Key: GOBBLIN-1077
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1077
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>
> In resolveConfig, we get a config object, resolve its values, and put all of 
> them into a property object without desanitizing the keys. When the property 
> object is transformed back into a config, there is a chance of a runtime 
> exception. 
> Solution: construct a config object directly instead of going 
> config->property->config





[jira] [Resolved] (GOBBLIN-1023) Fix the issue of lossing data when trying to commit

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-1023.
---
Resolution: Won't Fix

> Fix the issue of lossing data when trying to commit
> ---
>
> Key: GOBBLIN-1023
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1023
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-981) Handle backward compatibility issue in HiveSource

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-981.
--
Resolution: Fixed

> Handle backward compatibility issue in HiveSource
> -
>
> Key: GOBBLIN-981
> URL: https://issues.apache.org/jira/browse/GOBBLIN-981
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>






[jira] [Resolved] (GOBBLIN-986) Persist the existing property of table when doing hive registration

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-986.
--
Resolution: Fixed

> Persist the existing property of table when doing hive registration
> ---
>
> Key: GOBBLIN-986
> URL: https://issues.apache.org/jira/browse/GOBBLIN-986
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-975) Add flag to enable/disable avro type check in AvroToOrc

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-975.
--
Resolution: Fixed

> Add flag to enable/disable avro type check in AvroToOrc 
> 
>
> Key: GOBBLIN-975
> URL: https://issues.apache.org/jira/browse/GOBBLIN-975
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>
> Add flag to enable/disable avro type check when trying to get the schema. 





[jira] [Resolved] (GOBBLIN-941) Enhance DDL to add column and column.types with case-preserving schema

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-941.
--
Resolution: Fixed

> Enhance DDL to add column and column.types with case-preserving schema
> --
>
> Key: GOBBLIN-941
> URL: https://issues.apache.org/jira/browse/GOBBLIN-941
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Enhance DDL to add column and column.types with a case-preserving schema, 
> which ensures that avro2orc output preserves the correct casing





[jira] [Resolved] (GOBBLIN-924) Get rid of orc.schema.literal in ORC-ingestion and registration

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-924.
--
Resolution: Fixed

> Get rid of orc.schema.literal in ORC-ingestion and registration
> ---
>
> Key: GOBBLIN-924
> URL: https://issues.apache.org/jira/browse/GOBBLIN-924
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-916) Make ContainerLaunchContext instantiation in YarnService more efficient

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-916.
--
Resolution: Fixed

> Make ContainerLaunchContext instantiation in YarnService more efficient
> ---
>
> Key: GOBBLIN-916
> URL: https://issues.apache.org/jira/browse/GOBBLIN-916
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-921) Make pull/push mode when registering partition to be configurable

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-921.
--
Resolution: Fixed

> Make pull/push mode when registering partition to be configurable
> -
>
> Key: GOBBLIN-921
> URL: https://issues.apache.org/jira/browse/GOBBLIN-921
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In pull mode, the register first checks whether the partition already 
> exists, to reduce calls to add_partition. In push mode, the register calls 
> add_partition directly and relies on the exception to determine whether the 
> partition already exists. Push mode should be used when most of the 
> partitions the HiveRegister tries to register do not exist yet, to reduce 
> calls to HiveMetaStore.
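
The trade-off between the two modes can be sketched with a call-counting stand-in for the metastore. Everything here is hypothetical and for illustration only: FakeMetastore, registerPull, and registerPush are not Gobblin or Hive APIs, and IllegalStateException stands in for whatever "already exists" error the real client raises.

```java
import java.util.HashSet;
import java.util.Set;

class PartitionRegistrationModes {
    // Hypothetical in-memory stand-in for a metastore that counts round trips.
    static class FakeMetastore {
        private final Set<String> partitions = new HashSet<>();
        int calls = 0;

        boolean exists(String partition) {
            calls++;
            return partitions.contains(partition);
        }

        void addPartition(String partition) {
            calls++;
            if (!partitions.add(partition)) {
                throw new IllegalStateException("already exists: " + partition);
            }
        }
    }

    // Pull mode: check first, add only when the partition is missing.
    static void registerPull(FakeMetastore metastore, String partition) {
        if (!metastore.exists(partition)) {
            metastore.addPartition(partition);
        }
    }

    // Push mode: add directly and treat "already exists" as success.
    static void registerPush(FakeMetastore metastore, String partition) {
        try {
            metastore.addPartition(partition);
        } catch (IllegalStateException alreadyExists) {
            // The partition was registered earlier; nothing more to do.
        }
    }
}
```

In this sketch a new partition costs two calls in pull mode (exists, then add) but one call in push mode, while an already registered partition costs one call in either mode. That is why push mode pays off when most partitions are new.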





[jira] [Resolved] (GOBBLIN-912) Enable TTL caching on Hive Metastore client connection

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-912.
--
Resolution: Fixed

> Enable TTL caching on Hive Metastore client connection
> --
>
> Key: GOBBLIN-912
> URL: https://issues.apache.org/jira/browse/GOBBLIN-912
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-902) Enable gobblin yarn app luncher class configurable

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-902.
--
Resolution: Fixed

> Enable gobblin yarn app luncher class configurable
> --
>
> Key: GOBBLIN-902
> URL: https://issues.apache.org/jira/browse/GOBBLIN-902
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>
> Enable gobblin yarn app luncher class configurable





[jira] [Resolved] (GOBBLIN-899) Add a key in dataset config to disable schema check for a specific dataset

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-899.
--
Resolution: Fixed

> Add a key in dataset config to disable schema check for a specific dataset
> --
>
> Key: GOBBLIN-899
> URL: https://issues.apache.org/jira/browse/GOBBLIN-899
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-877) Add column metadata for partition for inline hive registration

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-877.
--
Resolution: Fixed

> Add column metadata for partition for inline hive registration
> --
>
> Key: GOBBLIN-877
> URL: https://issues.apache.org/jira/browse/GOBBLIN-877
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Previously, we removed schema.literal for partitions, because Avro schemas 
> should _only_ be defined at the table level. Hive overrides table properties 
> if the same property is defined on the partition, so defining them at the 
> partition level may lead to partitions with inconsistent schemas. Because 
> column metadata is calculated from schema.literal, we removed the column 
> metadata as well.
> We then encountered a problem: Presto cannot read data from ORC files, 
> because ORC (and other Hive serdes) need metadata in the partitions so that 
> coercion can be done between a partition schema and the table schema.
> So we need to treat Avro and other formats separately to make sure Hive 
> registration works well and users can read the right data from Presto.





[jira] [Resolved] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-863.
--
Resolution: Fixed

> Handle race condition between concurrent Gobblin tasks performing Hive 
> registration
> ---
>
> Key: GOBBLIN-863
> URL: https://issues.apache.org/jira/browse/GOBBLIN-863
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: hive-registration
>Reporter: Zihan Li
>Assignee: Abhishek Tiwari
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-861) Skip getPartition() call to Hive Metastore when a partition already exists

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-861.
--
Resolution: Fixed

> Skip getPartition() call to Hive Metastore when a partition already exists
> --
>
> Key: GOBBLIN-861
> URL: https://issues.apache.org/jira/browse/GOBBLIN-861
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: hive-registration
>Reporter: Zihan Li
>Assignee: Abhishek Tiwari
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, we compute a diff between the current partition and an already 
> registered partition when a partition has already been registered in Hive. 
> This is done by calling getPartition() on the Hive metastore client, which 
> can be expensive. Since no time-varying attributes are stored in a Hive 
> partition, diff computation (and getPartition() call) can be skipped when a 
> partition already exists.
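
The change described above can be illustrated with a small stand-in that counts getPartition() calls. This is only a sketch under stated assumptions: FakeMetastore, registerWithDiff, and registerSkippingDiff are hypothetical names, a partition is reduced to a name and a location string, and the "diff" is a plain equality check.

```java
import java.util.HashMap;
import java.util.Map;

class SkipGetPartitionSketch {
    // Hypothetical in-memory stand-in for the Hive metastore client.
    static class FakeMetastore {
        final Map<String, String> partitions = new HashMap<>(); // name -> location
        int getPartitionCalls = 0;

        // Returns true if the partition was newly added, false if it existed.
        boolean addIfAbsent(String name, String location) {
            return partitions.putIfAbsent(name, location) == null;
        }

        // Counted because this is the call the issue describes as expensive.
        String getPartition(String name) {
            getPartitionCalls++;
            return partitions.get(name);
        }
    }

    // Earlier behaviour as described: fetch the existing partition and diff it.
    static void registerWithDiff(FakeMetastore metastore, String name, String location) {
        if (metastore.addIfAbsent(name, location)) {
            return;
        }
        String existing = metastore.getPartition(name);
        if (!location.equals(existing)) {
            metastore.partitions.put(name, location);
        }
    }

    // Proposed behaviour: an existing partition is left alone, with no fetch.
    static void registerSkippingDiff(FakeMetastore metastore, String name, String location) {
        metastore.addIfAbsent(name, location);
    }
}
```

Registering the same partition twice through registerWithDiff triggers one getPartition() call, while registerSkippingDiff triggers none. Skipping is only safe under the issue's premise that a partition carries no time-varying attributes.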





[jira] [Resolved] (GOBBLIN-859) Let writer pass latest schema to workUnitState when schema change

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-859.
--
Resolution: Fixed

> Let writer pass latest schema to workUnitState when schema change
> -
>
> Key: GOBBLIN-859
> URL: https://issues.apache.org/jira/browse/GOBBLIN-859
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: gobblin-core
>Reporter: Zihan Li
>Assignee: Abhishek Tiwari
>Priority: Major
>
> Let the writer pass the latest schema to workUnitState when the writer is 
> initialized and when the schema changes, so that hive registration can get 
> the latest schema directly without maintaining logic to compute the latest 
> schema version





[jira] [Resolved] (GOBBLIN-852) Reorganize the code for hive registration to isolate function

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-852.
--
Resolution: Fixed

> Reorganize the code for hive registration to isolate function
> -
>
> Key: GOBBLIN-852
> URL: https://issues.apache.org/jira/browse/GOBBLIN-852
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: hive-registration
>Reporter: Zihan Li
>Assignee: Abhishek Tiwari
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-806) Enable metrics reporter during dataset discovery for retention job

2020-07-24 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li resolved GOBBLIN-806.
--
Resolution: Fixed

> Enable metrics reporter during dataset discovery for retention job
> --
>
> Key: GOBBLIN-806
> URL: https://issues.apache.org/jira/browse/GOBBLIN-806
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-1210) Force AM to read from token file to update token when start up

2020-07-08 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1210:
-

 Summary: Force AM to read from token file to update token when 
start up
 Key: GOBBLIN-1210
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1210
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1185) Enable dataset cleaner to emit kafka events

2020-06-09 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1185:
-

 Summary: Enable dataset cleaner to emit kafka events
 Key: GOBBLIN-1185
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1185
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1183) Enable additional yarn class path set for app master

2020-06-04 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1183:
-

 Summary: Enable additional yarn class path set for app master
 Key: GOBBLIN-1183
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1183
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1165) Add config to enable user to set additional yarn classpathes

2020-05-28 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1165:
-

 Summary: Add config to enable user to set additional yarn 
classpathes
 Key: GOBBLIN-1165
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1165
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1158) Use input dir to document old files instead of file pathes to reduce memory cost

2020-05-21 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1158:
-

 Summary: Use input dir to document old files instead of file 
pathes to reduce memory cost
 Key: GOBBLIN-1158
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1158
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Updated] (GOBBLIN-1158) Use input dir to document old files instead of file pathes to reduce memory cost in Compaction configurator

2020-05-21 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li updated GOBBLIN-1158:
--
Summary: Use input dir to document old files instead of file pathes to 
reduce memory cost in Compaction configurator  (was: Use input dir to document 
old files instead of file pathes to reduce memory cost)

> Use input dir to document old files instead of file pathes to reduce memory 
> cost in Compaction configurator
> ---
>
> Key: GOBBLIN-1158
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1158
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>






[jira] [Created] (GOBBLIN-1147) Use one dfsClient in FsDataWriter to to rename and exists check to avoid inconsistency

2020-05-13 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1147:
-

 Summary: Use one dfsClient in FsDataWriter to to rename and exists 
check to avoid inconsistency
 Key: GOBBLIN-1147
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1147
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1143) Add a generic wrapper producer client to communicate with Kafka

2020-05-05 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1143:
-

 Summary: Add a generic wrapper producer client to communicate with 
Kafka
 Key: GOBBLIN-1143
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1143
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Add a generic wrapper producer client to communicate with Kafka, along with 
its implementations for the kafka08 producer and the kafka09 producer





[jira] [Created] (GOBBLIN-1136) Make LogCopier be able to refresh FileSystem for long running job use cases

2020-04-30 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1136:
-

 Summary: Make LogCopier be able to refresh FileSystem for long 
running job use cases
 Key: GOBBLIN-1136
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1136
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Make LogCopier be able to refresh FileSystem for long running job use cases





[jira] [Created] (GOBBLIN-1133) Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action configurable

2020-04-29 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1133:
-

 Summary: Add CompactionSuiteBaseWithConfigurableCompleteAction to 
make complete action configurable
 Key: GOBBLIN-1133
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1133
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


1. Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action 
   configurable
2. Include dstNewFiles and oldFiles in CompactionJobConfigurator





[jira] [Created] (GOBBLIN-1121) Fix Issue that YarnService use the old token to acquire new container

2020-04-20 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1121:
-

 Summary: Fix Issue that YarnService use the old token to acquire 
new container
 Key: GOBBLIN-1121
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1121
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1093) Use method overloading in AvroUtils for add creation time

2020-03-20 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1093:
-

 Summary: Use method overloading in AvroUtils for add creation time
 Key: GOBBLIN-1093
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1093
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1080) Add configuration to preserve schema creation time in converter

2020-03-11 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1080:
-

 Summary: Add configuration to preserve schema creation time in 
converter
 Key: GOBBLIN-1080
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1080
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-1077) Fix bug in HiveDataset.resolveConfig

2020-03-10 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1077:
-

 Summary: Fix bug in HiveDataset.resolveConfig
 Key: GOBBLIN-1077
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1077
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


In resolveConfig, we get a config object, resolve its values, and put all of 
them into a property object without desanitizing the keys. When the property 
object is transformed back into a config, there is a chance of a runtime 
exception. 

Solution: construct a config object directly instead of going 
config->property->config





[jira] [Created] (GOBBLIN-1069) Add NPE check in handleContainerCompletion method

2020-03-04 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1069:
-

 Summary: Add NPE check in handleContainerCompletion method
 Key: GOBBLIN-1069
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1069
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Add a null check in the handleContainerCompletion method so that calling the 
method twice for the same container does not fail the job





[jira] [Updated] (GOBBLIN-1064) Make KafkaAvroSchemaRegistry extendable

2020-03-02 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li updated GOBBLIN-1064:
--
Summary: Make KafkaAvroSchemaRegistry extendable  (was: Add writer's schema 
to workUnitState)

> Make KafkaAvroSchemaRegistry extendable
> ---
>
> Key: GOBBLIN-1064
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1064
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-1064) Add writer's schema to workUnitState

2020-02-27 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1064:
-

 Summary: Add writer's schema to workUnitState
 Key: GOBBLIN-1064
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1064
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Updated] (GOBBLIN-1023) Fix the issue of lossing data when trying to commit

2020-01-13 Thread Zihan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihan Li updated GOBBLIN-1023:
--
Summary: Fix the issue of lossing data when trying to commit  (was: Fix the 
issue of losing data when trying to commit)

> Fix the issue of lossing data when trying to commit
> ---
>
> Key: GOBBLIN-1023
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1023
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zihan Li
>Priority: Major
>






[jira] [Created] (GOBBLIN-1023) Fix the issue of losing data when trying to commit

2020-01-13 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-1023:
-

 Summary: Fix the issue of losing data when trying to commit
 Key: GOBBLIN-1023
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1023
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-986) Persist the existing property of table when doing hive registration

2019-11-27 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-986:


 Summary: Persist the existing property of table when doing hive 
registration
 Key: GOBBLIN-986
 URL: https://issues.apache.org/jira/browse/GOBBLIN-986
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-981) Handle backward compatibility issue in HiveSource

2019-11-25 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-981:


 Summary: Handle backward compatibility issue in HiveSource
 Key: GOBBLIN-981
 URL: https://issues.apache.org/jira/browse/GOBBLIN-981
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-975) Add flag to enable/disable avro type check in AvroToOrc

2019-11-21 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-975:


 Summary: Add flag to enable/disable avro type check in AvroToOrc 
 Key: GOBBLIN-975
 URL: https://issues.apache.org/jira/browse/GOBBLIN-975
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Add flag to enable/disable avro type check when trying to get the schema. 





[jira] [Created] (GOBBLIN-967) Change token refresh method in YarnContainerSecirityManager

2019-11-15 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-967:


 Summary: Change token refresh method in 
YarnContainerSecirityManager
 Key: GOBBLIN-967
 URL: https://issues.apache.org/jira/browse/GOBBLIN-967
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Change the token refresh method in YarnContainerSecirityManager from adding 
tokens to directly adding credentials, to make sure all the new credentials 
are updated.





[jira] [Created] (GOBBLIN-961) Bypass locked partitions when calculating src watermark

2019-11-13 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-961:


 Summary: Bypass locked partitions when calculating src watermark
 Key: GOBBLIN-961
 URL: https://issues.apache.org/jira/browse/GOBBLIN-961
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-941) Enhance DDL to add column and column.types with case-preserving schema

2019-10-31 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-941:


 Summary: Enhance DDL to add column and column.types with 
case-preserving schema
 Key: GOBBLIN-941
 URL: https://issues.apache.org/jira/browse/GOBBLIN-941
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Enhance the DDL to add column and column.types with a case-preserving schema, 
so that avro2orc output preserves the correct casing.





[jira] [Created] (GOBBLIN-924) Get rid of orc.schema.literal in ORC-ingestion and registration

2019-10-23 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-924:


 Summary: Get rid of orc.schema.literal in ORC-ingestion and 
registration
 Key: GOBBLIN-924
 URL: https://issues.apache.org/jira/browse/GOBBLIN-924
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-921) Make pull/push mode when registering partition to be configurable

2019-10-22 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-921:


 Summary: Make pull/push mode when registering partition to be 
configurable
 Key: GOBBLIN-921
 URL: https://issues.apache.org/jira/browse/GOBBLIN-921
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


In pull mode, the register first checks whether the partition already exists, 
which reduces the calls to add_partition. In push mode, the register calls 
add_partition directly and relies on the exception to determine whether the 
partition already exists. Push mode should be used when most of the partitions 
that HiveRegister tries to register do not yet exist, to reduce the calls to 
HiveMetaStore.
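
The trade-off between the two modes can be sketched with an in-memory stand-in that counts metastore calls. FakeMetastore, registerPull, and registerPush are hypothetical names used only for this illustration; they are not the HiveRegister or Hive Metastore client API:

```java
import java.util.HashSet;
import java.util.Set;

class PartitionRegistrar {
    static class AlreadyExistsException extends Exception {}

    // In-memory stand-in for the metastore; every method call counts as one round trip.
    static class FakeMetastore {
        final Set<String> partitions = new HashSet<>();
        int calls = 0;

        boolean exists(String partition) {
            calls++;
            return partitions.contains(partition);
        }

        void addPartition(String partition) throws AlreadyExistsException {
            calls++;
            if (!partitions.add(partition)) {
                throw new AlreadyExistsException();
            }
        }
    }

    // Pull mode: check first, and add only when the partition is missing.
    static boolean registerPull(FakeMetastore metastore, String partition) {
        if (metastore.exists(partition)) {
            return false;
        }
        try {
            metastore.addPartition(partition);
            return true;
        } catch (AlreadyExistsException e) {
            return false; // another writer added it between the check and the add
        }
    }

    // Push mode: add directly; the exception signals that the partition already exists.
    static boolean registerPush(FakeMetastore metastore, String partition) {
        try {
            metastore.addPartition(partition);
            return true;
        } catch (AlreadyExistsException e) {
            return false;
        }
    }
}
```

In this sketch a new partition costs two calls in pull mode and one call in push mode, which is why push mode is attractive when most partitions are new.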





[jira] [Created] (GOBBLIN-916) Make ContainerLaunchContext instantiation in YarnService more efficient

2019-10-17 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-916:


 Summary: Make ContainerLaunchContext instantiation in YarnService 
more efficient
 Key: GOBBLIN-916
 URL: https://issues.apache.org/jira/browse/GOBBLIN-916
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-912) Enable TTL caching on Hive Metastore client connection

2019-10-16 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-912:


 Summary: Enable TTL caching on Hive Metastore client connection
 Key: GOBBLIN-912
 URL: https://issues.apache.org/jira/browse/GOBBLIN-912
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-902) Enable gobblin yarn app luncher class configurable

2019-10-07 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-902:


 Summary: Enable gobblin yarn app luncher class configurable
 Key: GOBBLIN-902
 URL: https://issues.apache.org/jira/browse/GOBBLIN-902
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Make the Gobblin YARN app launcher class configurable.





[jira] [Created] (GOBBLIN-899) Add a key in dataset config to disable schema check for a specific dataset

2019-10-03 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-899:


 Summary: Add a key in dataset config to disable schema check for a 
specific dataset
 Key: GOBBLIN-899
 URL: https://issues.apache.org/jira/browse/GOBBLIN-899
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-877) Add column metadata for partition for inline hive registration

2019-09-09 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-877:


 Summary: Add column metadata for partition for inline hive 
registration
 Key: GOBBLIN-877
 URL: https://issues.apache.org/jira/browse/GOBBLIN-877
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


Previously, we removed schema.literal from partitions, because Avro schemas 
should _only_ be defined at the table level. Hive overrides table properties if 
the same property is defined on the partition, so defining them at the 
partition level may lead to partitions with inconsistent schemas. Because 
column metadata is calculated from schema.literal, we removed the column 
metadata as well.

We then encountered a problem: Presto cannot read data from ORC files, because 
ORC (and other Hive serdes) need metadata in the partitions so that coercion 
can be done between a partition schema and the table schema.

So we need to treat Avro and other formats separately to make sure Hive 
registration works well and users can read the right data from Presto.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (GOBBLIN-872) Only use one CouchbaseEnvironment per JVM

2019-09-06 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-872:


 Summary: Only use one CouchbaseEnvironment per JVM
 Key: GOBBLIN-872
 URL: https://issues.apache.org/jira/browse/GOBBLIN-872
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li








[jira] [Created] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration

2019-08-21 Thread Zihan Li (Jira)
Zihan Li created GOBBLIN-863:


 Summary: Handle race condition between concurrent Gobblin tasks 
performing Hive registration
 Key: GOBBLIN-863
 URL: https://issues.apache.org/jira/browse/GOBBLIN-863
 Project: Apache Gobblin
  Issue Type: Task
  Components: hive-registration
Reporter: Zihan Li
Assignee: Abhishek Tiwari








[jira] [Created] (GOBBLIN-861) Skip getPartition() call to Hive Metastore when a partition already exists

2019-08-15 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-861:


 Summary: Skip getPartition() call to Hive Metastore when a 
partition already exists
 Key: GOBBLIN-861
 URL: https://issues.apache.org/jira/browse/GOBBLIN-861
 Project: Apache Gobblin
  Issue Type: Task
  Components: hive-registration
Reporter: Zihan Li
Assignee: Abhishek Tiwari


Currently, we compute a diff between the current partition and an already 
registered partition when a partition has already been registered in Hive. This 
is done by calling getPartition() on the Hive metastore client, which can be 
expensive. Since no time-varying attributes are stored in a Hive partition, 
diff computation (and getPartition() call) can be skipped when a partition 
already exists.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (GOBBLIN-859) Let writer pass latest schema to workUnitState when schema change

2019-08-14 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-859:


 Summary: Let writer pass latest schema to workUnitState when 
schema change
 Key: GOBBLIN-859
 URL: https://issues.apache.org/jira/browse/GOBBLIN-859
 Project: Apache Gobblin
  Issue Type: Task
  Components: gobblin-core
Reporter: Zihan Li
Assignee: Abhishek Tiwari


Let the writer pass the latest schema to workUnitState when the writer is 
initialized and when the schema changes, so that hive registration can get the 
latest schema directly without maintaining logic to compute the latest schema 
version.





[jira] [Created] (GOBBLIN-852) Reorganize the code for hive registration to isolate function

2019-08-12 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-852:


 Summary: Reorganize the code for hive registration to isolate 
function
 Key: GOBBLIN-852
 URL: https://issues.apache.org/jira/browse/GOBBLIN-852
 Project: Apache Gobblin
  Issue Type: Task
  Components: hive-registration
Reporter: Zihan Li
Assignee: Abhishek Tiwari








[jira] [Created] (GOBBLIN-806) Enable metrics reporter during dataset discovery for retention job

2019-06-17 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-806:


 Summary: Enable metrics reporter during dataset discovery for 
retention job
 Key: GOBBLIN-806
 URL: https://issues.apache.org/jira/browse/GOBBLIN-806
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-799) Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and FIXED

2019-06-07 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-799:


 Summary: Bugs in  AvroSchemaCheckDefaultStrategy that not return 
after check ENUM and FIXED
 Key: GOBBLIN-799
 URL: https://issues.apache.org/jira/browse/GOBBLIN-799
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zihan Li


There are bugs in AvroSchemaCheckDefaultStrategy: it does not return after 
checking ENUM and FIXED. The fix is to add the missing return statements.
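
The general shape of this kind of bug can be illustrated as follows. This is a hypothetical sketch, not the actual AvroSchemaCheckDefaultStrategy code: the Type enum, the method names, and the name-equality check are assumptions made for the example.

```java
class SwitchReturnSketch {
    // Hypothetical subset of schema types, for illustration only.
    enum Type { ENUM, FIXED, STRING }

    // Buggy shape: the ENUM/FIXED branch computes a result but does not return it,
    // so execution falls through to the default branch.
    static boolean compareBuggy(Type type, String expectedName, String actualName) {
        boolean sameName;
        switch (type) {
            case ENUM:
            case FIXED:
                sameName = expectedName.equals(actualName);
                // missing return: falls through to default
            default:
                return false; // other types are not handled in this sketch
        }
    }

    // Fixed shape: return as soon as the ENUM/FIXED check is done.
    static boolean compareFixed(Type type, String expectedName, String actualName) {
        switch (type) {
            case ENUM:
            case FIXED:
                return expectedName.equals(actualName);
            default:
                return false; // other types are not handled in this sketch
        }
    }
}
```

In the buggy variant, two ENUM schemas with the same name are reported as not matching because the computed result is discarded.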





[jira] [Created] (GOBBLIN-772) Implement Schema Comparison Strategy during Disctp

2019-05-16 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-772:


 Summary: Implement Schema Comparison Strategy during Disctp
 Key: GOBBLIN-772
 URL: https://issues.apache.org/jira/browse/GOBBLIN-772
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zihan Li


We need a schema comparison strategy to make sure the real schema and the 
expected schema have matching field names and types.
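
One simple form such a strategy could take is sketched below, assuming a schema is flattened to a map from field name to type name. The class and method names are hypothetical, and a real implementation would operate on Avro Schema objects rather than plain strings:

```java
import java.util.Map;

class SchemaCheckSketch {
    // Returns true when both schemas have exactly the same field names
    // and each field has the same type name in both schemas.
    static boolean matches(Map<String, String> expected, Map<String, String> actual) {
        if (!expected.keySet().equals(actual.keySet())) {
            return false; // a field is missing or extra
        }
        for (Map.Entry<String, String> field : expected.entrySet()) {
            if (!field.getValue().equals(actual.get(field.getKey()))) {
                return false; // same field name, different type
            }
        }
        return true;
    }
}
```

A strict check like this rejects any difference; a more lenient strategy could, for example, tolerate extra fields in the real schema.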





[jira] [Created] (GOBBLIN-747) Set expected schema when creating workunits

2019-04-22 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-747:


 Summary: Set expected schema when creating workunits
 Key: GOBBLIN-747
 URL: https://issues.apache.org/jira/browse/GOBBLIN-747
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Zihan Li


Set the gobblin.copy.expectedSchema property when creating the workunit to 
enable the schema check in distcp.





[jira] [Created] (GOBBLIN-726) Enable Schema Verification During Primary Dataset Deployment

2019-04-08 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-726:


 Summary: Enable Schema Verification During Primary Dataset 
Deployment
 Key: GOBBLIN-726
 URL: https://issues.apache.org/jira/browse/GOBBLIN-726
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Zihan Li


Each distcp mapper will first read the schema of the file to be copied, and 
abort if the file schema does not match the expected schema. 





[jira] [Created] (GOBBLIN-717) Filter Out Empty MultiWorkUnits

2019-03-29 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-717:


 Summary: Filter Out Empty MultiWorkUnits
 Key: GOBBLIN-717
 URL: https://issues.apache.org/jira/browse/GOBBLIN-717
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Zihan Li


Currently, when we run a job, Gobblin uses the max mappers value or the target 
size of a mapper to determine the number of mappers. But since one partition 
cannot be divided into several WorkUnits, work cannot be evenly distributed, 
and many mappers (MultiWorkUnits) have no work to do, which wastes a lot of 
resources. So we need to filter out MultiWorkUnits that contain no WorkUnit 
when we determine the number of mappers.
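
The filtering step can be sketched as follows, modelling each MultiWorkUnit as a plain list of work units. EmptyBinFilter and dropEmpty are hypothetical names for illustration and do not reflect Gobblin's MultiWorkUnit API:

```java
import java.util.List;
import java.util.stream.Collectors;

class EmptyBinFilter {
    // Keeps only the groups that contain at least one work unit,
    // so the mapper count is based on groups that have work to do.
    static <T> List<List<T>> dropEmpty(List<List<T>> multiWorkUnits) {
        return multiWorkUnits.stream()
                .filter(group -> !group.isEmpty())
                .collect(Collectors.toList());
    }
}
```

The number of mappers would then be the size of the filtered list rather than the size of the original list.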





[jira] [Created] (GOBBLIN-715) Unit test for KafkaSoure

2019-03-27 Thread Zihan Li (JIRA)
Zihan Li created GOBBLIN-715:


 Summary: Unit test for KafkaSoure
 Key: GOBBLIN-715
 URL: https://issues.apache.org/jira/browse/GOBBLIN-715
 Project: Apache Gobblin
  Issue Type: Test
Reporter: Zihan Li


We have an abstract class KafkaSource which contains a function called 
getWorkunits that is used in many use cases, but we have no unit test for this 
function. We should implement a simple subclass of KafkaSource and add a unit 
test for the logic inside the function to make sure it returns the desired 
WorkUnits.


