[jira] [Created] (GOBBLIN-1223) Change the criteria for re-compaction, limit the time for re-compaction
Zihan Li created GOBBLIN-1223: - Summary: Change the criteria for re-compaction, limit the time for re-compaction Key: GOBBLIN-1223 URL: https://issues.apache.org/jira/browse/GOBBLIN-1223 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1210) Force AM to read from token file to update token when start up
[ https://issues.apache.org/jira/browse/GOBBLIN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1210. --- Resolution: Fixed > Force AM to read from token file to update token when start up > -- > > Key: GOBBLIN-1210 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1210 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1185) Enable dataset cleaner to emit kafka events
[ https://issues.apache.org/jira/browse/GOBBLIN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1185. --- Resolution: Fixed > Enable dataset cleaner to emit kafka events > --- > > Key: GOBBLIN-1185 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1185 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1165) Add config to enable user to set additional yarn classpathes
[ https://issues.apache.org/jira/browse/GOBBLIN-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1165. --- Resolution: Fixed > Add config to enable user to set additional yarn classpathes > > > Key: GOBBLIN-1165 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1165 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1183) Enable additional yarn class path set for app master
[ https://issues.apache.org/jira/browse/GOBBLIN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1183. --- Resolution: Fixed > Enable additional yarn class path set for app master > > > Key: GOBBLIN-1183 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1183 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1158) Use input dir to document old files instead of file pathes to reduce memory cost in Compaction configurator
[ https://issues.apache.org/jira/browse/GOBBLIN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1158. --- Resolution: Fixed > Use input dir to document old files instead of file pathes to reduce memory > cost in Compaction configurator > --- > > Key: GOBBLIN-1158 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1158 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1147) Use one dfsClient in FsDataWriter to to rename and exists check to avoid inconsistency
[ https://issues.apache.org/jira/browse/GOBBLIN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1147. --- Resolution: Fixed > Use one dfsClient in FsDataWriter to to rename and exists check to avoid > inconsistency > -- > > Key: GOBBLIN-1147 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1147 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1136) Make LogCopier be able to refresh FileSystem for long running job use cases
[ https://issues.apache.org/jira/browse/GOBBLIN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1136. --- Resolution: Fixed > Make LogCopier be able to refresh FileSystem for long running job use cases > --- > > Key: GOBBLIN-1136 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1136 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Make LogCopier be able to refresh FileSystem for long running job use cases -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1143) Add a generic wrapper producer client to communicate with Kafka
[ https://issues.apache.org/jira/browse/GOBBLIN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1143. --- Resolution: Fixed > Add a generic wrapper producer client to communicate with Kafka > --- > > Key: GOBBLIN-1143 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1143 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Add a generic wrapper producer client to communicate with Kafka, along with its implementations for the kafka08 and kafka09 producers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1133) Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1133. --- Resolution: Fixed > Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action > configurable > -- > > Key: GOBBLIN-1133 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1133 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > # Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete > action configurable > # Include dstNewFiles and oldFiles in CompactionJobConfigurator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1121) Fix Issue that YarnService use the old token to acquire new container
[ https://issues.apache.org/jira/browse/GOBBLIN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1121. --- Resolution: Fixed > Fix Issue that YarnService use the old token to acquire new container > - > > Key: GOBBLIN-1121 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1121 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1080) Add configuration to preserve schema creation time in converter
[ https://issues.apache.org/jira/browse/GOBBLIN-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1080. --- Resolution: Fixed > Add configuration to preserve schema creation time in converter > --- > > Key: GOBBLIN-1080 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1080 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1069) Add NPE check in handleContainerCompletion method
[ https://issues.apache.org/jira/browse/GOBBLIN-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1069. --- Resolution: Fixed > Add NPE check in handleContainerCompletion method > - > > Key: GOBBLIN-1069 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1069 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Add an NPE check in the handleContainerCompletion method so that calling the method twice for the same container does not fail the job. -- This message was sent by Atlassian Jira (v8.3.4#803005)
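The null check described in GOBBLIN-1069 can be illustrated with a minimal, single-threaded sketch. It is not the actual YarnService code: the class name, the `running` map, and `onContainerStarted` are hypothetical, and only the method name `handleContainerCompletion` comes from the issue. The idea shown is that a second completion event for the same container finds no bookkeeping entry and returns early instead of dereferencing null.

```java
import java.util.HashMap;
import java.util.Map;

class ContainerCompletionSketch {
  // Hypothetical bookkeeping: container id -> instance name for running containers.
  private final Map<String, String> running = new HashMap<>();
  private int handled = 0;

  void onContainerStarted(String containerId, String instanceName) {
    running.put(containerId, instanceName);
  }

  /** Returns true if the completion was processed, false if it was a duplicate or unknown. */
  boolean handleContainerCompletion(String containerId) {
    String instanceName = running.remove(containerId);
    if (instanceName == null) {
      // Already handled (or never registered): skip rather than dereference null.
      return false;
    }
    handled++;
    return true;
  }

  int handledCount() {
    return handled;
  }

  public static void main(String[] args) {
    ContainerCompletionSketch service = new ContainerCompletionSketch();
    service.onContainerStarted("container_1", "instance_1");
    System.out.println(service.handleContainerCompletion("container_1"));
    // The duplicate event is ignored and does not throw.
    System.out.println(service.handleContainerCompletion("container_1"));
  }
}
```

Removing the entry and checking the result in one step keeps the handler idempotent without a separate "already completed" flag.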
[jira] [Resolved] (GOBBLIN-1064) Make KafkaAvroSchemaRegistry extendable
[ https://issues.apache.org/jira/browse/GOBBLIN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1064. --- Resolution: Fixed > Make KafkaAvroSchemaRegistry extendable > --- > > Key: GOBBLIN-1064 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1064 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1077) Fix bug in HiveDataset.resolveConfig
[ https://issues.apache.org/jira/browse/GOBBLIN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1077. --- Resolution: Fixed > Fix bug in HiveDataset.resolveConfig > > > Key: GOBBLIN-1077 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1077 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > > In resolveConfig, we get a config object, resolve its values, and put all of them into a property object without desanitizing the keys. When the property object is transformed back into a config, there is a chance of a runtime exception. > Solution: construct a config object directly instead of going config->property->config -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1023) Fix the issue of lossing data when trying to commit
[ https://issues.apache.org/jira/browse/GOBBLIN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-1023. --- Resolution: Won't Fix > Fix the issue of lossing data when trying to commit > --- > > Key: GOBBLIN-1023 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1023 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-981) Handle backward compatibility issue in HiveSource
[ https://issues.apache.org/jira/browse/GOBBLIN-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-981. -- Resolution: Fixed > Handle backward compatibility issue in HiveSource > - > > Key: GOBBLIN-981 > URL: https://issues.apache.org/jira/browse/GOBBLIN-981 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-986) Persist the existing property of table when doing hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-986. -- Resolution: Fixed > Persist the existing property of table when doing hive registration > --- > > Key: GOBBLIN-986 > URL: https://issues.apache.org/jira/browse/GOBBLIN-986 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-975) Add flag to enable/disable avro type check in AvroToOrc
[ https://issues.apache.org/jira/browse/GOBBLIN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-975. -- Resolution: Fixed > Add flag to enable/disable avro type check in AvroToOrc > > > Key: GOBBLIN-975 > URL: https://issues.apache.org/jira/browse/GOBBLIN-975 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > > Add flag to enable/disable avro type check when trying to get the schema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-941) Enhance DDL to add column and column.types with case-preserving schema
[ https://issues.apache.org/jira/browse/GOBBLIN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-941. -- Resolution: Fixed > Enhance DDL to add column and column.types with case-preserving schema > -- > > Key: GOBBLIN-941 > URL: https://issues.apache.org/jira/browse/GOBBLIN-941 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Enhance DDL to add column and column.types with a case-preserving schema, so that avro2orc output preserves the correct casing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-924) Get rid of orc.schema.literal in ORC-ingestion and registration
[ https://issues.apache.org/jira/browse/GOBBLIN-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-924. -- Resolution: Fixed > Get rid of orc.schema.literal in ORC-ingestion and registration > --- > > Key: GOBBLIN-924 > URL: https://issues.apache.org/jira/browse/GOBBLIN-924 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-916) Make ContainerLaunchContext instantiation in YarnService more efficient
[ https://issues.apache.org/jira/browse/GOBBLIN-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-916. -- Resolution: Fixed > Make ContainerLaunchContext instantiation in YarnService more efficient > --- > > Key: GOBBLIN-916 > URL: https://issues.apache.org/jira/browse/GOBBLIN-916 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-921) Make pull/push mode when registering partition to be configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-921. -- Resolution: Fixed > Make pull/push mode when registering partition to be configurable > - > > Key: GOBBLIN-921 > URL: https://issues.apache.org/jira/browse/GOBBLIN-921 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > In pull mode, the register first checks whether the partition already exists, which reduces calls to add_partition. In push mode, the register calls add_partition directly and relies on the resulting exception to determine whether the partition already existed. Push mode should be used when most of the partitions that HiveRegister tries to register do not yet exist, to reduce the number of calls to the Hive Metastore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
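The trade-off between the two modes in GOBBLIN-921 can be sketched with an in-memory stand-in for the metastore that counts remote calls. Everything here is hypothetical (`FakeMetastore`, `registerPull`, `registerPush`, and the use of `IllegalStateException` as the "already exists" signal); it is not the HiveRegister implementation, only an illustration of the call pattern the issue describes.

```java
import java.util.HashSet;
import java.util.Set;

class PartitionRegistrationSketch {
  /** Hypothetical stand-in for a metastore; every method counts as one remote call. */
  static class FakeMetastore {
    private final Set<String> partitions = new HashSet<>();
    int calls = 0;

    boolean exists(String partition) {
      calls++;
      return partitions.contains(partition);
    }

    void addPartition(String partition) {
      calls++;
      if (!partitions.add(partition)) {
        throw new IllegalStateException("partition already exists: " + partition);
      }
    }
  }

  /** Pull mode: check first, and call addPartition only when the partition is missing. */
  static boolean registerPull(FakeMetastore store, String partition) {
    if (store.exists(partition)) {
      return false;
    }
    store.addPartition(partition);
    return true;
  }

  /** Push mode: call addPartition directly and treat the exception as "already exists". */
  static boolean registerPush(FakeMetastore store, String partition) {
    try {
      store.addPartition(partition);
      return true;
    } catch (IllegalStateException alreadyExists) {
      return false;
    }
  }

  public static void main(String[] args) {
    FakeMetastore pull = new FakeMetastore();
    registerPull(pull, "datepartition=2020-01-01");
    FakeMetastore push = new FakeMetastore();
    registerPush(push, "datepartition=2020-01-01");
    System.out.println("pull calls: " + pull.calls + ", push calls: " + push.calls);
  }
}
```

In this sketch a new partition costs two calls in pull mode and one in push mode, while an existing partition costs one call in either mode (a check in pull mode, a failed add in push mode). That matches the guidance in the issue: push mode pays off when most partitions are new.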
[jira] [Resolved] (GOBBLIN-912) Enable TTL caching on Hive Metastore client connection
[ https://issues.apache.org/jira/browse/GOBBLIN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-912. -- Resolution: Fixed > Enable TTL caching on Hive Metastore client connection > -- > > Key: GOBBLIN-912 > URL: https://issues.apache.org/jira/browse/GOBBLIN-912 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-902) Enable gobblin yarn app luncher class configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-902. -- Resolution: Fixed > Enable gobblin yarn app luncher class configurable > -- > > Key: GOBBLIN-902 > URL: https://issues.apache.org/jira/browse/GOBBLIN-902 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > > Make the Gobblin YARN app launcher class configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-899) Add a key in dataset config to disable schema check for a specific dataset
[ https://issues.apache.org/jira/browse/GOBBLIN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-899. -- Resolution: Fixed > Add a key in dataset config to disable schema check for a specific dataset > -- > > Key: GOBBLIN-899 > URL: https://issues.apache.org/jira/browse/GOBBLIN-899 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-877) Add column metadata for partition for inline hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-877. -- Resolution: Fixed > Add column metadata for partition for inline hive registration > -- > > Key: GOBBLIN-877 > URL: https://issues.apache.org/jira/browse/GOBBLIN-877 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Previously, we removed schema.literal from partitions, because Avro schemas should _only_ be defined at the table level: Hive overrides table properties if the same property is defined on the partition, so defining them at the partition level may lead to partitions with inconsistent schemas. Because column metadata is calculated from schema.literal, we removed the column metadata as well. > We then encountered a problem: Presto cannot read data from ORC files, because ORC (and other Hive serdes) need metadata in the partitions so that coercion can be done between a partition schema and the table schema. > So we need to treat Avro and other formats separately to make sure Hive registration works well and users can read the right data from Presto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
[ https://issues.apache.org/jira/browse/GOBBLIN-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-863. -- Resolution: Fixed > Handle race condition between concurrent Gobblin tasks performing Hive > registration > --- > > Key: GOBBLIN-863 > URL: https://issues.apache.org/jira/browse/GOBBLIN-863 > Project: Apache Gobblin > Issue Type: Task > Components: hive-registration >Reporter: Zihan Li >Assignee: Abhishek Tiwari >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-861) Skip getPartition() call to Hive Metastore when a partition already exists
[ https://issues.apache.org/jira/browse/GOBBLIN-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-861. -- Resolution: Fixed > Skip getPartition() call to Hive Metastore when a partition already exists > -- > > Key: GOBBLIN-861 > URL: https://issues.apache.org/jira/browse/GOBBLIN-861 > Project: Apache Gobblin > Issue Type: Task > Components: hive-registration >Reporter: Zihan Li >Assignee: Abhishek Tiwari >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, we compute a diff between the current partition and an already > registered partition when a partition has already been registered in Hive. > This is done by calling getPartition() on the Hive metastore client, which > can be expensive. Since no time-varying attributes are stored in a Hive > partition, diff computation (and getPartition() call) can be skipped when a > partition already exists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
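The change described in GOBBLIN-861 can be sketched as follows. This is an illustration under stated assumptions, not the Gobblin code: `FakeMetastore`, `addIfAbsent`, `alterPartition`, and both `register*` methods are hypothetical names, and a partition is reduced to a plain string spec. Only the `getPartition()` call and the idea of skipping the diff when the partition already exists come from the issue.

```java
import java.util.HashMap;
import java.util.Map;

class SkipGetPartitionSketch {
  /** Hypothetical in-memory metastore that counts getPartition() calls. */
  static class FakeMetastore {
    private final Map<String, String> partitions = new HashMap<>();
    int getPartitionCalls = 0;

    /** Returns true if the partition was newly added, false if it already existed. */
    boolean addIfAbsent(String name, String spec) {
      return partitions.putIfAbsent(name, spec) == null;
    }

    String getPartition(String name) {
      getPartitionCalls++;
      return partitions.get(name);
    }

    void alterPartition(String name, String spec) {
      partitions.put(name, spec);
    }
  }

  /** Behaviour described as the starting point: fetch the registered partition and diff it. */
  static void registerWithDiff(FakeMetastore store, String name, String spec) {
    if (!store.addIfAbsent(name, spec)) {
      String existing = store.getPartition(name);
      if (!spec.equals(existing)) {
        store.alterPartition(name, spec);
      }
    }
  }

  /** Proposed behaviour: if the partition already exists, do nothing further. */
  static void registerSkippingDiff(FakeMetastore store, String name, String spec) {
    store.addIfAbsent(name, spec);
  }

  public static void main(String[] args) {
    FakeMetastore store = new FakeMetastore();
    registerSkippingDiff(store, "datepartition=2020-01-01", "spec");
    registerSkippingDiff(store, "datepartition=2020-01-01", "spec");
    System.out.println("getPartition calls: " + store.getPartitionCalls);
  }
}
```

The skip is only safe under the premise stated in the issue, namely that a Hive partition stores no time-varying attributes, so an existing partition never needs to be altered.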
[jira] [Resolved] (GOBBLIN-859) Let writer pass latest schema to workUnitState when schema change
[ https://issues.apache.org/jira/browse/GOBBLIN-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-859. -- Resolution: Fixed > Let writer pass latest schema to workUnitState when schema change > - > > Key: GOBBLIN-859 > URL: https://issues.apache.org/jira/browse/GOBBLIN-859 > Project: Apache Gobblin > Issue Type: Task > Components: gobblin-core >Reporter: Zihan Li >Assignee: Abhishek Tiwari >Priority: Major > > Let the writer pass the latest schema to workUnitState when the writer is initialized and when the schema changes, so that Hive registration can get the latest schema directly without maintaining logic to compute the latest schema version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-852) Reorganize the code for hive registration to isolate function
[ https://issues.apache.org/jira/browse/GOBBLIN-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-852. -- Resolution: Fixed > Reorganize the code for hive registration to isolate function > - > > Key: GOBBLIN-852 > URL: https://issues.apache.org/jira/browse/GOBBLIN-852 > Project: Apache Gobblin > Issue Type: Task > Components: hive-registration >Reporter: Zihan Li >Assignee: Abhishek Tiwari >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-806) Enable metrics reporter during dataset discovery for retention job
[ https://issues.apache.org/jira/browse/GOBBLIN-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li resolved GOBBLIN-806. -- Resolution: Fixed > Enable metrics reporter during dataset discovery for retention job > -- > > Key: GOBBLIN-806 > URL: https://issues.apache.org/jira/browse/GOBBLIN-806 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1210) Force AM to read from token file to update token when start up
Zihan Li created GOBBLIN-1210: - Summary: Force AM to read from token file to update token when start up Key: GOBBLIN-1210 URL: https://issues.apache.org/jira/browse/GOBBLIN-1210 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1185) Enable dataset cleaner to emit kafka events
Zihan Li created GOBBLIN-1185: - Summary: Enable dataset cleaner to emit kafka events Key: GOBBLIN-1185 URL: https://issues.apache.org/jira/browse/GOBBLIN-1185 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1183) Enable additional yarn class path set for app master
Zihan Li created GOBBLIN-1183: - Summary: Enable additional yarn class path set for app master Key: GOBBLIN-1183 URL: https://issues.apache.org/jira/browse/GOBBLIN-1183 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1165) Add config to enable user to set additional yarn classpathes
Zihan Li created GOBBLIN-1165: - Summary: Add config to enable user to set additional yarn classpathes Key: GOBBLIN-1165 URL: https://issues.apache.org/jira/browse/GOBBLIN-1165 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1158) Use input dir to document old files instead of file pathes to reduce memory cost
Zihan Li created GOBBLIN-1158: - Summary: Use input dir to document old files instead of file pathes to reduce memory cost Key: GOBBLIN-1158 URL: https://issues.apache.org/jira/browse/GOBBLIN-1158 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1158) Use input dir to document old files instead of file pathes to reduce memory cost in Compaction configurator
[ https://issues.apache.org/jira/browse/GOBBLIN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li updated GOBBLIN-1158: -- Summary: Use input dir to document old files instead of file pathes to reduce memory cost in Compaction configurator (was: Use input dir to document old files instead of file pathes to reduce memory cost) > Use input dir to document old files instead of file pathes to reduce memory > cost in Compaction configurator > --- > > Key: GOBBLIN-1158 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1158 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1147) Use one dfsClient in FsDataWriter to to rename and exists check to avoid inconsistency
Zihan Li created GOBBLIN-1147: - Summary: Use one dfsClient in FsDataWriter to to rename and exists check to avoid inconsistency Key: GOBBLIN-1147 URL: https://issues.apache.org/jira/browse/GOBBLIN-1147 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1143) Add a generic wrapper producer client to communicate with Kafka
Zihan Li created GOBBLIN-1143: - Summary: Add a generic wrapper producer client to communicate with Kafka Key: GOBBLIN-1143 URL: https://issues.apache.org/jira/browse/GOBBLIN-1143 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Add a generic wrapper producer client to communicate with Kafka, along with its implementations for the kafka08 and kafka09 producers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1136) Make LogCopier be able to refresh FileSystem for long running job use cases
Zihan Li created GOBBLIN-1136: - Summary: Make LogCopier be able to refresh FileSystem for long running job use cases Key: GOBBLIN-1136 URL: https://issues.apache.org/jira/browse/GOBBLIN-1136 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Make LogCopier be able to refresh FileSystem for long running job use cases -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1133) Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action configurable
Zihan Li created GOBBLIN-1133: - Summary: Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action configurable Key: GOBBLIN-1133 URL: https://issues.apache.org/jira/browse/GOBBLIN-1133 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li # Add CompactionSuiteBaseWithConfigurableCompleteAction to make complete action configurable # Include dstNewFiles and oldFiles in CompactionJobConfigurator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1121) Fix Issue that YarnService use the old token to acquire new container
Zihan Li created GOBBLIN-1121: - Summary: Fix Issue that YarnService use the old token to acquire new container Key: GOBBLIN-1121 URL: https://issues.apache.org/jira/browse/GOBBLIN-1121 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1093) Use method overloading in AvroUtils for add creation time
Zihan Li created GOBBLIN-1093: - Summary: Use method overloading in AvroUtils for add creation time Key: GOBBLIN-1093 URL: https://issues.apache.org/jira/browse/GOBBLIN-1093 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1080) Add configuration to preserve schema creation time in converter
Zihan Li created GOBBLIN-1080: - Summary: Add configuration to preserve schema creation time in converter Key: GOBBLIN-1080 URL: https://issues.apache.org/jira/browse/GOBBLIN-1080 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1077) Fix bug in HiveDataset.resolveConfig
Zihan Li created GOBBLIN-1077: - Summary: Fix bug in HiveDataset.resolveConfig Key: GOBBLIN-1077 URL: https://issues.apache.org/jira/browse/GOBBLIN-1077 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li In resolveConfig, we get a config object, resolve its values, and put all of them into a property object without desanitizing the keys. When the property object is transformed back into a config, there is a chance of a runtime exception. Solution: construct a config object directly instead of going config->property->config -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1069) Add NPE check in handleContainerCompletion method
Zihan Li created GOBBLIN-1069: - Summary: Add NPE check in handleContainerCompletion method Key: GOBBLIN-1069 URL: https://issues.apache.org/jira/browse/GOBBLIN-1069 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Add an NPE check in the handleContainerCompletion method so that calling the method twice for the same container does not fail the job. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1064) Make KafkaAvroSchemaRegistry extendable
[ https://issues.apache.org/jira/browse/GOBBLIN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li updated GOBBLIN-1064: -- Summary: Make KafkaAvroSchemaRegistry extendable (was: Add writer's schema to workUnitState) > Make KafkaAvroSchemaRegistry extendable > --- > > Key: GOBBLIN-1064 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1064 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1064) Add writer's schema to workUnitState
Zihan Li created GOBBLIN-1064: - Summary: Add writer's schema to workUnitState Key: GOBBLIN-1064 URL: https://issues.apache.org/jira/browse/GOBBLIN-1064 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1023) Fix the issue of lossing data when trying to commit
[ https://issues.apache.org/jira/browse/GOBBLIN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zihan Li updated GOBBLIN-1023: -- Summary: Fix the issue of lossing data when trying to commit (was: Fix the issue of losing data when trying to commit) > Fix the issue of lossing data when trying to commit > --- > > Key: GOBBLIN-1023 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1023 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1023) Fix the issue of losing data when trying to commit
Zihan Li created GOBBLIN-1023: - Summary: Fix the issue of losing data when trying to commit Key: GOBBLIN-1023 URL: https://issues.apache.org/jira/browse/GOBBLIN-1023 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-986) Persist the existing property of table when doing hive registration
Zihan Li created GOBBLIN-986: Summary: Persist the existing property of table when doing hive registration Key: GOBBLIN-986 URL: https://issues.apache.org/jira/browse/GOBBLIN-986 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-981) Handle backward compatibility issue in HiveSource
Zihan Li created GOBBLIN-981: Summary: Handle backward compatibility issue in HiveSource Key: GOBBLIN-981 URL: https://issues.apache.org/jira/browse/GOBBLIN-981 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-975) Add flag to enable/disable avro type check in AvroToOrc
Zihan Li created GOBBLIN-975: Summary: Add flag to enable/disable avro type check in AvroToOrc Key: GOBBLIN-975 URL: https://issues.apache.org/jira/browse/GOBBLIN-975 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Add flag to enable/disable avro type check when trying to get the schema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-967) Change token refresh method in YarnContainerSecirityManager
Zihan Li created GOBBLIN-967: Summary: Change token refresh method in YarnContainerSecirityManager Key: GOBBLIN-967 URL: https://issues.apache.org/jira/browse/GOBBLIN-967 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Change the token refresh method in YarnContainerSecirityManager from adding tokens to directly adding credentials, to make sure all new credentials are updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-961) Bypass locked partitions when calculating src watermark
Zihan Li created GOBBLIN-961: Summary: Bypass locked partitions when calculating src watermark Key: GOBBLIN-961 URL: https://issues.apache.org/jira/browse/GOBBLIN-961 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-941) Enhance DDL to add column and column.types with case-preserving schema
Zihan Li created GOBBLIN-941: Summary: Enhance DDL to add column and column.types with case-preserving schema Key: GOBBLIN-941 URL: https://issues.apache.org/jira/browse/GOBBLIN-941 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Enhance DDL to add column and column.types with a case-preserving schema, which ensures that avro2orc output preserves the correct casing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-924) Get rid of orc.schema.literal in ORC-ingestion and registration
Zihan Li created GOBBLIN-924: Summary: Get rid of orc.schema.literal in ORC-ingestion and registration Key: GOBBLIN-924 URL: https://issues.apache.org/jira/browse/GOBBLIN-924 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-921) Make pull/push mode when registering partition to be configurable
Zihan Li created GOBBLIN-921: Summary: Make pull/push mode when registering partition to be configurable Key: GOBBLIN-921 URL: https://issues.apache.org/jira/browse/GOBBLIN-921 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li In pull mode, the register first checks whether the partition already exists, which reduces calls to add_partition. In push mode, the register calls add_partition directly and relies on the exception to determine whether the partition already exists. Push mode should be used when most of the partitions that HiveRegister tries to register do not yet exist, to reduce calls to the Hive Metastore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
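The trade-off between the two modes can be illustrated with a minimal sketch. It does not use Gobblin's HiveRegister or the real Hive Metastore client: FakeMetastore, AlreadyExistsError, register_pull, register_push, and the partition names are all hypothetical stand-ins, and the only thing measured is how many calls reach the store.

```python
class AlreadyExistsError(Exception):
    """Raised by the hypothetical store when a partition is already registered."""


class FakeMetastore:
    """In-memory stand-in for a metastore that counts every call it receives."""

    def __init__(self, existing=()):
        self.partitions = set(existing)
        self.calls = 0

    def partition_exists(self, name):
        self.calls += 1
        return name in self.partitions

    def add_partition(self, name):
        self.calls += 1
        if name in self.partitions:
            raise AlreadyExistsError(name)
        self.partitions.add(name)


def register_pull(store, name):
    """Pull mode: check first, add only when the partition is missing."""
    if not store.partition_exists(name):
        store.add_partition(name)


def register_push(store, name):
    """Push mode: add directly and treat 'already exists' as success."""
    try:
        store.add_partition(name)
    except AlreadyExistsError:
        pass


def count_calls(register, existing, to_register):
    """Register every partition with the given mode and return the call count."""
    store = FakeMetastore(existing)
    for name in to_register:
        register(store, name)
    return store.calls


# Hypothetical partition names, none of which exist yet.
new_partitions = ["dt=2019-10-01", "dt=2019-10-02", "dt=2019-10-03"]
pull_calls = count_calls(register_pull, [], new_partitions)
push_calls = count_calls(register_push, [], new_partitions)
```

In this sketch, registering three partitions that do not yet exist costs two calls each in pull mode and one call each in push mode. When every partition already exists, both modes make one call per partition, but push mode pays for it with an exception.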
[jira] [Created] (GOBBLIN-916) Make ContainerLaunchContext instantiation in YarnService more efficient
Zihan Li created GOBBLIN-916: Summary: Make ContainerLaunchContext instantiation in YarnService more efficient Key: GOBBLIN-916 URL: https://issues.apache.org/jira/browse/GOBBLIN-916 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-912) Enable TTL caching on Hive Metastore client connection
Zihan Li created GOBBLIN-912: Summary: Enable TTL caching on Hive Metastore client connection Key: GOBBLIN-912 URL: https://issues.apache.org/jira/browse/GOBBLIN-912 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-902) Enable gobblin yarn app luncher class configurable
Zihan Li created GOBBLIN-902: Summary: Enable gobblin yarn app luncher class configurable Key: GOBBLIN-902 URL: https://issues.apache.org/jira/browse/GOBBLIN-902 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Make the Gobblin YARN app launcher class configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-899) Add a key in dataset config to disable schema check for a specific dataset
Zihan Li created GOBBLIN-899: Summary: Add a key in dataset config to disable schema check for a specific dataset Key: GOBBLIN-899 URL: https://issues.apache.org/jira/browse/GOBBLIN-899 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-877) Add column metadata for partition for inline hive registration
Zihan Li created GOBBLIN-877: Summary: Add column metadata for partition for inline hive registration Key: GOBBLIN-877 URL: https://issues.apache.org/jira/browse/GOBBLIN-877 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li Previously, we removed schema.literal from partitions, because Avro schemas should _only_ be defined at the table level: Hive overrides table properties if the same property is defined on the partition, so defining them at the partition level may lead to partitions with inconsistent schemas. Because column metadata is calculated from schema.literal, we removed the column metadata as well. We then found that Presto cannot read data from ORC files, because ORC (and other Hive serdes) need metadata in the partitions so that coercion can be done between a partition schema and the table schema. We therefore need to treat Avro and other formats separately, so that Hive registration works correctly and users can read the right data from Presto. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GOBBLIN-872) Only use one CouchbaseEnvironment per JVM
Zihan Li created GOBBLIN-872: Summary: Only use one CouchbaseEnvironment per JVM Key: GOBBLIN-872 URL: https://issues.apache.org/jira/browse/GOBBLIN-872 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GOBBLIN-863) Handle race condition between concurrent Gobblin tasks performing Hive registration
Zihan Li created GOBBLIN-863: Summary: Handle race condition between concurrent Gobblin tasks performing Hive registration Key: GOBBLIN-863 URL: https://issues.apache.org/jira/browse/GOBBLIN-863 Project: Apache Gobblin Issue Type: Task Components: hive-registration Reporter: Zihan Li Assignee: Abhishek Tiwari -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GOBBLIN-861) Skip getPartition() call to Hive Metastore when a partition already exists
Zihan Li created GOBBLIN-861: Summary: Skip getPartition() call to Hive Metastore when a partition already exists Key: GOBBLIN-861 URL: https://issues.apache.org/jira/browse/GOBBLIN-861 Project: Apache Gobblin Issue Type: Task Components: hive-registration Reporter: Zihan Li Assignee: Abhishek Tiwari Currently, we compute a diff between the current partition and an already registered partition when a partition has already been registered in Hive. This is done by calling getPartition() on the Hive metastore client, which can be expensive. Since no time-varying attributes are stored in a Hive partition, diff computation (and getPartition() call) can be skipped when a partition already exists. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GOBBLIN-859) Let writer pass latest schema to workUnitState when schema change
Zihan Li created GOBBLIN-859: Summary: Let writer pass latest schema to workUnitState when schema change Key: GOBBLIN-859 URL: https://issues.apache.org/jira/browse/GOBBLIN-859 Project: Apache Gobblin Issue Type: Task Components: gobblin-core Reporter: Zihan Li Assignee: Abhishek Tiwari Let the writer pass the latest schema to workUnitState both when the writer is initialized and when the schema changes, so that Hive registration can get the latest schema directly without maintaining separate logic to compute the latest schema version. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GOBBLIN-852) Reorganize the code for hive registration to isolate function
Zihan Li created GOBBLIN-852: Summary: Reorganize the code for hive registration to isolate function Key: GOBBLIN-852 URL: https://issues.apache.org/jira/browse/GOBBLIN-852 Project: Apache Gobblin Issue Type: Task Components: hive-registration Reporter: Zihan Li Assignee: Abhishek Tiwari -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GOBBLIN-806) Enable metrics reporter during dataset discovery for retention job
Zihan Li created GOBBLIN-806: Summary: Enable metrics reporter during dataset discovery for retention job Key: GOBBLIN-806 URL: https://issues.apache.org/jira/browse/GOBBLIN-806 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-799) Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and FIXED
Zihan Li created GOBBLIN-799: Summary: Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and FIXED Key: GOBBLIN-799 URL: https://issues.apache.org/jira/browse/GOBBLIN-799 Project: Apache Gobblin Issue Type: Bug Reporter: Zihan Li AvroSchemaCheckDefaultStrategy does not return after checking the ENUM and FIXED types; the fix is to add the missing return statements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-772) Implement Schema Comparison Strategy during Disctp
Zihan Li created GOBBLIN-772: Summary: Implement Schema Comparison Strategy during Disctp Key: GOBBLIN-772 URL: https://issues.apache.org/jira/browse/GOBBLIN-772 Project: Apache Gobblin Issue Type: Task Reporter: Zihan Li We need a schema comparison strategy to make sure the real schema and the expected schema have matching field names and types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
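As a rough illustration of what "matching field names and types" could mean, the sketch below compares two Avro record schemas that have already been parsed from JSON into plain dictionaries. It is not the strategy implemented in Gobblin: the function name fields_match, the sample schemas, and the choice to ignore field order and extra attributes such as doc are all assumptions made for this example.

```python
def fields_match(expected, actual):
    """Return True when both record schemas list the same field names with the same types.

    Assumption for this sketch: field order and attributes other than
    "name" and "type" (for example "doc") are ignored.
    """
    expected_fields = {f["name"]: f["type"] for f in expected["fields"]}
    actual_fields = {f["name"]: f["type"] for f in actual["fields"]}
    return expected_fields == actual_fields


# Hypothetical schemas, written as the dictionaries json.loads would produce.
expected_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

# Same names and types, different order, plus a doc attribute.
reordered_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "name", "type": "string", "doc": "display name"},
        {"name": "id", "type": "long"},
    ],
}

# Same names, but "id" has a different type.
wrong_type_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

same = fields_match(expected_schema, reordered_schema)
different = fields_match(expected_schema, wrong_type_schema)
```

Here the reordered schema is accepted and the schema with a changed field type is rejected. A stricter strategy could also compare field order, defaults, or nested record names.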
[jira] [Created] (GOBBLIN-747) Set expected schema when creating workunits
Zihan Li created GOBBLIN-747: Summary: Set expected schema when creating workunits Key: GOBBLIN-747 URL: https://issues.apache.org/jira/browse/GOBBLIN-747 Project: Apache Gobblin Issue Type: Improvement Reporter: Zihan Li Set the property of gobblin.copy.expectedSchema when creating the workunit to enable schema check in distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-726) Enable Schema Verification During Primary Dataset Deployment
Zihan Li created GOBBLIN-726: Summary: Enable Schema Verification During Primary Dataset Deployment Key: GOBBLIN-726 URL: https://issues.apache.org/jira/browse/GOBBLIN-726 Project: Apache Gobblin Issue Type: Improvement Reporter: Zihan Li Each distcp mapper will first read the schema of the file to be copied, and abort if the file schema does not match the expected schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-717) Filter Out Empty MultiWorkUnits
Zihan Li created GOBBLIN-717: Summary: Filter Out Empty MultiWorkUnits Key: GOBBLIN-717 URL: https://issues.apache.org/jira/browse/GOBBLIN-717 Project: Apache Gobblin Issue Type: Improvement Reporter: Zihan Li Currently, when we run a job, Gobblin uses the maximum number of mappers or the target size of a mapper to determine the number of mappers. But because one partition cannot be divided into several WorkUnits, work cannot be evenly distributed, and many mappers (MultiWorkUnits) have no work to do, which wastes a lot of resources. We therefore need to filter out MultiWorkUnits that contain no WorkUnit when we determine the number of mappers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
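The effect can be sketched with a toy packer. This is not Gobblin's packing code: pack and drop_empty are hypothetical helpers, work units are represented only by their sizes, and the greedy "lightest group first" rule is an assumption chosen to keep the example short.

```python
def pack(work_unit_sizes, num_mappers):
    """Assign indivisible work units to a fixed number of groups, lightest group first."""
    groups = [[] for _ in range(num_mappers)]
    loads = [0] * num_mappers
    for size in sorted(work_unit_sizes, reverse=True):
        lightest = loads.index(min(loads))
        groups[lightest].append(size)
        loads[lightest] += size
    return groups


def drop_empty(groups):
    """Keep only the groups that actually contain work."""
    return [group for group in groups if group]


# Three indivisible work units spread over five requested mappers.
groups = pack([40, 25, 10], num_mappers=5)
non_empty = drop_empty(groups)
```

With three work units and five requested mappers, two groups stay empty; after filtering, only three mappers would need to be launched.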
[jira] [Created] (GOBBLIN-715) Unit test for KafkaSoure
Zihan Li created GOBBLIN-715: Summary: Unit test for KafkaSoure Key: GOBBLIN-715 URL: https://issues.apache.org/jira/browse/GOBBLIN-715 Project: Apache Gobblin Issue Type: Test Reporter: Zihan Li We have an abstract class, KafkaSource, which contains a function called getWorkunits that is used in many use cases, but we have no unit test for this function. We should implement a simple subclass of KafkaSource and add a unit test that exercises the logic inside the function to make sure it returns the desired WorkUnits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)