[jira] [Updated] (GOBBLIN-1174) Fail job on FileBasedSource ls invalid source directory
[ https://issues.apache.org/jira/browse/GOBBLIN-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1174: --- Summary: Fail job on FileBasedSource ls invalid source directory (was: Fail job on FileBasedSource with invalid source directory) > Fail job on FileBasedSource ls invalid source directory > --- > > Key: GOBBLIN-1174 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1174 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1174) Fail job on FileBasedSource with invalid source directory
Zhixiong Chen created GOBBLIN-1174: -- Summary: Fail job on FileBasedSource with invalid source directory Key: GOBBLIN-1174 URL: https://issues.apache.org/jira/browse/GOBBLIN-1174 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1146) Allow configuring autocommit in JDBCWriters
Zhixiong Chen created GOBBLIN-1146: -- Summary: Allow configuring autocommit in JDBCWriters Key: GOBBLIN-1146 URL: https://issues.apache.org/jira/browse/GOBBLIN-1146 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table
[ https://issues.apache.org/jira/browse/GOBBLIN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1142: --- Description: The change adds support filtering a specific type of tables, e.g snapshot, partitioned, in `HiveDatasetFinder` (was: The change adds support filtering a specific type of tables, e.g snapshot, partitioned, in `HiveDatasetFinder` for distcp) > Hive Distcp support filter on partitioned or snapshot table > --- > > Key: GOBBLIN-1142 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1142 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > The change adds support filtering a specific type of tables, e.g snapshot, > partitioned, in `HiveDatasetFinder` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table
[ https://issues.apache.org/jira/browse/GOBBLIN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1142: --- Description: The change adds support filtering a specific type of tables, e.g snapshot, partitioned, in `HiveDatasetFinder` for distcp > Hive Distcp support filter on partitioned or snapshot table > --- > > Key: GOBBLIN-1142 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1142 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > The change adds support filtering a specific type of tables, e.g snapshot, > partitioned, in `HiveDatasetFinder` for distcp -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table
Zhixiong Chen created GOBBLIN-1142: -- Summary: Hive Distcp support filter on partitioned or snapshot table Key: GOBBLIN-1142 URL: https://issues.apache.org/jira/browse/GOBBLIN-1142 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1096) Work with DST change in compaction watermark
Zhixiong Chen created GOBBLIN-1096: -- Summary: Work with DST change in compaction watermark Key: GOBBLIN-1096 URL: https://issues.apache.org/jira/browse/GOBBLIN-1096 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1066) field projection with namespace
[ https://issues.apache.org/jira/browse/GOBBLIN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1066: --- Description: `AvroProjectionConverter` currently ignores the extract namespace when identifying fields to remove for a table. The change makes it take the namespace into account when identifying fields to remove; the behavior is configurable. > field projection with namespace > --- > > Key: GOBBLIN-1066 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1066 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > `AvroProjectionConverter` currently ignores the extract namespace when identifying > fields to remove for a table. The change makes it take the namespace into account > when identifying fields to remove; the behavior is configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1066) field projection with namespace
Zhixiong Chen created GOBBLIN-1066: -- Summary: field projection with namespace Key: GOBBLIN-1066 URL: https://issues.apache.org/jira/browse/GOBBLIN-1066 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool population in KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1056: --- Description: Put existing logic of consumer client pool population into method `populateClientPool`, it allows the client created for the pool to carry additional information from the client created to fetch topics (was: Put existing logic of consumer client pool population) > Allow customizing client pool population in KafkaSource > --- > > Key: GOBBLIN-1056 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1056 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Put existing logic of consumer client pool population into method > `populateClientPool`, it allows the client created for the pool to carry > additional information from the client created to fetch topics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool population in KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1056: --- Summary: Allow customizing client pool population in KafkaSource (was: Allow customizing client pool creation in KafkaSource) > Allow customizing client pool population in KafkaSource > --- > > Key: GOBBLIN-1056 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1056 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Put existing logic of consumer client pool population -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool creation in KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1056: --- Description: Put existing logic of consumer client pool population > Allow customizing client pool creation in KafkaSource > - > > Key: GOBBLIN-1056 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1056 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Put existing logic of consumer client pool population -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool creation in KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1056: --- Summary: Allow customizing client pool creation in KafkaSource (was: Allow customize client pool creation in KafkaSource) > Allow customizing client pool creation in KafkaSource > - > > Key: GOBBLIN-1056 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1056 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1056) Allow customize client pool creation in KafkaSource
Zhixiong Chen created GOBBLIN-1056: -- Summary: Allow customize client pool creation in KafkaSource Key: GOBBLIN-1056 URL: https://issues.apache.org/jira/browse/GOBBLIN-1056 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1056) Allow customize client pool creation in KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1056: --- Description: (was: StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job) > Allow customize client pool creation in KafkaSource > --- > > Key: GOBBLIN-1056 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1056 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1045) Emit more events in compaction job
[ https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1045: --- Description: Emit count event for the following item in compaction job - number of files, corresponding to hive metadata "numFiles" - record count, corresponding to hive metadata "numRows" - bytes written, corresponding to hive metadata "totalSize" was: Emit count event for the following hive metadata in a compaction job - numFiles - numRows - totalSize > Emit more events in compaction job > -- > > Key: GOBBLIN-1045 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1045 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Emit count event for the following item in compaction job > - number of files, corresponding to hive metadata "numFiles" > - record count, corresponding to hive metadata "numRows" > - bytes written, corresponding to hive metadata "totalSize" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1045) Emit more events in compaction job
[ https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1045: --- Summary: Emit more events in compaction job (was: Emit events for hive metadata in compaction job) > Emit more events in compaction job > -- > > Key: GOBBLIN-1045 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1045 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Emit count event for the following hive metadata in a compaction job > - numFiles > - numRows > - totalSize -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1045) Emit events for hive metadata in compaction job
[ https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1045: --- Description: Emit count event for the following hive metadata in a compaction job - numFiles - numRows - totalSize > Emit events for hive metadata in compaction job > --- > > Key: GOBBLIN-1045 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1045 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Emit count event for the following hive metadata in a compaction job > - numFiles > - numRows > - totalSize -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1045) Emit events for hive metadata in compaction job
Zhixiong Chen created GOBBLIN-1045: -- Summary: Emit events for hive metadata in compaction job Key: GOBBLIN-1045 URL: https://issues.apache.org/jira/browse/GOBBLIN-1045 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1012) Implement CompactionWithWatermarkSuite
[ https://issues.apache.org/jira/browse/GOBBLIN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1012: --- Description: - A `compactionWatermark` is a timestamp indicating the data we've seen up to that time is compacted. If we explicitly checked that we've seen all the data up to that time, the timestamp would be published also as `completionAndCompactionWatermark`. - `CompactionWithWatermarkSuite` reports compaction watermarks as part of the compaction pipeline and publishes watermarks as hive table properties was:`CompactionWithWatermarkSuite` will report compaction watermark as part of the compaction pipeline. > Implement CompactionWithWatermarkSuite > -- > > Key: GOBBLIN-1012 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1012 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > - A `compactionWatermark` is a timestamp indicating the data we've seen up to > that time is compacted. If we explicitly checked that we've seen all the data > up to that time, the timestamp would be published also as > `completionAndCompactionWatermark`. > - `CompactionWithWatermarkSuite` reports compaction watermarks as part of > the compaction pipeline and publishes watermarks as hive table properties -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1012) Implement CompactionWithWatermarkSuite
[ https://issues.apache.org/jira/browse/GOBBLIN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1012: --- Description: `CompactionWithWatermarkSuite` will report compaction watermark as part of the compaction pipeline. (was: StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job) > Implement CompactionWithWatermarkSuite > -- > > Key: GOBBLIN-1012 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1012 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > `CompactionWithWatermarkSuite` will report compaction watermark as part of > the compaction pipeline. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1012) Implement CompactionWithWatermarkSuite
Zhixiong Chen created GOBBLIN-1012: -- Summary: Implement CompactionWithWatermarkSuite Key: GOBBLIN-1012 URL: https://issues.apache.org/jira/browse/GOBBLIN-1012 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-1011) Adjust compaction flow to work with virtual partition
[ https://issues.apache.org/jira/browse/GOBBLIN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-1011: --- Description: - Update existing `CompactionVerifier`s and `CompactionCompleteAction`s to work with virtual simple file system dataset properly - Improve ser/de of `FileSystemDataset` in `CompactionSuiteBase` - Update gobblin-hive-registration to work with table parameters properly > Adjust compaction flow to work with virtual partition > - > > Key: GOBBLIN-1011 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1011 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > - Update existing `CompactionVerifier`s and `CompactionCompleteAction`s to > work with virtual simple file system dataset properly > - Improve ser/de of `FileSystemDataset` in `CompactionSuiteBase` > - Update gobblin-hive-registration to work with table parameters properly -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1011) Adjust compaction flow to work with virtual partition
Zhixiong Chen created GOBBLIN-1011: -- Summary: Adjust compaction flow to work with virtual partition Key: GOBBLIN-1011 URL: https://issues.apache.org/jira/browse/GOBBLIN-1011 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1001) Implement TimePartitionGlobFinder
Zhixiong Chen created GOBBLIN-1001: -- Summary: Implement TimePartitionGlobFinder Key: GOBBLIN-1001 URL: https://issues.apache.org/jira/browse/GOBBLIN-1001 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-993) Support job level hive configuration override
Zhixiong Chen created GOBBLIN-993: - Summary: Support job level hive configuration override Key: GOBBLIN-993 URL: https://issues.apache.org/jira/browse/GOBBLIN-993 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-896) Clone schema or field props in AvroFieldRemover
Zhixiong Chen created GOBBLIN-896: - Summary: Clone schema or field props in AvroFieldRemover Key: GOBBLIN-896 URL: https://issues.apache.org/jira/browse/GOBBLIN-896 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Currently, `AvroFieldRemover` ignores schema and field level properties while cloning the schema and its fields. The change is to fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-831) Fix NPE in KafkaWorkUnitPacker when there is no WorkUnit created
Zhixiong Chen created GOBBLIN-831: - Summary: Fix NPE in KafkaWorkUnitPacker when there is no WorkUnit created Key: GOBBLIN-831 URL: https://issues.apache.org/jira/browse/GOBBLIN-831 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GOBBLIN-827) Add more events
Zhixiong Chen created GOBBLIN-827: - Summary: Add more events Key: GOBBLIN-827 URL: https://issues.apache.org/jira/browse/GOBBLIN-827 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Add the following events - `JobStateEventBuilder` to report gobblin job state or MR job state - `EntityMissingEventBuilder` to report a missing instance of a certain entity -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner
[ https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-769: -- Summary: Support string record timestamp in TimeBasedAvroWriterPartitioner (was: Support record timestamp as string in TimeBasedWriterPartitioner) > Support string record timestamp in TimeBasedAvroWriterPartitioner > - > > Key: GOBBLIN-769 > URL: https://issues.apache.org/jira/browse/GOBBLIN-769 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner
[ https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-769: -- Description: Currently, if a record timestamp is a string, `TimeBasedAvroWriterPartitioner` will not be able to recognize it and will use current time > Support string record timestamp in TimeBasedAvroWriterPartitioner > - > > Key: GOBBLIN-769 > URL: https://issues.apache.org/jira/browse/GOBBLIN-769 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, if a record timestamp is a string, > `TimeBasedAvroWriterPartitioner` will not be able to recognize it and will > use current time -- This message was sent by Atlassian JIRA (v7.6.3#76005)
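The behavior described in GOBBLIN-769 can be illustrated with a minimal sketch. This is not Gobblin's implementation: the class `RecordTimestampSketch` and the method `timestampOf` are hypothetical names, and the sketch assumes the string holds a decimal epoch value, which the issue does not state. It accepts any `CharSequence` because Avro string values are commonly `Utf8` instances rather than `java.lang.String`.

```java
// Hypothetical helper for illustration only; not part of Gobblin.
final class RecordTimestampSketch {
  private RecordTimestampSketch() {}

  // Returns the timestamp carried by a record field, or fallbackMillis
  // (for example the current time) when the value cannot be interpreted.
  // Assumption: a string value is a decimal epoch number.
  static long timestampOf(Object fieldValue, long fallbackMillis) {
    if (fieldValue instanceof Number) {
      return ((Number) fieldValue).longValue();
    }
    if (fieldValue instanceof CharSequence) {
      try {
        return Long.parseLong(fieldValue.toString().trim());
      } catch (NumberFormatException e) {
        return fallbackMillis; // unparseable string: fall back
      }
    }
    return fallbackMillis; // null or unsupported type: fall back
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(timestampOf("1558000000000", now));
    System.out.println(timestampOf(1558000000000L, now));
  }
}
```

Without the `CharSequence` branch, a string-typed timestamp would fall through to the fallback, which matches the "will use current time" symptom in the description.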
[jira] [Updated] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner
[ https://issues.apache.org/jira/browse/GOBBLIN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-767: -- Description: Currently, `TimeBasedWriterPartitioner` assumes the timestamp value from a record is in millis. The task is to remove the assumption and support timestamp in different units, by default, in millis. > Support different time units in TimeBasedWriterPartitioner > -- > > Key: GOBBLIN-767 > URL: https://issues.apache.org/jira/browse/GOBBLIN-767 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, `TimeBasedWriterPartitioner` assumes the timestamp value from a > record is in millis. The task is to remove the assumption and support > timestamp in different units, by default, in millis. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
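As a rough sketch of what GOBBLIN-767 asks for, the conversion below normalizes a record timestamp to milliseconds using the JDK's `java.util.concurrent.TimeUnit`. The class `TimestampUnitSketch`, the method `toMillis`, and the idea of passing the unit as a string are assumptions made for illustration; the issue names neither a configuration key nor an API.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Gobblin.
final class TimestampUnitSketch {
  private TimestampUnitSketch() {}

  // unitName is an assumed configuration value such as "SECONDS".
  // A null or blank value keeps the previous behavior: milliseconds.
  static long toMillis(long timestamp, String unitName) {
    TimeUnit unit = (unitName == null || unitName.trim().isEmpty())
        ? TimeUnit.MILLISECONDS
        : TimeUnit.valueOf(unitName.trim().toUpperCase(Locale.ROOT));
    return unit.toMillis(timestamp);
  }

  public static void main(String[] args) {
    System.out.println(toMillis(1_558_000_000L, "seconds"));
    System.out.println(toMillis(1_558_000_000_000L, null));
  }
}
```

Defaulting to milliseconds when no unit is given keeps existing jobs unchanged, in line with the "by default, in millis" wording of the description. An unknown unit name makes `TimeUnit.valueOf` throw `IllegalArgumentException`, so a misconfiguration fails loudly.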
[jira] [Created] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner
Zhixiong Chen created GOBBLIN-767: - Summary: Support different time units in TimeBasedWriterPartitioner Key: GOBBLIN-767 URL: https://issues.apache.org/jira/browse/GOBBLIN-767 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-763) Fix incorrect AvroUtils.removeUncomparableFields implementation
[ https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-763: -- Description: - Remove fields, specified by configuration `compaction.job.key.fieldBlacklist`, while computing compaction dedup key schema - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which only keeps the first field of any schema, dropping all other fields which have the same schema. was:Currently, `AvroUtils.removeUncomparableFields` will only keep the first field of any schema, dropping all other fields which have the same schema. > Fix incorrect AvroUtils.removeUncomparableFields implementation > --- > > Key: GOBBLIN-763 > URL: https://issues.apache.org/jira/browse/GOBBLIN-763 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > - Remove fields, specified by configuration > `compaction.job.key.fieldBlacklist`, while computing compaction dedup key > schema > - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which > only keeps the first field of any schema, dropping all other fields which > have the same schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
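The symptom reported for `AvroUtils.removeUncomparableFields` can be reproduced in a toy model. The sketch below does not use Avro and is not the Gobblin code: `UncomparableFieldsSketch`, `Field`, and both methods are hypothetical, a type label stands in for an Avro schema, and treating `"map"` as the only uncomparable type is a simplifying assumption. One way such behavior can arise is when a "seen schemas" set is used to filter fields, which is what `keepFirstPerType` imitates; the actual cause in Gobblin is not stated in the issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model for illustration only; not Avro and not Gobblin.
final class UncomparableFieldsSketch {
  private UncomparableFieldsSketch() {}

  // A field name plus a type label standing in for an Avro schema.
  static final class Field {
    final String name;
    final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  // Assumption: only "map" is treated as uncomparable in this toy model.
  private static boolean isComparable(Field field) {
    return !"map".equals(field.type);
  }

  // Imitates the reported symptom: after the first field of a given type,
  // every later field of that same type is dropped.
  static List<String> keepFirstPerType(List<Field> fields) {
    Set<String> seenTypes = new HashSet<>();
    List<String> kept = new ArrayList<>();
    for (Field field : fields) {
      if (isComparable(field) && seenTypes.add(field.type)) {
        kept.add(field.name);
      }
    }
    return kept;
  }

  // Intended behavior: decide per field, based on comparability alone.
  static List<String> keepComparable(List<Field> fields) {
    List<String> kept = new ArrayList<>();
    for (Field field : fields) {
      if (isComparable(field)) {
        kept.add(field.name);
      }
    }
    return kept;
  }
}
```

For fields `id:long`, `ts:long`, `tags:map`, `name:string`, the first method keeps only `id` and `name`, while the second keeps `id`, `ts`, and `name`.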
[jira] [Updated] (GOBBLIN-763) Support fields removal for compaction dedup key schema
[ https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-763: -- Summary: Support fields removal for compaction dedup key schema (was: Fix incorrect AvroUtils.removeUncomparableFields implementation) > Support fields removal for compaction dedup key schema > -- > > Key: GOBBLIN-763 > URL: https://issues.apache.org/jira/browse/GOBBLIN-763 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > - Remove fields, specified by configuration > `compaction.job.key.fieldBlacklist`, while computing compaction dedup key > schema > - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which > only keeps the first field of any schema, dropping all other fields which > have the same schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-763) Fix incorrect AvroUtils.removeUncomparableFields implementation
[ https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-763: -- Description: Currently, `AvroUtils.removeUncomparableFields` will only keep the first field of any schema, dropping all other fields which have the same schema. (was: Currently, ) > Fix incorrect AvroUtils.removeUncomparableFields implementation > --- > > Key: GOBBLIN-763 > URL: https://issues.apache.org/jira/browse/GOBBLIN-763 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, `AvroUtils.removeUncomparableFields` will only keep the first > field of any schema, dropping all other fields which have the same schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-763) Fix incorrect removeUncomparableFields implementation in AvroUtils
Zhixiong Chen created GOBBLIN-763: - Summary: Fix incorrect removeUncomparableFields implementation in AvroUtils Key: GOBBLIN-763 URL: https://issues.apache.org/jira/browse/GOBBLIN-763 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-763) Fix incorrect AvroUtils.removeUncomparableFields implementation
[ https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-763: -- Summary: Fix incorrect AvroUtils.removeUncomparableFields implementation (was: Fix incorrect removeUncomparableFields implementation in AvroUtils) > Fix incorrect AvroUtils.removeUncomparableFields implementation > --- > > Key: GOBBLIN-763 > URL: https://issues.apache.org/jira/browse/GOBBLIN-763 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-763) Fix incorrect removeUncomparableFields implementation in AvroUtils
[ https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-763: -- Description: Currently, (was: StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job) > Fix incorrect removeUncomparableFields implementation in AvroUtils > -- > > Key: GOBBLIN-763 > URL: https://issues.apache.org/jira/browse/GOBBLIN-763 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-738) Open a way to customize decoding KafkaConsumerRecord
Zhixiong Chen created GOBBLIN-738: - Summary: Open a way to customize decoding KafkaConsumerRecord Key: GOBBLIN-738 URL: https://issues.apache.org/jira/browse/GOBBLIN-738 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Currently, decoding a `KafkaConsumerRecord` is limited to 2 forms: - decode as a `ByteArrayBasedKafkaRecord` message - convert value from a `DecodeableKafkaRecord` message The task is to open a way for arbitrary decoding mechanism -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-716) Add lineage in FileBasedSource
[ https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-716: -- Description: Add lineage in `FileBasedSource` - By default, `FileBasedSource` marks dataset level source lineage - A `PartitionedFileSourceBase` marks partition level source lineage Fix destinations overwritten in `LineageInfo.putDestination(List descriptors, int branchId, State state)`. Multiple calls should append given descriptors > Add lineage in FileBasedSource > -- > > Key: GOBBLIN-716 > URL: https://issues.apache.org/jira/browse/GOBBLIN-716 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Add lineage in `FileBasedSource` > - By default, `FileBasedSource` marks dataset level source lineage > - A `PartitionedFileSourceBase` marks partition level source lineage > Fix destinations overwritten in `LineageInfo.putDestination(List > descriptors, int branchId, State state)`. Multiple calls should append given > descriptors -- This message was sent by Atlassian JIRA (v7.6.3#76005)
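The append semantics requested for `LineageInfo.putDestination` can be sketched with plain collections. This is a simplified stand-in, not the Gobblin class: `DestinationRegistrySketch` is a hypothetical name, descriptors are modeled as strings, and the `State` argument from the real signature is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration only; not Gobblin's LineageInfo.
final class DestinationRegistrySketch {
  private final Map<Integer, List<String>> destinationsByBranch = new HashMap<>();

  // Appends to whatever is already recorded for the branch. A plain
  // map.put(branchId, descriptors) here would overwrite earlier calls.
  void putDestination(List<String> descriptors, int branchId) {
    destinationsByBranch
        .computeIfAbsent(branchId, id -> new ArrayList<>())
        .addAll(descriptors);
  }

  // Returns a copy so callers cannot mutate the recorded destinations.
  List<String> destinations(int branchId) {
    return new ArrayList<>(
        destinationsByBranch.getOrDefault(branchId, Collections.emptyList()));
  }
}
```

Calling `putDestination` twice for the same branch leaves both sets of descriptors recorded, in call order, which is the "multiple calls should append" behavior the description asks for.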
[jira] [Updated] (GOBBLIN-716) Add lineage in FileBasedSource
[ https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-716: -- Summary: Add lineage in FileBasedSource (was: Add FileBasedSource lineage event) > Add lineage in FileBasedSource > -- > > Key: GOBBLIN-716 > URL: https://issues.apache.org/jira/browse/GOBBLIN-716 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-716) Add FileBasedSource lineage event
[ https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-716: -- Description: (was: It'd be useful to support configuration properties to override the default username when connecting to a HDFS cluster, e.g. in the HDFS writers. The system username that owns the Gobblin process is used by default. One particular use case for this is for stand-alone Gobblin instances running as the `root` system user within a Docker container. Individual users within an organization employing a stand-alone Gobblin cluster for data integration needs across multiple teams may have multiple users submitting jobs meant to touch different parts of the HDFS namespace under the control of separate users. Note that this feature is not quite security-relevant, as this would still allow any job configuration file to specify any username, so there aren't any enforced privilege boundaries anyway. One solution that does not appear to work is to specify the `hadoop.job.ugi` property in a job configuration file, despite what this appears to suggest in [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91): ```java Configuration conf = new Configuration(); // Add all job configuration properties so they are picked up by Hadoop JobConfigurationUtils.putStateIntoConfiguration(properties, conf); this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId); ``` *Github Url* : https://github.com/linkedin/gobblin/issues/1904 *Github Reporter* : *mgomezch* *Github Created At* : 2017-05-26T18:58:16Z *Github Updated At* : 2017-05-26T18:58:16Z) > Add FileBasedSource lineage event > - > > Key: GOBBLIN-716 > URL: https://issues.apache.org/jira/browse/GOBBLIN-716 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-716) Add FileBasedSource lineage event
Zhixiong Chen created GOBBLIN-716: - Summary: Add FileBasedSource lineage event Key: GOBBLIN-716 URL: https://issues.apache.org/jira/browse/GOBBLIN-716 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen Assignee: Zhixiong Chen It'd be useful to support configuration properties to override the default username when connecting to a HDFS cluster, e.g. in the HDFS writers. The system username that owns the Gobblin process is used by default. One particular use case for this is for stand-alone Gobblin instances running as the `root` system user within a Docker container. Individual users within an organization employing a stand-alone Gobblin cluster for data integration needs across multiple teams may have multiple users submitting jobs meant to touch different parts of the HDFS namespace under the control of separate users. Note that this feature is not quite security-relevant, as this would still allow any job configuration file to specify any username, so there aren't any enforced privilege boundaries anyway. One solution that does not appear to work is to specify the `hadoop.job.ugi` property in a job configuration file, despite what this appears to suggest in [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91): ```java Configuration conf = new Configuration(); // Add all job configuration properties so they are picked up by Hadoop JobConfigurationUtils.putStateIntoConfiguration(properties, conf); this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId); ``` *Github Url* : https://github.com/linkedin/gobblin/issues/1904 *Github Reporter* : *mgomezch* *Github Created At* : 2017-05-26T18:58:16Z *Github Updated At* : 2017-05-26T18:58:16Z -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-716) Add FileBasedSource lineage event
[ https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-716: -- Issue Type: Task (was: Bug) > Add FileBasedSource lineage event > - > > Key: GOBBLIN-716 > URL: https://issues.apache.org/jira/browse/GOBBLIN-716 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > It'd be useful to support configuration properties to override the default > username when connecting to a HDFS cluster, e.g. in the HDFS writers. The > system username that owns the Gobblin process is used by default. > One particular use case for this is for stand-alone Gobblin instances running > as the `root` system user within a Docker container. Individual users within > an organization employing a stand-alone Gobblin cluster for data integration > needs across multiple teams may have multiple users submitting jobs meant to > touch different parts of the HDFS namespace under the control of separate > users. > Note that this feature is not quite security-relevant, as this would still > allow any job configuration file to specify any username, so there aren't any > enforced privilege boundaries anyway. 
> One solution that does not appear to work is to specify the `hadoop.job.ugi` > property in a job configuration file, despite what this appears to suggest in > [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91): > ```java > Configuration conf = new Configuration(); > // Add all job configuration properties so they are picked up by Hadoop > JobConfigurationUtils.putStateIntoConfiguration(properties, conf); > this.fs = WriterUtils.getWriterFS(properties, this.numBranches, > this.branchId); > ``` > > *Github Url* : https://github.com/linkedin/gobblin/issues/1904 > *Github Reporter* : *mgomezch* > *Github Created At* : 2017-05-26T18:58:16Z > *Github Updated At* : 2017-05-26T18:58:16Z -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-716) Add FileBasedSource lineage event
[ https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-716: -- External issue URL: (was: https://github.com/linkedin/gobblin/issues/1904) > Add FileBasedSource lineage event > - > > Key: GOBBLIN-716 > URL: https://issues.apache.org/jira/browse/GOBBLIN-716 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > It'd be useful to support configuration properties to override the default > username when connecting to a HDFS cluster, e.g. in the HDFS writers. The > system username that owns the Gobblin process is used by default. > One particular use case for this is for stand-alone Gobblin instances running > as the `root` system user within a Docker container. Individual users within > an organization employing a stand-alone Gobblin cluster for data integration > needs across multiple teams may have multiple users submitting jobs meant to > touch different parts of the HDFS namespace under the control of separate > users. > Note that this feature is not quite security-relevant, as this would still > allow any job configuration file to specify any username, so there aren't any > enforced privilege boundaries anyway. 
> One solution that does not appear to work is to specify the `hadoop.job.ugi` > property in a job configuration file, despite what this appears to suggest in > [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91): > ```java > Configuration conf = new Configuration(); > // Add all job configuration properties so they are picked up by Hadoop > JobConfigurationUtils.putStateIntoConfiguration(properties, conf); > this.fs = WriterUtils.getWriterFS(properties, this.numBranches, > this.branchId); > ``` > > *Github Url* : https://github.com/linkedin/gobblin/issues/1904 > *Github Reporter* : *mgomezch* > *Github Created At* : 2017-05-26T18:58:16Z > *Github Updated At* : 2017-05-26T18:58:16Z -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-543) Opensource StandardManifestRecord and DistributedClasspathManager
[ https://issues.apache.org/jira/browse/GOBBLIN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-543: -- Description: StandardManifestRecord: A standard representation of a record from a service that gobblin reads or writes DistributedClasspathManager: A class to add artifacts to the classpath of an MR job was: StandardManifestRecord: A standardized record that represents an input record for Gobblin DistributedClasspathManager: A class to add artifacts to the classpath of an MR job > Opensource StandardManifestRecord and DistributedClasspathManager > - > > Key: GOBBLIN-543 > URL: https://issues.apache.org/jira/browse/GOBBLIN-543 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > StandardManifestRecord: A standard representation of a record from a service > that gobblin reads or writes > DistributedClasspathManager: A class to add artifacts to the classpath of an > MR job -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-543) Opensource distributedClasspathManager
[ https://issues.apache.org/jira/browse/GOBBLIN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-543: -- Summary: Opensource distributedClasspathManager (was: DistributedClasspathManager) > Opensource distributedClasspathManager > -- > > Key: GOBBLIN-543 > URL: https://issues.apache.org/jira/browse/GOBBLIN-543 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > A class to add artifacts to the classpath of an MR job -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-543) Opensource DistributedClasspathManager
[ https://issues.apache.org/jira/browse/GOBBLIN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-543: -- Summary: Opensource DistributedClasspathManager (was: Opensource distributedClasspathManager) > Opensource DistributedClasspathManager > -- > > Key: GOBBLIN-543 > URL: https://issues.apache.org/jira/browse/GOBBLIN-543 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > A class to add artifacts to the classpath of an MR job -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-621) Add utilities
[ https://issues.apache.org/jira/browse/GOBBLIN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-621: -- Description: - `TestIOUtils.readAllRecords`: read json data as `GenericRecord` with avro schema - `FileListUtils.getAnyNonHiddenFile` finds the first non-hidden file under a given path in a file system was: - `readAllRecords`: read json data as `GenericRecord` with avro schema - `getAnyNonHiddenFile` finds the first non-hidden file under a given path in a file system > Add utilities > - > > Key: GOBBLIN-621 > URL: https://issues.apache.org/jira/browse/GOBBLIN-621 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > - `TestIOUtils.readAllRecords`: read json data as `GenericRecord` with avro > schema > - `FileListUtils.getAnyNonHiddenFile` finds the first non-hidden file under a > given path in a file system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-621) Add utilities
Zhixiong Chen created GOBBLIN-621: - Summary: Add utilities Key: GOBBLIN-621 URL: https://issues.apache.org/jira/browse/GOBBLIN-621 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen - `readAllRecords`: read json data as `GenericRecord` with avro schema - `getAnyNonHiddenFile` finds the first non-hidden file under a given path in a file system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
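The `getAnyNonHiddenFile` utility described above can be illustrated with a minimal sketch on the local file system. This is not Gobblin's implementation: the class `FileListSketch`, its methods, and the rule that names starting with `.` or `_` count as hidden are assumptions made for illustration, and the actual utility presumably works against a Hadoop `FileSystem` rather than `java.io.File`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch; names and the hidden-file rule are assumptions.
class FileListSketch {
    /** Assumed convention: names starting with '.' or '_' are hidden. */
    static boolean isHidden(File f) {
        String name = f.getName();
        return name.startsWith(".") || name.startsWith("_");
    }

    /** Depth-first search for a regular, non-hidden file; hidden directories are skipped. */
    static Optional<File> anyNonHiddenFile(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return Optional.empty(); // not a directory, or not readable
        }
        Arrays.sort(children); // deterministic traversal order
        for (File child : children) {
            if (isHidden(child)) {
                continue;
            }
            if (child.isFile()) {
                return Optional.of(child);
            }
            Optional<File> hit = anyNonHiddenFile(child);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }

    /** Builds a small temporary tree and returns the name of the file that is found. */
    static String demo() {
        try {
            File root = Files.createTempDirectory("listing").toFile();
            new File(root, "_SUCCESS").createNewFile();
            File staging = new File(root, ".staging");
            staging.mkdir();
            new File(staging, "part-0").createNewFile();
            File data = new File(root, "data");
            data.mkdir();
            new File(data, "part-00000.avro").createNewFile();
            return anyNonHiddenFile(root).map(File::getName).orElse("<none>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the demo tree, `_SUCCESS` and everything under `.staging` are skipped, so the search ends at `data/part-00000.avro`.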
[jira] [Closed] (GOBBLIN-565) Implement partition level lineage event for job using TimePartitionedDataPublisher
[ https://issues.apache.org/jira/browse/GOBBLIN-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen closed GOBBLIN-565. - Resolution: Fixed > Implement partition level lineage event for job using > TimePartitionedDataPublisher > -- > > Key: GOBBLIN-565 > URL: https://issues.apache.org/jira/browse/GOBBLIN-565 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently Gobblin reports dataset level lineage, for example, it reports > lineage from `(kafka:topic1)` to `(hdfs:/data/tracking/PageViewEvent)`. The > task is to report lineage from `(kafka:topic1)` to > `(hdfs:/data/tracking/PageViewEvent, hourly/2018/08/15/16)` where > `hourly/2018/08/15/16` is a partition -- This message was sent by Atlassian JIRA (v7.6.3#76005)
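The difference between dataset-level and partition-level lineage described above can be sketched with two small value types. The classes `DatasetRef`, `PartitionRef`, and `LineageSketch` are hypothetical stand-ins written for this example, not Gobblin's descriptor classes; they only reproduce the `(platform:name)` and `(platform:name, partition)` notation used in the issue.

```java
import java.util.Objects;

// Hypothetical dataset-level descriptor: a platform plus a dataset name.
final class DatasetRef {
    final String platform;
    final String name;

    DatasetRef(String platform, String name) {
        this.platform = Objects.requireNonNull(platform);
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public String toString() {
        return "(" + platform + ":" + name + ")";
    }
}

// Hypothetical partition-level descriptor: a dataset plus a partition name.
final class PartitionRef {
    final DatasetRef dataset;
    final String partition;

    PartitionRef(DatasetRef dataset, String partition) {
        this.dataset = Objects.requireNonNull(dataset);
        this.partition = Objects.requireNonNull(partition);
    }

    @Override
    public String toString() {
        return "(" + dataset.platform + ":" + dataset.name + ", " + partition + ")";
    }
}

final class LineageSketch {
    /** Renders a lineage edge from a source to either kind of destination. */
    static String describe(DatasetRef source, Object destination) {
        return source + " -> " + destination;
    }
}
```

With a source of `new DatasetRef("kafka", "topic1")`, passing a `DatasetRef` destination gives the dataset-level form, while wrapping the same dataset in `new PartitionRef(dataset, "hourly/2018/08/15/16")` gives the partition-level form the task asks for.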
[jira] [Updated] (GOBBLIN-587) Implement partition level lineage for fs based destination
[ https://issues.apache.org/jira/browse/GOBBLIN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-587: -- Summary: Implement partition level lineage for fs based destination (was: Implement gobblin fs sink partition level lineage) > Implement partition level lineage for fs based destination > -- > > Key: GOBBLIN-587 > URL: https://issues.apache.org/jira/browse/GOBBLIN-587 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, gobblin lineage is sent at dataset level. The task is to send > partition level lineage for fs sink. An example kafka-hdfs partition lineage > is > {code:java} > { > "timestamp": 1536785248451, > "namespace": { > "string": "gobblin.event.lineage" > }, > "name": "LoginEvent", > "metadata": { > "destination": > "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", > "eventType": "LineageEvent", > "source": > "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", > "metricContextName": > "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", > "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", > "class": "org.apache.gobblin.runtime.SafeDatasetCommit", > } > } > {code} > {color:#d04437}*Note*{color}: Lineage is not available automatically. You > might have to implement the support in your source-destination pair. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-587) Implement gobblin fs sink partition level lineage
[ https://issues.apache.org/jira/browse/GOBBLIN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-587: -- Description: Currently, gobblin lineage is sent at dataset level. The task is to send partition level lineage for fs sink. An example kafka-hdfs partition lineage is {code:java} { "timestamp": 1536785248451, "namespace": { "string": "gobblin.event.lineage" }, "name": "LoginEvent", "metadata": { "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", "eventType": "LineageEvent", "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", "class": "org.apache.gobblin.runtime.SafeDatasetCommit", } } {code} {color:#d04437}*Note*{color}: Lineage is not available automatically. You might have to implement the support in your source-destination pair. was: Currently, gobblin lineage is sent at dataset level. The task is to send partition level lineage for fs sink. 
An example kafka-hdfs partition lineage is {code:java} { "timestamp": 1536785248451, "namespace": { "string": "gobblin.event.lineage" }, "name": "LoginEvent", "metadata": { "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", "eventType": "LineageEvent", "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", "class": "org.apache.gobblin.runtime.SafeDatasetCommit", } } {code} Note: Lineage is not available automatically. You might have to implement the support in your source-destination pair. > Implement gobblin fs sink partition level lineage > - > > Key: GOBBLIN-587 > URL: https://issues.apache.org/jira/browse/GOBBLIN-587 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, gobblin lineage is sent at dataset level. The task is to send > partition level lineage for fs sink. 
An example kafka-hdfs partition lineage > is > {code:java} > { > "timestamp": 1536785248451, > "namespace": { > "string": "gobblin.event.lineage" > }, > "name": "LoginEvent", > "metadata": { > "destination": > "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", > "eventType": "LineageEvent", > "source": > "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", > "metricContextName": > "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", > "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", > "class": "org.apache.gobblin.runtime.SafeDatasetCommit", > } > } > {code} > {color:#d04437}*Note*{color}: Lineage is not available automatically. You > might have to implement the support in your source-destination pair. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-587) Implement gobblin fs sink partition level lineage
[ https://issues.apache.org/jira/browse/GOBBLIN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-587: -- Description: Currently, gobblin lineage is sent at dataset level. The task is to send partition level lineage for fs sink. An example kafka-hdfs partition lineage is {code:java} { "timestamp": 1536785248451, "namespace": { "string": "gobblin.event.lineage" }, "name": "LoginEvent", "metadata": { "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", "eventType": "LineageEvent", "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", "class": "org.apache.gobblin.runtime.SafeDatasetCommit", } } {code} Note: Lineage is not available automatically. You might have to implement the support in your source-destination pair. was: Currently, gobblin lineage is sent at dataset level. The task is to send partition level lineage for fs sink. 
An example kafka-hdfs partition lineage is {code:java} { "timestamp": 1536785248451, "namespace": { "string": "gobblin.event.lineage" }, "name": "LoginEvent", "metadata": { "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/tmp/zhchen/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", "eventType": "LineageEvent", "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", "class": "org.apache.gobblin.runtime.SafeDatasetCommit", } } {code} > Implement gobblin fs sink partition level lineage > - > > Key: GOBBLIN-587 > URL: https://issues.apache.org/jira/browse/GOBBLIN-587 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently, gobblin lineage is sent at dataset level. The task is to send > partition level lineage for fs sink. 
An example kafka-hdfs partition lineage > is > {code:java} > { > "timestamp": 1536785248451, > "namespace": { > "string": "gobblin.event.lineage" > }, > "name": "LoginEvent", > "metadata": { > "destination": > "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", > "eventType": "LineageEvent", > "source": > "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", > "metricContextName": > "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", > "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", > "class": "org.apache.gobblin.runtime.SafeDatasetCommit", > } > } > {code} > Note: Lineage is not available automatically. You might have to implement the > support in your source-destination pair. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-587) Implement gobblin fs sink partition level lineage
Zhixiong Chen created GOBBLIN-587: - Summary: Implement gobblin fs sink partition level lineage Key: GOBBLIN-587 URL: https://issues.apache.org/jira/browse/GOBBLIN-587 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Currently, gobblin lineage is sent at dataset level. The task is to send partition level lineage for fs sink. An example kafka-hdfs partition lineage is {code:java} { "timestamp": 1536785248451, "namespace": { "string": "gobblin.event.lineage" }, "name": "LoginEvent", "metadata": { "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/tmp/zhchen/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}", "eventType": "LineageEvent", "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}", "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310", "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e", "class": "org.apache.gobblin.runtime.SafeDatasetCommit", } } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-576) Send partition level lineage in hive distcp
[ https://issues.apache.org/jira/browse/GOBBLIN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-576: -- Description: Currently hive distcp only supports dataset/table level lineage. The task is to send lineage at the table partition level if any. > Send partition level lineage in hive distcp > --- > > Key: GOBBLIN-576 > URL: https://issues.apache.org/jira/browse/GOBBLIN-576 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Currently hive distcp only supports dataset/table level lineage. The task is > to send lineage at the table partition level if any. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-576) Send partition level lineage in hive distcp
Zhixiong Chen created GOBBLIN-576: - Summary: Send partition level lineage in hive distcp Key: GOBBLIN-576 URL: https://issues.apache.org/jira/browse/GOBBLIN-576 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-565) Implement partition level lineage event for job using TimePartitionedDataPublisher
Zhixiong Chen created GOBBLIN-565: - Summary: Implement partition level lineage event for job using TimePartitionedDataPublisher Key: GOBBLIN-565 URL: https://issues.apache.org/jira/browse/GOBBLIN-565 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Currently Gobblin reports dataset level lineage, for example, it reports lineage from `(kafka:topic1)` to `(hdfs:/data/tracking/PageViewEvent)`. The task is to report lineage from `(kafka:topic1)` to `(hdfs:/data/tracking/PageViewEvent, hourly/2018/08/15/16)` where `hourly/2018/08/15/16` is a partition -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-543) DistributedClasspathManager
Zhixiong Chen created GOBBLIN-543: - Summary: DistributedClasspathManager Key: GOBBLIN-543 URL: https://issues.apache.org/jira/browse/GOBBLIN-543 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen A class to add artifacts to the classpath of an MR job -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-517) Add missing apache license info
Zhixiong Chen created GOBBLIN-517: - Summary: Add missing apache license info Key: GOBBLIN-517 URL: https://issues.apache.org/jira/browse/GOBBLIN-517 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-489) Implement PusherFactory
[ https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-489: -- Summary: Implement PusherFactory (was: Create PusherFactory) > Implement PusherFactory > --- > > Key: GOBBLIN-489 > URL: https://issues.apache.org/jira/browse/GOBBLIN-489 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > An `PusherFactory` creates a `Pusher`. Changes are: > * `PusherFactory` and gobblin scope specific factory > `GobblinScopePusherFactory` > * Load broker config from configurable multiple namespaces besides > `gobblin.broker` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-489) Implement PusherFactory
[ https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-489: -- Description: A `PusherFactory` creates a `Pusher`. Changes are: * `PusherFactory` and gobblin scope specific factory `GobblinScopePusherFactory` * Load broker config from configurable multiple namespaces besides `gobblin.broker` was: An `PusherFactory` creates a `Pusher`. Changes are: * `PusherFactory` and gobblin scope specific factory `GobblinScopePusherFactory` * Load broker config from configurable multiple namespaces besides `gobblin.broker` > Implement PusherFactory > --- > > Key: GOBBLIN-489 > URL: https://issues.apache.org/jira/browse/GOBBLIN-489 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > A `PusherFactory` creates a `Pusher`. Changes are: > * `PusherFactory` and gobblin scope specific factory > `GobblinScopePusherFactory` > * Load broker config from configurable multiple namespaces besides > `gobblin.broker` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-489) Create PusherFactory
[ https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-489: -- Description: An `PusherFactory` creates a `Pusher`. Changes are: * `PusherFactory` and gobblin scope specific factory `GobblinScopePusherFactory` * Load broker config from configurable multiple namespaces besides `gobblin.broker` was: An `EventProducer` produces an event and sends it out with a `Pusher`. The changes are: * An `EventProducer` class and its corresponding `EventProducerFactory` which creates a shared `EventProducer` instance * Load broker config from configurable multiple namespaces besides `gobblin.broker` > Create PusherFactory > > > Key: GOBBLIN-489 > URL: https://issues.apache.org/jira/browse/GOBBLIN-489 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > An `PusherFactory` creates a `Pusher`. Changes are: > * `PusherFactory` and gobblin scope specific factory > `GobblinScopePusherFactory` > * Load broker config from configurable multiple namespaces besides > `gobblin.broker` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-489) Create PusherFactory
[ https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-489: -- Summary: Create PusherFactory (was: Create general EventProducer with a Pusher) > Create PusherFactory > > > Key: GOBBLIN-489 > URL: https://issues.apache.org/jira/browse/GOBBLIN-489 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > An `EventProducer` produces an event and sends it out with a `Pusher`. The > changes are: > * An `EventProducer` class and its corresponding `EventProducerFactory` > which creates a shared `EventProducer` instance > * Load broker config from configurable multiple namespaces besides > `gobblin.broker` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-501) Fix NPE thrown from read after EOF of LazyMaterializeDecryptorInputStream
Zhixiong Chen created GOBBLIN-501: - Summary: Fix NPE thrown from read after EOF of LazyMaterializeDecryptorInputStream Key: GOBBLIN-501 URL: https://issues.apache.org/jira/browse/GOBBLIN-501 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen Assignee: Zhixiong Chen Calling `read` on a `LazyMaterializeDecryptorInputStream` that has already reached EOF throws an NPE. The fix is to return `-1` for any read after EOF. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
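The kind of guard this issue describes can be sketched as follows. `EofSafeStream` is a hypothetical class written for illustration, not the Gobblin stream itself: it assumes a delegate that is created lazily on the first read and dropped once EOF is seen, and it remembers EOF in a flag so that later reads return `-1` without touching the released delegate.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical sketch of a lazily materialized stream with an EOF guard.
class EofSafeStream extends InputStream {
    private final byte[] source;
    private ByteArrayInputStream delegate; // created on first read, dropped at EOF
    private boolean eof;

    EofSafeStream(byte[] source) {
        this.source = source;
    }

    @Override
    public int read() {
        if (eof) {
            return -1; // any read after EOF returns -1 and never uses the delegate
        }
        if (delegate == null) {
            delegate = new ByteArrayInputStream(source); // lazy materialization
        }
        int b = delegate.read();
        if (b == -1) {
            eof = true;
            delegate = null; // release the delegate once the data is exhausted
        }
        return b;
    }
}
```

Returning `-1` repeatedly matches the `InputStream` contract, under which callers may keep calling `read` after the end of the stream has been reached.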
[jira] [Created] (GOBBLIN-493) Fix build issue in GithubDataEventTypesPartitioner
Zhixiong Chen created GOBBLIN-493: - Summary: Fix build issue in GithubDataEventTypesPartitioner Key: GOBBLIN-493 URL: https://issues.apache.org/jira/browse/GOBBLIN-493 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Build failure because of `GithubDataEventTypesPartitioner` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (GOBBLIN-489) Create general EventProducer with a Pusher
[ https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen reassigned GOBBLIN-489: - Assignee: Zhixiong Chen > Create general EventProducer with a Pusher > -- > > Key: GOBBLIN-489 > URL: https://issues.apache.org/jira/browse/GOBBLIN-489 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > An `EventProducer` produces an event and sends it out with a `Pusher`. The > changes are: > * An `EventProducer` class and its corresponding `EventProducerFactory` > which creates a shared `EventProducer` instance > * Load broker config from configurable multiple namespaces besides > `gobblin.broker` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-488) Make `AsyncRequest` aware of records
Zhixiong Chen created GOBBLIN-488: - Summary: Make `AsyncRequest` aware of records Key: GOBBLIN-488 URL: https://issues.apache.org/jira/browse/GOBBLIN-488 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Currently, an `AsyncRequest` built from a collection of records does not know which records it was built from. The change is to make it record-aware so that a `ResponseHandler` can post-process the records if necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
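One way to picture a record-aware request is the sketch below. All names here (`RecordAwareRequest`, `BatchBuilderSketch`, `FailedRecordCollector`) are hypothetical and do not mirror Gobblin's `AsyncRequest` or `ResponseHandler` signatures; the point is only that the request keeps a reference to the records it was built from, so a handler can act on them after the response arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical request type that remembers the records it was built from.
final class RecordAwareRequest<D, RQ> {
    private final List<D> records = new ArrayList<>();
    private RQ rawRequest;

    void markRecord(D record) {
        records.add(record);
    }

    void setRawRequest(RQ rawRequest) {
        this.rawRequest = rawRequest;
    }

    RQ getRawRequest() {
        return rawRequest;
    }

    List<D> getRecords() {
        return Collections.unmodifiableList(records);
    }
}

// Hypothetical builder: batches string records into one newline-separated payload.
final class BatchBuilderSketch {
    static RecordAwareRequest<String, String> build(List<String> batch) {
        RecordAwareRequest<String, String> request = new RecordAwareRequest<>();
        for (String record : batch) {
            request.markRecord(record);
        }
        request.setRawRequest(String.join("\n", batch));
        return request;
    }
}

// Hypothetical handler: collects the records of any request that failed.
final class FailedRecordCollector {
    final List<String> failed = new ArrayList<>();

    void handle(RecordAwareRequest<String, String> request, int statusCode) {
        if (statusCode >= 400) {
            failed.addAll(request.getRecords());
        }
    }
}
```

Because the handler receives the request together with the status code, it can, for example, gather the records of a failed batch for retry or reporting, which would not be possible if the request carried only the serialized payload.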
[jira] [Created] (GOBBLIN-487) Integrate PasswordManager in R2RestWriterBuilder
Zhixiong Chen created GOBBLIN-487: - Summary: Integrate PasswordManager in R2RestWriterBuilder Key: GOBBLIN-487 URL: https://issues.apache.org/jira/browse/GOBBLIN-487 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-482) Add http write documentation
[ https://issues.apache.org/jira/browse/GOBBLIN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-482: -- Description: The old http write framework under `AbstractHttpWriter` and `AbstractHttpWriterBuilder` is deprecated! Use `AsyncHttpWriter` and `AsyncHttpWriterBuilder` instead (was: The old http write framework under [AbstractHttpWriter|[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java]] and [`AbstractHttpWriterBuilder`|[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java]] is deprecated! Use `AsyncHttpWriter` and `AsyncHttpWriterBuilder` instead) > Add http write documentation > > > Key: GOBBLIN-482 > URL: https://issues.apache.org/jira/browse/GOBBLIN-482 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > The old http write framework under `AbstractHttpWriter` and > `AbstractHttpWriterBuilder` is deprecated! Use `AsyncHttpWriter` and > `AsyncHttpWriterBuilder` instead -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-482) Add http write documentation
[ https://issues.apache.org/jira/browse/GOBBLIN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-482: -- Description: The old http write framework under [AbstractHttpWriter|[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java]] and [`AbstractHttpWriterBuilder`|[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java]] is deprecated! Use `AsyncHttpWriter` and `AsyncHttpWriterBuilder` instead was: The old http write framework under [`AbstractHttpWriter`|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java] and [`AbstractHttpWriterBuilder`|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java] is deprecated! Use `AsyncHttpWriter` and `AsyncHttpWriterBuilder` instead > Add http write documentation > > > Key: GOBBLIN-482 > URL: https://issues.apache.org/jira/browse/GOBBLIN-482 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > The old http write framework under > [AbstractHttpWriter|[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java]] > and > [`AbstractHttpWriterBuilder`|[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java]] > is deprecated! Use `AsyncHttpWriter` and > `AsyncHttpWriterBuilder` instead -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-435) Fix data publisher created from job broker not closed
Zhixiong Chen created GOBBLIN-435: - Summary: Fix data publisher created from job broker not closed Key: GOBBLIN-435 URL: https://issues.apache.org/jira/browse/GOBBLIN-435 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-430) Add lineage in SalesforceSource
[ https://issues.apache.org/jira/browse/GOBBLIN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-430: -- Description: - Set source lineage info into work units generated by `SalesforceSource` - Full lineage events can be sent if `SalesforceSource` is used together with a writer/publisher that puts destination lineage info was: Add lineage in `SalesforceSource` > Add lineage in SalesforceSource > --- > > Key: GOBBLIN-430 > URL: https://issues.apache.org/jira/browse/GOBBLIN-430 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > - Set source lineage info into work units generated by `SalesforceSource` > - Full lineage events can be sent if `SalesforceSource` is used together > with a writer/publisher that puts destination lineage info -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-430) Add lineage in SalesforceSource
[ https://issues.apache.org/jira/browse/GOBBLIN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-430: -- Summary: Add lineage in SalesforceSource (was: Add lineage for salesforce source) > Add lineage in SalesforceSource > --- > > Key: GOBBLIN-430 > URL: https://issues.apache.org/jira/browse/GOBBLIN-430 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Add lineage in `SalesforceSource` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-430) Add lineage for salesforce source
Zhixiong Chen created GOBBLIN-430: - Summary: Add lineage for salesforce source Key: GOBBLIN-430 URL: https://issues.apache.org/jira/browse/GOBBLIN-430 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Add lineage in `SalesforceSource` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-395) Add lineage for copying config based dataset
Zhixiong Chen created GOBBLIN-395: - Summary: Add lineage for copying config based dataset Key: GOBBLIN-395 URL: https://issues.apache.org/jira/browse/GOBBLIN-395 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Set file system based source and destination datasets for `CopyableFile`s of `ConfigBasedDataset`s -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (GOBBLIN-374) GobblinMetrics failed to close event reporters
[ https://issues.apache.org/jira/browse/GOBBLIN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen closed GOBBLIN-374. - Resolution: Fixed > GobblinMetrics failed to close event reporters > -- > > Key: GOBBLIN-374 > URL: https://issues.apache.org/jira/browse/GOBBLIN-374 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > A GobblinMetrics instance is cached as a soft value, so it can be GC'ed > inadvertently even though it is still required to close the event > reporters when the job completes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
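The failure mode in GOBBLIN-374 can be reproduced in miniature. The sketch below is not the fix that was merged; it only shows, under assumed names (`Metrics`, `start`, `finish`, `simulateGc`), why a value reachable solely through a soft reference may be gone by the time the job wants to close it, and one possible remedy: keeping a strong reference for as long as the job runs.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SoftCacheCloseSketch {

  /** Hypothetical stand-in for a metrics object that owns closeable reporters. */
  public static final class Metrics implements AutoCloseable {
    private boolean closed;

    @Override
    public void close() {
      closed = true;
    }

    public boolean isClosed() {
      return closed;
    }
  }

  // Soft-valued cache: entries may be cleared by the collector under memory pressure.
  private final Map<String, SoftReference<Metrics>> cache = new ConcurrentHashMap<>();
  // Strong references held only while a job is running, so close() can always be called.
  private final Map<String, Metrics> running = new ConcurrentHashMap<>();

  public Metrics start(String jobId) {
    Metrics metrics = new Metrics();
    cache.put(jobId, new SoftReference<>(metrics));
    running.put(jobId, metrics);
    return metrics;
  }

  /** Returns the cached instance, or null if it was never cached or has been cleared. */
  public Metrics lookupCached(String jobId) {
    SoftReference<Metrics> ref = cache.get(jobId);
    return ref == null ? null : ref.get();
  }

  /** Simulates the collector clearing the soft reference. */
  public void simulateGc(String jobId) {
    SoftReference<Metrics> ref = cache.get(jobId);
    if (ref != null) {
      ref.clear();
    }
  }

  /** Closes the job's metrics through the strong reference, regardless of the cache state. */
  public Metrics finish(String jobId) {
    cache.remove(jobId);
    Metrics metrics = running.remove(jobId);
    if (metrics != null) {
      metrics.close();
    }
    return metrics;
  }
}
```

After `simulateGc`, a lookup through the cache alone returns `null`, so nothing could be closed; `finish` still succeeds because it goes through the strong map.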
[jira] [Closed] (GOBBLIN-380) Add log about time elapsed for waiting services to be healthy
[ https://issues.apache.org/jira/browse/GOBBLIN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen closed GOBBLIN-380. - Resolution: Fixed > Add log about time elapsed for waiting services to be healthy > - > > Key: GOBBLIN-380 > URL: https://issues.apache.org/jira/browse/GOBBLIN-380 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > Logs are added for `QuartzJobSpecScheduler` and > `StandardGobblinInstanceDriver` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-380) Add log about time elapsed for waiting services to be healthy
Zhixiong Chen created GOBBLIN-380: - Summary: Add log about time elapsed for waiting services to be healthy Key: GOBBLIN-380 URL: https://issues.apache.org/jira/browse/GOBBLIN-380 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Logs are added for `QuartzJobSpecScheduler` and `StandardGobblinInstanceDriver` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-374) GobblinMetrics failed to close event reporters
Zhixiong Chen created GOBBLIN-374: - Summary: GobblinMetrics failed to close event reporters Key: GOBBLIN-374 URL: https://issues.apache.org/jira/browse/GOBBLIN-374 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen A GobblinMetrics instance is cached as a soft value, so it can be GC'ed inadvertently even though it is still required to close the event reporters when the job completes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (GOBBLIN-374) GobblinMetrics failed to close event reporters
[ https://issues.apache.org/jira/browse/GOBBLIN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen reassigned GOBBLIN-374: - Assignee: Zhixiong Chen > GobblinMetrics failed to close event reporters > -- > > Key: GOBBLIN-374 > URL: https://issues.apache.org/jira/browse/GOBBLIN-374 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > > A GobblinMetrics instance is cached as a soft value, so it can be GC'ed > inadvertently even though it is still required to close the event > reporters when the job completes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-364) Exclude JobState from WorkUnit created by PartitionedFileSourceBase
Zhixiong Chen created GOBBLIN-364: - Summary: Exclude JobState from WorkUnit created by PartitionedFileSourceBase Key: GOBBLIN-364 URL: https://issues.apache.org/jira/browse/GOBBLIN-364 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen Currently, each `WorkUnit` created by a `PartitionedFileSourceBase` source has a copy of the entire job configurations. For the following 2 reasons, we want to exclude job configurations from `WorkUnit`: - It's redundant as the runtime counterpart of `WorkUnit`, which is `WorkUnitState`, would have a reference to all job configurations. - Adding job configurations to `WorkUnit` has the bad side effect of masking dynamic job level configurations in MR Task runner -- This message was sent by Atlassian JIRA (v6.4.14#64029)
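The masking effect described in GOBBLIN-364 can be shown with plain `java.util.Properties`. This is a sketch under stated assumptions: `workUnitWithCopy`, `leanWorkUnit`, and `runtimeView` are hypothetical helpers, and the runtime lookup is modelled as "work unit value first, job value as fallback" rather than taken from Gobblin's `WorkUnitState`.

```java
import java.util.Properties;

public class WorkUnitConfigSketch {

  /** Old behaviour (as described in the issue): the work unit copies every job-level property. */
  public static Properties workUnitWithCopy(Properties jobState) {
    Properties workUnit = new Properties();
    workUnit.putAll(jobState);
    return workUnit;
  }

  /** Proposed behaviour: the work unit carries only its own properties. */
  public static Properties leanWorkUnit() {
    return new Properties();
  }

  /**
   * Hypothetical runtime view: work unit values win, job values are the fallback.
   * The job properties are used as live defaults, so later job-level changes stay visible
   * unless the work unit holds its own value for the same key.
   */
  public static Properties runtimeView(Properties workUnit, Properties jobState) {
    Properties view = new Properties(jobState);
    view.putAll(workUnit);
    return view;
  }

  public static void main(String[] args) {
    Properties job = new Properties();
    job.setProperty("some.key", "static");

    Properties copied = workUnitWithCopy(job);
    Properties lean = leanWorkUnit();

    // A job-level value that changes after the work units were created.
    job.setProperty("some.key", "dynamic");

    System.out.println(runtimeView(copied, job).getProperty("some.key"));
    System.out.println(runtimeView(lean, job).getProperty("some.key"));
  }
}
```

The copied work unit keeps returning the stale value because its own entry shadows the job-level one, while the lean work unit falls through to the updated job configuration.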
[jira] [Updated] (GOBBLIN-344) Fix help method getResolver in LineageInfo is private
[ https://issues.apache.org/jira/browse/GOBBLIN-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-344: -- Summary: Fix help method getResolver in LineageInfo is private (was: Fix getResolver help method in LineageInfo is private) > Fix help method getResolver in LineageInfo is private > - > > Key: GOBBLIN-344 > URL: https://issues.apache.org/jira/browse/GOBBLIN-344 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > In the PR https://github.com/apache/incubator-gobblin/pull/2187, I mistakenly > made the helper method `LineageInfo#getResolver` private. It should be `public`. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-344) Fix getResolver help method in LineageInfo is private
Zhixiong Chen created GOBBLIN-344: - Summary: Fix getResolver help method in LineageInfo is private Key: GOBBLIN-344 URL: https://issues.apache.org/jira/browse/GOBBLIN-344 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen Assignee: Zhixiong Chen In the PR https://github.com/apache/incubator-gobblin/pull/2187, I mistakenly made the helper method `LineageInfo#getResolver` private. It should be `public`. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-319) Add DatasetResolver to transform raw Gobblin dataset to application specific dataset
[ https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-319: -- Description: - Add DatasetResolver to transform raw Gobblin dataset to application specific dataset - Fix lineage info not set while publishing single task data in `BaseDataPublisher` was: - Add `exampleDataDir` metadata for file system based datasets - Fix lineage info not set while publishing single task data in `BaseDataPublisher` > Add DatasetResolver to transform raw Gobblin dataset to application specific > dataset > > > Key: GOBBLIN-319 > URL: https://issues.apache.org/jira/browse/GOBBLIN-319 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > - Add DatasetResolver to transform raw Gobblin dataset to application > specific dataset > - Fix lineage info not set while publishing single task data in > `BaseDataPublisher` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-319) Add DatasetResolver to transform raw Gobblin dataset to application specific dataset
[ https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-319: -- Summary: Add DatasetResolver to transform raw Gobblin dataset to application specific dataset (was: Add exampleDataDir when sending file system based dataset lineage) > Add DatasetResolver to transform raw Gobblin dataset to application specific > dataset > > > Key: GOBBLIN-319 > URL: https://issues.apache.org/jira/browse/GOBBLIN-319 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > - Add `exampleDataDir` metadata for file system based datasets > - Fix lineage info not set while publishing single task data in > `BaseDataPublisher` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-319) Add exampleDataDir when sending file system based dataset lineage
[ https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-319: -- Summary: Add exampleDataDir when sending file system based dataset lineage (was: Lineage event not sent for publishing single task) > Add exampleDataDir when sending file system based dataset lineage > -- > > Key: GOBBLIN-319 > URL: https://issues.apache.org/jira/browse/GOBBLIN-319 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > - Fix lineage info not set while publishing single task data in > `BaseDataPublisher` > - Add `exampleDataDir` metadata for file system based datasets -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-319) Add exampleDataDir when sending file system based dataset lineage
[ https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-319: -- Description: - Add `exampleDataDir` metadata for file system based datasets - Fix lineage info not set while publishing single task data in `BaseDataPublisher` was: - Fix lineage info not set while publishing single task data in `BaseDataPublisher` - Add `exampleDataDir` metadata for file system based datasets > Add exampleDataDir when sending file system based dataset lineage > -- > > Key: GOBBLIN-319 > URL: https://issues.apache.org/jira/browse/GOBBLIN-319 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > - Add `exampleDataDir` metadata for file system based datasets > - Fix lineage info not set while publishing single task data in > `BaseDataPublisher` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-319) Lineage event not sent for publishing single task
[ https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-319: -- Description: - Fix lineage info not set while publishing single task data in `BaseDataPublisher` - Add `exampleDataDir` metadata for file system based datasets > Lineage event not sent for publishing single task > - > > Key: GOBBLIN-319 > URL: https://issues.apache.org/jira/browse/GOBBLIN-319 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > - Fix lineage info not set while publishing single task data in > `BaseDataPublisher` > - Add `exampleDataDir` metadata for file system based datasets -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-319) Lineage event not sent for publishing single task
Zhixiong Chen created GOBBLIN-319: - Summary: Lineage event not sent for publishing single task Key: GOBBLIN-319 URL: https://issues.apache.org/jira/browse/GOBBLIN-319 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-315) Fix shaded avro is used in LineageEventBuilder
Zhixiong Chen created GOBBLIN-315: - Summary: Fix shaded avro is used in LineageEventBuilder Key: GOBBLIN-315 URL: https://issues.apache.org/jira/browse/GOBBLIN-315 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-307) Implement lineage event in gobblin
[ https://issues.apache.org/jira/browse/GOBBLIN-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-307: -- Summary: Implement lineage event in gobblin (was: Define lineage event in gobblin) > Implement lineage event in gobblin > -- > > Key: GOBBLIN-307 > URL: https://issues.apache.org/jira/browse/GOBBLIN-307 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-307) Define lineage event in gobblin
Zhixiong Chen created GOBBLIN-307: - Summary: Define lineage event in gobblin Key: GOBBLIN-307 URL: https://issues.apache.org/jira/browse/GOBBLIN-307 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-305) Add csv-kafka and kafka-hdfs template
Zhixiong Chen created GOBBLIN-305: - Summary: Add csv-kafka and kafka-hdfs template Key: GOBBLIN-305 URL: https://issues.apache.org/jira/browse/GOBBLIN-305 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-296) Kafka json source and writer
[ https://issues.apache.org/jira/browse/GOBBLIN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-296: -- Description: - Add a json source and writer for kafka 09: `Kafka09JsonSource` and `Kafka09JsonObjectWriterBuilder` - Move common gson ser/de logic to gobblin-kafka-common module > Kafka json source and writer > > > Key: GOBBLIN-296 > URL: https://issues.apache.org/jira/browse/GOBBLIN-296 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > - Add a json source and writer for kafka 09: `Kafka09JsonSource` and > `Kafka09JsonObjectWriterBuilder` > - Move common gson ser/de logic to gobblin-kafka-common module -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-296) Kafka json source and writer
Zhixiong Chen created GOBBLIN-296: - Summary: Kafka json source and writer Key: GOBBLIN-296 URL: https://issues.apache.org/jira/browse/GOBBLIN-296 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-283) Refactor EnvelopePayloadConverter to support multi fields conversion
Zhixiong Chen created GOBBLIN-283: - Summary: Refactor EnvelopePayloadConverter to support multi fields conversion Key: GOBBLIN-283 URL: https://issues.apache.org/jira/browse/GOBBLIN-283 Project: Apache Gobblin Issue Type: Task Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-278) Fix sending lineage event for KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated GOBBLIN-278: -- Description: 1. Fix lineage event for KafkaSource not being sent, and avoid resending the events by removing configurations with key prefix `gobblin.lineage` from the state 2. Fix `KafkaWorkUnitPacker` disregarding existing configurations of work units > Fix sending lineage event for KafkaSource > - > > Key: GOBBLIN-278 > URL: https://issues.apache.org/jira/browse/GOBBLIN-278 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > 1. Fix lineage event for KafkaSource not being sent, and avoid resending the events > by removing configurations with key prefix `gobblin.lineage` from the state > 2. Fix `KafkaWorkUnitPacker` disregarding existing configurations of work units -- This message was sent by Atlassian JIRA (v6.4.14#64029)
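The first item of GOBBLIN-278, dropping every configuration whose key starts with `gobblin.lineage` so that the events are not sent again, amounts to a prefix filter over the state. A minimal sketch follows, assuming the state can be treated as a `java.util.Properties` object; the helper name `removeLineageProps` and the sample keys in `main` are hypothetical.

```java
import java.util.Properties;

public class LineagePrefixSketch {

  /** Key prefix named in the issue description. */
  public static final String LINEAGE_PREFIX = "gobblin.lineage";

  /** Removes all properties whose key starts with the lineage prefix; returns how many were removed. */
  public static int removeLineageProps(Properties state) {
    int removed = 0;
    // stringPropertyNames() returns a snapshot, so removing while iterating is safe.
    for (String key : state.stringPropertyNames()) {
      if (key.startsWith(LINEAGE_PREFIX)) {
        state.remove(key);
        removed++;
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Properties state = new Properties();
    state.setProperty("gobblin.lineage.example.one", "a"); // hypothetical key
    state.setProperty("gobblin.lineage.example.two", "b"); // hypothetical key
    state.setProperty("job.name", "demo");

    int removed = removeLineageProps(state);
    System.out.println(removed + " lineage keys removed, remaining: " + state.stringPropertyNames());
  }
}
```

Once the prefixed keys are gone, a later pass over the same state finds nothing left to emit, which is the "avoid resending" behaviour the issue asks for.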
[jira] [Created] (GOBBLIN-278) Fix sending lineage event for KafkaSource
Zhixiong Chen created GOBBLIN-278: - Summary: Fix sending lineage event for KafkaSource Key: GOBBLIN-278 URL: https://issues.apache.org/jira/browse/GOBBLIN-278 Project: Apache Gobblin Issue Type: Bug Reporter: Zhixiong Chen Assignee: Zhixiong Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029)