[jira] [Updated] (GOBBLIN-1174) Fail job on FileBasedSource ls invalid source directory

2020-06-01 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1174:
---
Summary: Fail job on FileBasedSource ls invalid source directory  (was: 
Fail job on FileBasedSource with invalid source directory)

> Fail job on FileBasedSource ls invalid source directory
> ---
>
> Key: GOBBLIN-1174
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1174
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GOBBLIN-1174) Fail job on FileBasedSource with invalid source directory

2020-06-01 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1174:
--

 Summary: Fail job on FileBasedSource with invalid source directory
 Key: GOBBLIN-1174
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1174
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-1146) Allow configuring autocommit in JDBCWriters

2020-05-12 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1146:
--

 Summary: Allow configuring autocommit in JDBCWriters
 Key: GOBBLIN-1146
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1146
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table

2020-05-05 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1142:
---
Description: The change adds support for filtering specific types of tables, 
e.g. snapshot or partitioned, in `HiveDatasetFinder`  (was: The change adds 
support for filtering specific types of tables, e.g. snapshot or partitioned, in 
`HiveDatasetFinder` for distcp)

> Hive Distcp support filter on partitioned or snapshot table
> ---
>
> Key: GOBBLIN-1142
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1142
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> The change adds support for filtering specific types of tables, e.g. snapshot 
> or partitioned, in `HiveDatasetFinder`





[jira] [Updated] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table

2020-05-05 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1142:
---
Description: The change adds support for filtering specific types of tables, 
e.g. snapshot or partitioned, in `HiveDatasetFinder` for distcp

> Hive Distcp support filter on partitioned or snapshot table
> ---
>
> Key: GOBBLIN-1142
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1142
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> The change adds support for filtering specific types of tables, e.g. snapshot 
> or partitioned, in `HiveDatasetFinder` for distcp





[jira] [Created] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table

2020-05-05 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1142:
--

 Summary: Hive Distcp support filter on partitioned or snapshot 
table
 Key: GOBBLIN-1142
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1142
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-1096) Work with DST change in compaction watermark

2020-03-24 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1096:
--

 Summary: Work with DST change in compaction watermark
 Key: GOBBLIN-1096
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1096
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-1066) field projection with namespace

2020-02-28 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1066:
---
Description: `AvroProjectionConverter` currently ignores the extract namespace 
when identifying fields to remove for a table. The change is to take the 
namespace into account when identifying those fields; this is configurable.

> field projection with namespace
> ---
>
> Key: GOBBLIN-1066
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1066
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> `AvroProjectionConverter` currently ignores the extract namespace when 
> identifying fields to remove for a table. The change is to take the namespace 
> into account when identifying those fields; this is configurable.





[jira] [Created] (GOBBLIN-1066) field projection with namespace

2020-02-28 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1066:
--

 Summary: field projection with namespace
 Key: GOBBLIN-1066
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1066
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool population in KafkaSource

2020-02-20 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1056:
---
Description: Move the existing consumer client pool population logic into the 
method `populateClientPool`. This allows a client created for the pool to carry 
additional information from the client created to fetch topics  (was: Put 
existing logic of consumer client pool population)

> Allow customizing client pool population in KafkaSource
> ---
>
> Key: GOBBLIN-1056
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Move the existing consumer client pool population logic into the method 
> `populateClientPool`. This allows a client created for the pool to carry 
> additional information from the client created to fetch topics





[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool population in KafkaSource

2020-02-20 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1056:
---
Summary: Allow customizing client pool population in KafkaSource  (was: 
Allow customizing client pool creation in KafkaSource)

> Allow customizing client pool population in KafkaSource
> ---
>
> Key: GOBBLIN-1056
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Put existing logic of consumer client pool population





[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool creation in KafkaSource

2020-02-20 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1056:
---
Description: Put existing logic of consumer client pool population

> Allow customizing client pool creation in KafkaSource
> -
>
> Key: GOBBLIN-1056
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Put existing logic of consumer client pool population





[jira] [Updated] (GOBBLIN-1056) Allow customizing client pool creation in KafkaSource

2020-02-20 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1056:
---
Summary: Allow customizing client pool creation in KafkaSource  (was: Allow 
customize client pool creation in KafkaSource)

> Allow customizing client pool creation in KafkaSource
> -
>
> Key: GOBBLIN-1056
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>






[jira] [Created] (GOBBLIN-1056) Allow customize client pool creation in KafkaSource

2020-02-20 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1056:
--

 Summary: Allow customize client pool creation in KafkaSource
 Key: GOBBLIN-1056
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


StandardManifestRecord: A standard representation of a record from a service 
that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job





[jira] [Updated] (GOBBLIN-1056) Allow customize client pool creation in KafkaSource

2020-02-20 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1056:
---
Description: (was: StandardManifestRecord: A standard representation of 
a record from a service that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job)

> Allow customize client pool creation in KafkaSource
> ---
>
> Key: GOBBLIN-1056
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>






[jira] [Updated] (GOBBLIN-1045) Emit more events in compaction job

2020-02-10 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1045:
---
Description: 
Emit a count event for each of the following items in a compaction job
- number of files, corresponding to hive metadata "numFiles"
- record count, corresponding to hive metadata "numRows"
- bytes written, corresponding to hive metadata "totalSize"

  was:
Emit count event for the following hive metadata in a compaction job
- numFiles
- numRows
- totalSize


> Emit more events in compaction job
> --
>
> Key: GOBBLIN-1045
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1045
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Emit a count event for each of the following items in a compaction job
> - number of files, corresponding to hive metadata "numFiles"
> - record count, corresponding to hive metadata "numRows"
> - bytes written, corresponding to hive metadata "totalSize"





[jira] [Updated] (GOBBLIN-1045) Emit more events in compaction job

2020-02-10 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1045:
---
Summary: Emit more events in compaction job  (was: Emit events for hive 
metadata in compaction job)

> Emit more events in compaction job
> --
>
> Key: GOBBLIN-1045
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1045
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Emit count event for the following hive metadata in a compaction job
> - numFiles
> - numRows
> - totalSize





[jira] [Updated] (GOBBLIN-1045) Emit events for hive metadata in compaction job

2020-02-10 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1045:
---
Description: 
Emit count event for the following hive metadata in a compaction job
- numFiles
- numRows
- totalSize

> Emit events for hive metadata in compaction job
> ---
>
> Key: GOBBLIN-1045
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1045
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Emit count event for the following hive metadata in a compaction job
> - numFiles
> - numRows
> - totalSize





[jira] [Created] (GOBBLIN-1045) Emit events for hive metadata in compaction job

2020-02-10 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1045:
--

 Summary: Emit events for hive metadata in compaction job
 Key: GOBBLIN-1045
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1045
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-1012) Implement CompactionWithWatermarkSuite

2020-01-02 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1012:
---
Description: 
- A `compactionWatermark` is a timestamp indicating the data we've seen up to 
that time is compacted. If we explicitly checked that we've seen all the data 
up to that time, the timestamp would be published also as 
`completionAndCompactionWatermark`.
 - `CompactionWithWatermarkSuite` reports compaction watermarks as part of the 
compaction pipeline and publishes watermarks as hive table properties

  was:`CompactionWithWatermarkSuite` will report compaction watermark as part 
of the compaction pipeline.


> Implement CompactionWithWatermarkSuite
> --
>
> Key: GOBBLIN-1012
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1012
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> - A `compactionWatermark` is a timestamp indicating the data we've seen up to 
> that time is compacted. If we explicitly checked that we've seen all the data 
> up to that time, the timestamp would be published also as 
> `completionAndCompactionWatermark`.
>  - `CompactionWithWatermarkSuite` reports compaction watermarks as part of 
> the compaction pipeline and publishes watermarks as hive table properties
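
A minimal sketch of the watermark rule described above, not the actual `CompactionWithWatermarkSuite` code. The class name `WatermarkProps`, the method `toTableProperties`, and the use of epoch milliseconds are assumptions made for illustration; only the two property names come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WatermarkProps {
    /**
     * Builds the table properties to publish after a compaction run.
     * Hypothetical helper: the real suite may compute and store these differently.
     *
     * @param compactedUpToMillis  time up to which the data seen so far is compacted
     * @param completenessVerified true if all data up to that time was explicitly verified as present
     */
    public static Map<String, String> toTableProperties(long compactedUpToMillis,
                                                        boolean completenessVerified) {
        Map<String, String> props = new LinkedHashMap<>();
        // Always published: the data seen up to this time is compacted.
        props.put("compactionWatermark", Long.toString(compactedUpToMillis));
        if (completenessVerified) {
            // Published only when completeness was checked explicitly.
            props.put("completionAndCompactionWatermark", Long.toString(compactedUpToMillis));
        }
        return props;
    }
}
```

Publishing the second property only after an explicit completeness check lets a reader of the table properties distinguish "compacted what we saw" from "compacted everything up to this time".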





[jira] [Updated] (GOBBLIN-1012) Implement CompactionWithWatermarkSuite

2020-01-02 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1012:
---
Description: `CompactionWithWatermarkSuite` will report compaction 
watermark as part of the compaction pipeline.  (was: StandardManifestRecord: A 
standard representation of a record from a service that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job)

> Implement CompactionWithWatermarkSuite
> --
>
> Key: GOBBLIN-1012
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1012
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> `CompactionWithWatermarkSuite` will report compaction watermark as part of 
> the compaction pipeline.





[jira] [Created] (GOBBLIN-1012) Implement CompactionWithWatermarkSuite

2020-01-02 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1012:
--

 Summary: Implement CompactionWithWatermarkSuite
 Key: GOBBLIN-1012
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1012
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


StandardManifestRecord: A standard representation of a record from a service 
that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job





[jira] [Updated] (GOBBLIN-1011) Adjust compaction flow to work with virtual partition

2019-12-20 Thread Zhixiong Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-1011:
---
Description: 
- Update existing `CompactionVerifier`s and `CompactionCompleteAction`s to work 
with virtual simple file system dataset properly
- Improve ser/de of `FileSystemDataset` in `CompactionSuiteBase`
- Update gobblin-hive-registration to work with table parameters properly



> Adjust compaction flow to work with virtual partition
> -
>
> Key: GOBBLIN-1011
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1011
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> - Update existing `CompactionVerifier`s and `CompactionCompleteAction`s to 
> work with virtual simple file system dataset properly
> - Improve ser/de of `FileSystemDataset` in `CompactionSuiteBase`
> - Update gobblin-hive-registration to work with table parameters properly





[jira] [Created] (GOBBLIN-1011) Adjust compaction flow to work with virtual partition

2019-12-20 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1011:
--

 Summary: Adjust compaction flow to work with virtual partition
 Key: GOBBLIN-1011
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1011
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-1001) Implement TimePartitionGlobFinder

2019-12-10 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-1001:
--

 Summary: Implement TimePartitionGlobFinder
 Key: GOBBLIN-1001
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1001
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-993) Support job level hive configuration override

2019-12-04 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-993:
-

 Summary: Support job level hive configuration override
 Key: GOBBLIN-993
 URL: https://issues.apache.org/jira/browse/GOBBLIN-993
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen








[jira] [Created] (GOBBLIN-896) Clone schema or field props in AvroFieldRemover

2019-10-03 Thread Zhixiong Chen (Jira)
Zhixiong Chen created GOBBLIN-896:
-

 Summary: Clone schema or field props in AvroFieldRemover
 Key: GOBBLIN-896
 URL: https://issues.apache.org/jira/browse/GOBBLIN-896
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen


Currently, `AvroFieldRemover` ignores schema and field level properties while 
cloning the schema and its fields. The change is to fix it.





[jira] [Created] (GOBBLIN-831) Fix NPE in KafkaWorkUnitPacker when there is no WorkUnit created

2019-07-18 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-831:
-

 Summary: Fix NPE in KafkaWorkUnitPacker when there is no WorkUnit 
created
 Key: GOBBLIN-831
 URL: https://issues.apache.org/jira/browse/GOBBLIN-831
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (GOBBLIN-827) Add more events

2019-07-15 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-827:
-

 Summary: Add more events
 Key: GOBBLIN-827
 URL: https://issues.apache.org/jira/browse/GOBBLIN-827
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen


Add the following events
- `JobStateEventBuilder` to report gobblin job state or MR job state
- `EntityMissingEventBuilder` to report a missing instance of a certain entity





[jira] [Updated] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner

2019-05-13 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-769:
--
Summary: Support string record timestamp in TimeBasedAvroWriterPartitioner  
(was: Support record timestamp as string in TimeBasedWriterPartitioner)

> Support string record timestamp in TimeBasedAvroWriterPartitioner
> -
>
> Key: GOBBLIN-769
> URL: https://issues.apache.org/jira/browse/GOBBLIN-769
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner

2019-05-13 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-769:
--
Description: Currently, if a record timestamp is a string, 
`TimeBasedAvroWriterPartitioner` cannot recognize it and falls back to the 
current time

> Support string record timestamp in TimeBasedAvroWriterPartitioner
> -
>
> Key: GOBBLIN-769
> URL: https://issues.apache.org/jira/browse/GOBBLIN-769
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, if a record timestamp is a string, 
> `TimeBasedAvroWriterPartitioner` cannot recognize it and falls back to the 
> current time
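
A minimal sketch of one way to accept string timestamps, not the actual `TimeBasedAvroWriterPartitioner` change. The class `StringTimestamp` and method `resolve` are hypothetical, and the sketch assumes the string holds a decimal epoch value; the issue does not say which string formats are supported.

```java
public class StringTimestamp {
    /**
     * Resolves a record timestamp field to a long.
     * Hypothetical helper; assumes string values contain a decimal epoch number.
     *
     * @param value          the raw field value (may be a Long, a CharSequence, or anything else)
     * @param fallbackMillis value to use when the field cannot be interpreted, e.g. the current time
     */
    public static long resolve(Object value, long fallbackMillis) {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof CharSequence) {
            // Covers java.lang.String as well as other CharSequence implementations.
            try {
                return Long.parseLong(value.toString().trim());
            } catch (NumberFormatException e) {
                return fallbackMillis;
            }
        }
        return fallbackMillis;
    }
}
```

Checking for `CharSequence` rather than `String` is a deliberate choice here, since Avro string values are not necessarily `java.lang.String` instances.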





[jira] [Updated] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner

2019-05-10 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-767:
--
Description: Currently, `TimeBasedWriterPartitioner` assumes the timestamp 
value from a record is in millis. The task is to remove that assumption and 
support timestamps in different units, defaulting to millis.

> Support different time units in TimeBasedWriterPartitioner
> --
>
> Key: GOBBLIN-767
> URL: https://issues.apache.org/jira/browse/GOBBLIN-767
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, `TimeBasedWriterPartitioner` assumes the timestamp value from a 
> record is in millis. The task is to remove that assumption and support 
> timestamps in different units, defaulting to millis.
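
A minimal sketch of unit-aware timestamp handling, not the actual `TimeBasedWriterPartitioner` code. The class `TimestampUnits`, its methods, and the idea of reading the unit name from a configuration string are assumptions for illustration; the conversion itself uses the standard `java.util.concurrent.TimeUnit`.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class TimestampUnits {
    /**
     * Parses a configured unit name such as "seconds" or "MILLISECONDS".
     * Hypothetical helper: a missing or blank setting defaults to milliseconds.
     */
    public static TimeUnit parseUnit(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return TimeUnit.MILLISECONDS;
        }
        return TimeUnit.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    /** Converts a raw record timestamp in the given unit to epoch milliseconds. */
    public static long toMillis(long rawTimestamp, TimeUnit unit) {
        return unit.toMillis(rawTimestamp);
    }
}
```

Normalizing to milliseconds at the point where the timestamp is read means the rest of the partitioning logic can keep working in millis unchanged.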





[jira] [Created] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner

2019-05-10 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-767:
-

 Summary: Support different time units in TimeBasedWriterPartitioner
 Key: GOBBLIN-767
 URL: https://issues.apache.org/jira/browse/GOBBLIN-767
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-763) Fix incorrect AvroUtils.removeUncomparableFields implementation

2019-05-02 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-763:
--
Description: 
- Remove fields, specified by configuration 
`compaction.job.key.fieldBlacklist`, while computing compaction dedup key schema
- Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which only 
keeps the first field of any schema, dropping all other fields which have the 
same schema. 

  was:Currently, `AvroUtils.removeUncomparableFields` will only keep the first 
field of any schema, dropping all other fields which have the same schema. 


> Fix incorrect AvroUtils.removeUncomparableFields implementation
> ---
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> - Remove fields, specified by configuration 
> `compaction.job.key.fieldBlacklist`, while computing compaction dedup key 
> schema
> - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which 
> only keeps the first field of any schema, dropping all other fields which 
> have the same schema. 





[jira] [Updated] (GOBBLIN-763) Support fields removal for compaction dedup key schema

2019-05-02 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-763:
--
Summary: Support fields removal for compaction dedup key schema  (was: Fix 
incorrect AvroUtils.removeUncomparableFields implementation)

> Support fields removal for compaction dedup key schema
> --
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> - Remove fields, specified by configuration 
> `compaction.job.key.fieldBlacklist`, while computing compaction dedup key 
> schema
> - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which 
> only keeps the first field of any schema, dropping all other fields which 
> have the same schema. 





[jira] [Updated] (GOBBLIN-763) Fix incorrect AvroUtils.removeUncomparableFields implementation

2019-05-02 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-763:
--
Description: Currently, `AvroUtils.removeUncomparableFields` will only keep 
the first field of any schema, dropping all other fields which have the same 
schema.   (was: Currently, )

> Fix incorrect AvroUtils.removeUncomparableFields implementation
> ---
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, `AvroUtils.removeUncomparableFields` will only keep the first 
> field of any schema, dropping all other fields which have the same schema. 





[jira] [Created] (GOBBLIN-763) Fix incorrect removeUncomparableFields implementation in AvroUtils

2019-05-02 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-763:
-

 Summary: Fix incorrect removeUncomparableFields implementation in 
AvroUtils
 Key: GOBBLIN-763
 URL: https://issues.apache.org/jira/browse/GOBBLIN-763
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


StandardManifestRecord: A standard representation of a record from a service 
that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job





[jira] [Updated] (GOBBLIN-763) Fix incorrect AvroUtils.removeUncomparableFields implementation

2019-05-02 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-763:
--
Summary: Fix incorrect AvroUtils.removeUncomparableFields implementation  
(was: Fix incorrect removeUncomparableFields implementation in AvroUtils)

> Fix incorrect AvroUtils.removeUncomparableFields implementation
> ---
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, 





[jira] [Updated] (GOBBLIN-763) Fix incorrect removeUncomparableFields implementation in AvroUtils

2019-05-02 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-763:
--
Description: Currently,   (was: StandardManifestRecord: A standard 
representation of a record from a service that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job)

> Fix incorrect removeUncomparableFields implementation in AvroUtils
> --
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, 





[jira] [Created] (GOBBLIN-738) Open a way to customize decoding KafkaConsumerRecord

2019-04-15 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-738:
-

 Summary: Open a way to customize decoding KafkaConsumerRecord
 Key: GOBBLIN-738
 URL: https://issues.apache.org/jira/browse/GOBBLIN-738
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen


Currently, decoding a `KafkaConsumerRecord` is limited to 2 forms:
  - decode as a `ByteArrayBasedKafkaRecord` message
  - convert value from a `DecodeableKafkaRecord` message

The task is to open a way for arbitrary decoding mechanisms





[jira] [Updated] (GOBBLIN-716) Add lineage in FileBasedSource

2019-03-27 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-716:
--
Description: 
Add lineage in `FileBasedSource`
- By default, `FileBasedSource` marks dataset level source lineage
- A `PartitionedFileSourceBase` marks partition level source lineage

Fix destinations being overwritten in `LineageInfo.putDestination(List 
descriptors, int branchId, State state)`. Multiple calls should append the given 
descriptors

> Add lineage in FileBasedSource
> --
>
> Key: GOBBLIN-716
> URL: https://issues.apache.org/jira/browse/GOBBLIN-716
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Add lineage in `FileBasedSource`
> - By default, `FileBasedSource` marks dataset level source lineage
> - A `PartitionedFileSourceBase` marks partition level source lineage
> Fix destinations being overwritten in `LineageInfo.putDestination(List 
> descriptors, int branchId, State state)`. Multiple calls should append the 
> given descriptors





[jira] [Updated] (GOBBLIN-716) Add lineage in FileBasedSource

2019-03-27 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-716:
--
Summary: Add lineage in FileBasedSource  (was: Add FileBasedSource lineage 
event)

> Add lineage in FileBasedSource
> --
>
> Key: GOBBLIN-716
> URL: https://issues.apache.org/jira/browse/GOBBLIN-716
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>






[jira] [Updated] (GOBBLIN-716) Add FileBasedSource lineage event

2019-03-27 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-716:
--
Description: (was: It'd be useful to support configuration properties 
to override the default username when connecting to a HDFS cluster, e.g. in the 
HDFS writers.  The system username that owns the Gobblin process is used by 
default.

One particular use case for this is for stand-alone Gobblin instances running 
as the `root` system user within a Docker container.  Individual users within 
an organization employing a stand-alone Gobblin cluster for data integration 
needs across multiple teams may have multiple users submitting jobs meant to 
touch different parts of the HDFS namespace under the control of separate users.

Note that this feature is not quite security-relevant, as this would still 
allow any job configuration file to specify any username, so there aren't any 
enforced privilege boundaries anyway.

One solution that does not appear to work is to specify the `hadoop.job.ugi` 
property in a job configuration file, despite what this appears to suggest in 
[FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91):

```java
Configuration conf = new Configuration();
// Add all job configuration properties so they are picked up by Hadoop
JobConfigurationUtils.putStateIntoConfiguration(properties, conf);
this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId);
```
 
*Github Url* : https://github.com/linkedin/gobblin/issues/1904 
*Github Reporter* : *mgomezch* 
*Github Created At* : 2017-05-26T18:58:16Z 
*Github Updated At* : 2017-05-26T18:58:16Z)

> Add FileBasedSource lineage event
> -
>
> Key: GOBBLIN-716
> URL: https://issues.apache.org/jira/browse/GOBBLIN-716
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>






[jira] [Created] (GOBBLIN-716) Add FileBasedSource lineage event

2019-03-27 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-716:
-

 Summary: Add FileBasedSource lineage event
 Key: GOBBLIN-716
 URL: https://issues.apache.org/jira/browse/GOBBLIN-716
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


It'd be useful to support configuration properties to override the default 
username when connecting to an HDFS cluster, e.g. in the HDFS writers.  The 
system username that owns the Gobblin process is used by default.

One particular use case for this is stand-alone Gobblin instances running 
as the `root` system user within a Docker container.  An organization that 
shares a stand-alone Gobblin cluster across multiple teams may have multiple 
users submitting jobs meant to touch different parts of the HDFS namespace 
under the control of separate users.

Note that this feature is not quite security-relevant, as this would still 
allow any job configuration file to specify any username, so there aren't any 
enforced privilege boundaries anyway.

One solution that does not appear to work is to specify the `hadoop.job.ugi` 
property in a job configuration file, despite what this appears to suggest in 
[FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91):

```java
Configuration conf = new Configuration();
// Add all job configuration properties so they are picked up by Hadoop
JobConfigurationUtils.putStateIntoConfiguration(properties, conf);
this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId);
```
 
*Github Url* : https://github.com/linkedin/gobblin/issues/1904 
*Github Reporter* : *mgomezch* 
*Github Created At* : 2017-05-26T18:58:16Z 
*Github Updated At* : 2017-05-26T18:58:16Z





[jira] [Updated] (GOBBLIN-716) Add FileBasedSource lineage event

2019-03-27 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-716:
--
Issue Type: Task  (was: Bug)

> Add FileBasedSource lineage event
> -
>
> Key: GOBBLIN-716
> URL: https://issues.apache.org/jira/browse/GOBBLIN-716
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> It'd be useful to support configuration properties to override the default 
> username when connecting to an HDFS cluster, e.g. in the HDFS writers.  The 
> system username that owns the Gobblin process is used by default.
> One particular use case for this is for stand-alone Gobblin instances running 
> as the `root` system user within a Docker container.  Individual users within 
> an organization employing a stand-alone Gobblin cluster for data integration 
> needs across multiple teams may have multiple users submitting jobs meant to 
> touch different parts of the HDFS namespace under the control of separate 
> users.
> Note that this feature is not quite security-relevant, as this would still 
> allow any job configuration file to specify any username, so there aren't any 
> enforced privilege boundaries anyway.
> One solution that does not appear to work is to specify the `hadoop.job.ugi` 
> property in a job configuration file, despite what this appears to suggest in 
> [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91):
> ```java
> Configuration conf = new Configuration();
> // Add all job configuration properties so they are picked up by Hadoop
> JobConfigurationUtils.putStateIntoConfiguration(properties, conf);
> this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId);
> ```
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1904 
> *Github Reporter* : *mgomezch* 
> *Github Created At* : 2017-05-26T18:58:16Z 
> *Github Updated At* : 2017-05-26T18:58:16Z





[jira] [Updated] (GOBBLIN-716) Add FileBasedSource lineage event

2019-03-27 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-716:
--
External issue URL:   (was: https://github.com/linkedin/gobblin/issues/1904)

> Add FileBasedSource lineage event
> -
>
> Key: GOBBLIN-716
> URL: https://issues.apache.org/jira/browse/GOBBLIN-716
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> It'd be useful to support configuration properties to override the default 
> username when connecting to an HDFS cluster, e.g. in the HDFS writers.  The 
> system username that owns the Gobblin process is used by default.
> One particular use case for this is for stand-alone Gobblin instances running 
> as the `root` system user within a Docker container.  Individual users within 
> an organization employing a stand-alone Gobblin cluster for data integration 
> needs across multiple teams may have multiple users submitting jobs meant to 
> touch different parts of the HDFS namespace under the control of separate 
> users.
> Note that this feature is not quite security-relevant, as this would still 
> allow any job configuration file to specify any username, so there aren't any 
> enforced privilege boundaries anyway.
> One solution that does not appear to work is to specify the `hadoop.job.ugi` 
> property in a job configuration file, despite what this appears to suggest in 
> [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91):
> ```java
> Configuration conf = new Configuration();
> // Add all job configuration properties so they are picked up by Hadoop
> JobConfigurationUtils.putStateIntoConfiguration(properties, conf);
> this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId);
> ```
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1904 
> *Github Reporter* : *mgomezch* 
> *Github Created At* : 2017-05-26T18:58:16Z 
> *Github Updated At* : 2017-05-26T18:58:16Z





[jira] [Updated] (GOBBLIN-543) Opensource StandardManifestRecord and DistributedClasspathManager

2018-12-21 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-543:
--
Description: 
StandardManifestRecord: A standard representation of a record from a service 
that gobblin reads or writes
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job

  was:
StandardManifestRecord: A standardized record that represents an input record 
for Gobblin
DistributedClasspathManager: A class to add artifacts to the classpath of an MR 
job


> Opensource StandardManifestRecord and DistributedClasspathManager
> -
>
> Key: GOBBLIN-543
> URL: https://issues.apache.org/jira/browse/GOBBLIN-543
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> StandardManifestRecord: A standard representation of a record from a service 
> that gobblin reads or writes
> DistributedClasspathManager: A class to add artifacts to the classpath of an 
> MR job





[jira] [Updated] (GOBBLIN-543) Opensource distributedClasspathManager

2018-12-21 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-543:
--
Summary: Opensource distributedClasspathManager  (was: 
DistributedClasspathManager)

> Opensource distributedClasspathManager
> --
>
> Key: GOBBLIN-543
> URL: https://issues.apache.org/jira/browse/GOBBLIN-543
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> A class to add artifacts to the classpath of an MR job





[jira] [Updated] (GOBBLIN-543) Opensource DistributedClasspathManager

2018-12-21 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-543:
--
Summary: Opensource DistributedClasspathManager  (was: Opensource 
distributedClasspathManager)

> Opensource DistributedClasspathManager
> --
>
> Key: GOBBLIN-543
> URL: https://issues.apache.org/jira/browse/GOBBLIN-543
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> A class to add artifacts to the classpath of an MR job





[jira] [Updated] (GOBBLIN-621) Add utilities

2018-10-26 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-621:
--
Description: 
- `TestIOUtils.readAllRecords`: reads JSON data as `GenericRecord`s using an Avro 
schema
- `FileListUtils.getAnyNonHiddenFile`: finds the first non-hidden file under a 
given path in a file system

  was:
- `readAllRecords`: reads JSON data as `GenericRecord`s using an Avro schema
- `getAnyNonHiddenFile`: finds the first non-hidden file under a given path in a 
file system


> Add utilities
> -
>
> Key: GOBBLIN-621
> URL: https://issues.apache.org/jira/browse/GOBBLIN-621
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> - `TestIOUtils.readAllRecords`: reads JSON data as `GenericRecord`s using an Avro 
> schema
> - `FileListUtils.getAnyNonHiddenFile`: finds the first non-hidden file under a 
> given path in a file system





[jira] [Created] (GOBBLIN-621) Add utilities

2018-10-26 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-621:
-

 Summary: Add utilities
 Key: GOBBLIN-621
 URL: https://issues.apache.org/jira/browse/GOBBLIN-621
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


- `readAllRecords`: reads JSON data as `GenericRecord`s using an Avro schema
- `getAnyNonHiddenFile`: finds the first non-hidden file under a given path in a 
file system
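
The second utility can be sketched against the local file system with `java.nio`. This is an illustration only, not the Gobblin implementation: `NonHiddenFileSketch` is a hypothetical name, and it assumes that "hidden" means a path component starting with `.` or `_`, a common Hadoop convention that the issue itself does not spell out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

final class NonHiddenFileSketch {
  // Assumption: a path is hidden if any component below root starts with '.' or '_'.
  static boolean isHidden(Path root, Path candidate) {
    for (Path part : root.relativize(candidate)) {
      String name = part.toString();
      if (name.startsWith(".") || name.startsWith("_")) {
        return true;
      }
    }
    return false;
  }

  // Returns the first regular, non-hidden file found while walking root, if any.
  static Optional<Path> anyNonHiddenFile(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> !isHidden(root, p))
          .findFirst();
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("sketch");
    Files.createFile(root.resolve("_SUCCESS"));
    Files.createFile(root.resolve("part-0.avro"));
    System.out.println(anyNonHiddenFile(root).map(Path::getFileName));
  }
}
```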





[jira] [Closed] (GOBBLIN-565) Implement partition level lineage event for job using TimePartitionedDataPublisher

2018-10-26 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen closed GOBBLIN-565.
-
Resolution: Fixed

> Implement partition level lineage event for job using 
> TimePartitionedDataPublisher
> --
>
> Key: GOBBLIN-565
> URL: https://issues.apache.org/jira/browse/GOBBLIN-565
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, Gobblin reports dataset-level lineage; for example, it reports 
> lineage from `(kafka:topic1)` to `(hdfs:/data/tracking/PageViewEvent)`. The 
> task is to report lineage from `(kafka:topic1)` to 
> `(hdfs:/data/tracking/PageViewEvent, hourly/2018/08/15/16)`, where 
> `hourly/2018/08/15/16` is a partition.





[jira] [Updated] (GOBBLIN-587) Implement partition level lineage for fs based destination

2018-09-12 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-587:
--
Summary: Implement partition level lineage for fs based destination  (was: 
Implement gobblin fs sink partition level lineage)

> Implement partition level lineage for fs based destination
> --
>
> Key: GOBBLIN-587
> URL: https://issues.apache.org/jira/browse/GOBBLIN-587
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, gobblin lineage is sent at dataset level. The task is to send 
> partition level lineage for fs sink. An example kafka-hdfs partition lineage 
> is
> {code:java}
> {
>   "timestamp": 1536785248451,
>   "namespace": {
>     "string": "gobblin.event.lineage"
>   },
>   "name": "LoginEvent",
>   "metadata": {
>     "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
>     "eventType": "LineageEvent",
>     "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
>     "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
>     "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
>     "class": "org.apache.gobblin.runtime.SafeDatasetCommit"
>   }
> }
> {code}
> {color:#d04437}*Note*{color}: Lineage is not available automatically. You 
> might have to implement the support in your source-destination pair.





[jira] [Updated] (GOBBLIN-587) Implement gobblin fs sink partition level lineage

2018-09-12 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-587:
--
Description: 
Currently, gobblin lineage is sent at dataset level. The task is to send 
partition level lineage for fs sink. An example kafka-hdfs partition lineage is

{code:java}
{
  "timestamp": 1536785248451,
  "namespace": {
    "string": "gobblin.event.lineage"
  },
  "name": "LoginEvent",
  "metadata": {
    "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
    "eventType": "LineageEvent",
    "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
    "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
    "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
    "class": "org.apache.gobblin.runtime.SafeDatasetCommit"
  }
}
{code}

{color:#d04437}*Note*{color}: Lineage is not available automatically. You might 
have to implement the support in your source-destination pair.


  was:
Currently, gobblin lineage is sent at dataset level. The task is to send 
partition level lineage for fs sink. An example kafka-hdfs partition lineage is

{code:java}
{
  "timestamp": 1536785248451,
  "namespace": {
"string": "gobblin.event.lineage"
  },
  "name": "LoginEvent",
  "metadata": {
"destination": 
"{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
"eventType": "LineageEvent",
"source": 
"{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
"metricContextName": 
"org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
"metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
"class": "org.apache.gobblin.runtime.SafeDatasetCommit",
  }
}
{code}

Note: Lineage is not available automatically. You might have to implement the 
support in your source-destination pair.



> Implement gobblin fs sink partition level lineage
> -
>
> Key: GOBBLIN-587
> URL: https://issues.apache.org/jira/browse/GOBBLIN-587
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, gobblin lineage is sent at dataset level. The task is to send 
> partition level lineage for fs sink. An example kafka-hdfs partition lineage 
> is
> {code:java}
> {
>   "timestamp": 1536785248451,
>   "namespace": {
>     "string": "gobblin.event.lineage"
>   },
>   "name": "LoginEvent",
>   "metadata": {
>     "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
>     "eventType": "LineageEvent",
>     "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
>     "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
>     "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
>     "class": "org.apache.gobblin.runtime.SafeDatasetCommit"
>   }
> }
> {code}
> {color:#d04437}*Note*{color}: Lineage is not available automatically. You 
> might have to implement the support in your source-destination pair.





[jira] [Updated] (GOBBLIN-587) Implement gobblin fs sink partition level lineage

2018-09-12 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-587:
--
Description: 
Currently, gobblin lineage is sent at dataset level. The task is to send 
partition level lineage for fs sink. An example kafka-hdfs partition lineage is

{code:java}
{
  "timestamp": 1536785248451,
  "namespace": {
    "string": "gobblin.event.lineage"
  },
  "name": "LoginEvent",
  "metadata": {
    "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
    "eventType": "LineageEvent",
    "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
    "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
    "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
    "class": "org.apache.gobblin.runtime.SafeDatasetCommit"
  }
}
{code}

Note: Lineage is not available automatically. You might have to implement the 
support in your source-destination pair.


  was:
Currently, gobblin lineage is sent at dataset level. The task is to send 
partition level lineage for fs sink. An example kafka-hdfs partition lineage is

{code:java}
{
  "timestamp": 1536785248451,
  "namespace": {
"string": "gobblin.event.lineage"
  },
  "name": "LoginEvent",
  "metadata": {
"destination": 
"{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/tmp/zhchen/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
"eventType": "LineageEvent",
"source": 
"{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
"metricContextName": 
"org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
"metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
"class": "org.apache.gobblin.runtime.SafeDatasetCommit",
  }
}
{code}



> Implement gobblin fs sink partition level lineage
> -
>
> Key: GOBBLIN-587
> URL: https://issues.apache.org/jira/browse/GOBBLIN-587
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, gobblin lineage is sent at dataset level. The task is to send 
> partition level lineage for fs sink. An example kafka-hdfs partition lineage 
> is
> {code:java}
> {
>   "timestamp": 1536785248451,
>   "namespace": {
>     "string": "gobblin.event.lineage"
>   },
>   "name": "LoginEvent",
>   "metadata": {
>     "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
>     "eventType": "LineageEvent",
>     "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
>     "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
>     "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
>     "class": "org.apache.gobblin.runtime.SafeDatasetCommit"
>   }
> }
> {code}
> Note: Lineage is not available automatically. You might have to implement the 
> support in your source-destination pair.





[jira] [Created] (GOBBLIN-587) Implement gobblin fs sink partition level lineage

2018-09-12 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-587:
-

 Summary: Implement gobblin fs sink partition level lineage
 Key: GOBBLIN-587
 URL: https://issues.apache.org/jira/browse/GOBBLIN-587
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Currently, gobblin lineage is sent at dataset level. The task is to send 
partition level lineage for fs sink. An example kafka-hdfs partition lineage is

{code:java}
{
  "timestamp": 1536785248451,
  "namespace": {
    "string": "gobblin.event.lineage"
  },
  "name": "LoginEvent",
  "metadata": {
    "destination": "{\"object-type\":\"org.apache.gobblin.dataset.PartitionDescriptor\",\"object-data\":{\"dataset\":{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"hdfs\",\"metadata\":{\"branch\":\"0\"},\"name\":\"/tmp/zhchen/data/tracking/LoginEvent\"}},\"name\":\"hourly/2018/09/12/12\"}}",
    "eventType": "LineageEvent",
    "source": "{\"object-type\":\"org.apache.gobblin.dataset.DatasetDescriptor\",\"object-data\":{\"platform\":\"kafka\",\"metadata\":{},\"name\":\"LoginEvent\"}}",
    "metricContextName": "org.apache.gobblin.runtime.SafeDatasetCommit.1693032310",
    "metricContextID": "1a7895b0-9e93-414e-ac0b-038f9375c82e",
    "class": "org.apache.gobblin.runtime.SafeDatasetCommit"
  }
}
{code}






[jira] [Updated] (GOBBLIN-576) Send partition level lineage in hive distcp

2018-09-05 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-576:
--
Description: Currently, Hive distcp supports only dataset/table-level 
lineage. The task is to send lineage at the partition level when the table is 
partitioned.

> Send partition level lineage in hive distcp
> ---
>
> Key: GOBBLIN-576
> URL: https://issues.apache.org/jira/browse/GOBBLIN-576
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Currently, Hive distcp supports only dataset/table-level lineage. The task is 
> to send lineage at the partition level when the table is partitioned.





[jira] [Created] (GOBBLIN-576) Send partition level lineage in hive distcp

2018-09-05 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-576:
-

 Summary: Send partition level lineage in hive distcp
 Key: GOBBLIN-576
 URL: https://issues.apache.org/jira/browse/GOBBLIN-576
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-565) Implement partition level lineage event for job using TimePartitionedDataPublisher

2018-08-15 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-565:
-

 Summary: Implement partition level lineage event for job using 
TimePartitionedDataPublisher
 Key: GOBBLIN-565
 URL: https://issues.apache.org/jira/browse/GOBBLIN-565
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Currently, Gobblin reports dataset-level lineage; for example, it reports 
lineage from `(kafka:topic1)` to `(hdfs:/data/tracking/PageViewEvent)`. The 
task is to report lineage from `(kafka:topic1)` to 
`(hdfs:/data/tracking/PageViewEvent, hourly/2018/08/15/16)`, where 
`hourly/2018/08/15/16` is a partition.
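
To make the `(dataset, partition)` notation concrete, the sketch below derives an hourly partition name from a timestamp and renders the pair. `PartitionNameSketch` and its methods are hypothetical helpers for illustration; they are not part of `TimePartitionedDataPublisher` or any Gobblin API.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives names such as hourly/2018/08/15/16.
final class PartitionNameSketch {
  private static final DateTimeFormatter HOURLY =
      DateTimeFormatter.ofPattern("'hourly'/yyyy/MM/dd/HH");

  // Formats a timestamp as an hourly partition name.
  static String hourlyPartition(LocalDateTime time) {
    return time.format(HOURLY);
  }

  // Renders a (platform:dataset, partition) pair in the notation used above.
  static String describe(String platform, String dataset, String partition) {
    return "(" + platform + ":" + dataset + ", " + partition + ")";
  }

  public static void main(String[] args) {
    String partition = hourlyPartition(LocalDateTime.of(2018, 8, 15, 16, 0));
    System.out.println(describe("hdfs", "/data/tracking/PageViewEvent", partition));
  }
}
```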





[jira] [Created] (GOBBLIN-543) DistributedClasspathManager

2018-07-26 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-543:
-

 Summary: DistributedClasspathManager
 Key: GOBBLIN-543
 URL: https://issues.apache.org/jira/browse/GOBBLIN-543
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


A class to add artifacts to the classpath of an MR job





[jira] [Created] (GOBBLIN-517) Add missing apache license info

2018-06-26 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-517:
-

 Summary: Add missing apache license info
 Key: GOBBLIN-517
 URL: https://issues.apache.org/jira/browse/GOBBLIN-517
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-489) Implement PusherFactory

2018-06-11 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-489:
--
Summary: Implement PusherFactory  (was: Create PusherFactory)

> Implement PusherFactory
> ---
>
> Key: GOBBLIN-489
> URL: https://issues.apache.org/jira/browse/GOBBLIN-489
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> A `PusherFactory` creates a `Pusher`. Changes are:
>  * `PusherFactory` and gobblin scope specific factory 
> `GobblinScopePusherFactory`
>  * Load broker config from configurable multiple namespaces besides 
> `gobblin.broker`



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GOBBLIN-489) Implement PusherFactory

2018-06-11 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-489:
--
Description: 
A `PusherFactory` creates a `Pusher`. Changes are:
 * `PusherFactory` and gobblin scope specific factory 
`GobblinScopePusherFactory`
 * Load broker config from configurable multiple namespaces besides 
`gobblin.broker`

  was:
An `PusherFactory` creates a `Pusher`. Changes are:
 * `PusherFactory` and gobblin scope specific factory 
`GobblinScopePusherFactory`
 * Load broker config from configurable multiple namespaces besides 
`gobblin.broker`


> Implement PusherFactory
> ---
>
> Key: GOBBLIN-489
> URL: https://issues.apache.org/jira/browse/GOBBLIN-489
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> A `PusherFactory` creates a `Pusher`. Changes are:
>  * `PusherFactory` and gobblin scope specific factory 
> `GobblinScopePusherFactory`
>  * Load broker config from configurable multiple namespaces besides 
> `gobblin.broker`
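
The two changes listed above can be pictured roughly as follows. Everything in this sketch is hypothetical: `SimplePusher`, `SimplePusherFactory`, `PusherFactorySketch`, and the `my.team.broker` namespace are illustrative stand-ins, not the real Gobblin interfaces or configuration keys. It assumes that broker settings are collected from several key prefixes, with later namespaces overriding earlier ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Simplified, hypothetical shapes; not the real Gobblin interfaces.
interface SimplePusher {
  void push(List<String> messages);
}

interface SimplePusherFactory {
  SimplePusher create(Properties brokerConfig);
}

final class PusherFactorySketch {
  // Collects keys under each namespace prefix and strips the prefix.
  // Namespaces later in the list override earlier ones on key clashes.
  static Properties brokerConfig(Properties all, List<String> namespaces) {
    Properties result = new Properties();
    for (String namespace : namespaces) {
      String prefix = namespace + ".";
      for (String key : all.stringPropertyNames()) {
        if (key.startsWith(prefix)) {
          result.setProperty(key.substring(prefix.length()), all.getProperty(key));
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Properties job = new Properties();
    job.setProperty("gobblin.broker.pusher.topic", "events");
    job.setProperty("my.team.broker.pusher.topic", "team-events");

    Properties config = brokerConfig(job, Arrays.asList("gobblin.broker", "my.team.broker"));

    List<String> sent = new ArrayList<>();
    // The factory is a lambda that builds a pusher recording messages in memory.
    SimplePusherFactory factory =
        cfg -> messages -> messages.forEach(m -> sent.add(cfg.getProperty("pusher.topic") + ":" + m));
    factory.create(config).push(Arrays.asList("hello"));
    System.out.println(sent);
  }
}
```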





[jira] [Updated] (GOBBLIN-489) Create PusherFactory

2018-06-11 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-489:
--
Description: 
An `PusherFactory` creates a `Pusher`. Changes are:
 * `PusherFactory` and gobblin scope specific factory 
`GobblinScopePusherFactory`
 * Load broker config from configurable multiple namespaces besides 
`gobblin.broker`

  was:
An `EventProducer` produces an event and sends it out with a `Pusher`. The 
changes are:
 * An `EventProducer` class and its corresponding `EventProducerFactory` which 
creates a shared `EventProducer` instance
 * Load broker config from configurable multiple namespaces besides 
`gobblin.broker`


> Create PusherFactory
> 
>
> Key: GOBBLIN-489
> URL: https://issues.apache.org/jira/browse/GOBBLIN-489
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> An `PusherFactory` creates a `Pusher`. Changes are:
>  * `PusherFactory` and gobblin scope specific factory 
> `GobblinScopePusherFactory`
>  * Load broker config from configurable multiple namespaces besides 
> `gobblin.broker`





[jira] [Updated] (GOBBLIN-489) Create PusherFactory

2018-06-11 Thread Zhixiong Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-489:
--
Summary: Create PusherFactory  (was: Create general EventProducer with a 
Pusher)

> Create PusherFactory
> 
>
> Key: GOBBLIN-489
> URL: https://issues.apache.org/jira/browse/GOBBLIN-489
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> An `EventProducer` produces an event and sends it out with a `Pusher`. The 
> changes are:
>  * An `EventProducer` class and its corresponding `EventProducerFactory` 
> which creates a shared `EventProducer` instance
>  * Load broker config from configurable multiple namespaces besides 
> `gobblin.broker`





[jira] [Created] (GOBBLIN-501) Fix NPE thrown from read after EOF of LazyMaterializeDecryptorInputStream

2018-05-24 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-501:
-

 Summary: Fix NPE thrown from read after EOF of 
LazyMaterializeDecryptorInputStream
 Key: GOBBLIN-501
 URL: https://issues.apache.org/jira/browse/GOBBLIN-501
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Calling `read` on a `LazyMaterializeDecryptorInputStream` after it has reached 
EOF throws an NPE. The fix is to return `-1` for any read after EOF.
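
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual Gobblin code: `EofSafeLazyInputStream`, its `factory` argument, and the `fromBytes` helper are hypothetical names, and the sketch assumes the NPE comes from touching a delegate stream that was released at EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily materialized stream; not the Gobblin class. */
public class EofSafeLazyInputStream extends InputStream {
  private final Supplier<InputStream> factory; // creates the real stream on demand
  private InputStream delegate;                // null until first read, and again after EOF
  private boolean eof;                         // remembers that EOF was already seen

  public EofSafeLazyInputStream(Supplier<InputStream> factory) {
    this.factory = factory;
  }

  /** Convenience helper (hypothetical) that wraps an in-memory byte array. */
  public static EofSafeLazyInputStream fromBytes(byte[] data) {
    return new EofSafeLazyInputStream(() -> new ByteArrayInputStream(data));
  }

  @Override
  public int read() throws IOException {
    if (eof) {
      return -1; // guard: never dereference the released delegate again
    }
    if (delegate == null) {
      delegate = factory.get(); // lazy materialization on first read
    }
    int b = delegate.read();
    if (b == -1) {
      eof = true;
      delegate.close();
      delegate = null; // without the guard above, the next read would hit null
    }
    return b;
  }
}
```

With the `eof` flag in place, repeated reads after the end of the data keep returning `-1`, which is what the `InputStream` contract expects.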





[jira] [Created] (GOBBLIN-493) Fix build issue in GithubDataEventTypesPartitioner

2018-05-15 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-493:
-

 Summary: Fix build issue in GithubDataEventTypesPartitioner
 Key: GOBBLIN-493
 URL: https://issues.apache.org/jira/browse/GOBBLIN-493
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Build failure because of `GithubDataEventTypesPartitioner`





[jira] [Assigned] (GOBBLIN-489) Create general EventProducer with a Pusher

2018-05-08 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen reassigned GOBBLIN-489:
-

Assignee: Zhixiong Chen

> Create general EventProducer with a Pusher
> --
>
> Key: GOBBLIN-489
> URL: https://issues.apache.org/jira/browse/GOBBLIN-489
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> An `EventProducer` produces an event and sends it out with a `Pusher`. The 
> changes are:
>  * An `EventProducer` class and its corresponding `EventProducerFactory` 
> which creates a shared `EventProducer` instance
>  * Load broker config from configurable multiple namespaces besides 
> `gobblin.broker`





[jira] [Created] (GOBBLIN-488) Make `AsyncRequest` aware of records

2018-05-04 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-488:
-

 Summary: Make `AsyncRequest` aware of records
 Key: GOBBLIN-488
 URL: https://issues.apache.org/jira/browse/GOBBLIN-488
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Currently, an `AsyncRequest` built from a collection of records does not keep 
track of which records it was built from. The change makes it record-aware 
so that a `ResponseHandler` can post-process the records if necessary.
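
One way to picture a record-aware request is sketched below. Everything here is an assumption made for illustration: `RecordAwareRequest`, `addRecord`, `getRecords`, and `onResponse` are hypothetical names and do not reflect the real `AsyncRequest` or `ResponseHandler` signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; names and shapes are illustrative, not Gobblin's API. */
public class RecordAwareRequest<D> {
  private final List<D> records = new ArrayList<>();       // records this request was built from
  private final StringBuilder payload = new StringBuilder(); // newline-joined request body

  /** Appends a record to the request body and remembers it for later post-processing. */
  public void addRecord(D record) {
    if (payload.length() > 0) {
      payload.append('\n');
    }
    payload.append(record);
    records.add(record);
  }

  public String getPayload() {
    return payload.toString();
  }

  /** Read-only view of the records that went into this request. */
  public List<D> getRecords() {
    return Collections.unmodifiableList(records);
  }

  /** Stand-in for a response handler: on success, runs a callback on each record. */
  public void onResponse(boolean success, Consumer<D> postProcess) {
    if (success) {
      records.forEach(postProcess);
    }
  }
}
```

Because the request carries its own records, whatever handles the response can acknowledge or log exactly the records that were sent, without a separate lookup.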





[jira] [Created] (GOBBLIN-487) Integrate PasswordManager in R2RestWriterBuilder

2018-05-04 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-487:
-

 Summary: Integrate PasswordManager in R2RestWriterBuilder 
 Key: GOBBLIN-487
 URL: https://issues.apache.org/jira/browse/GOBBLIN-487
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-482) Add http write documentation

2018-04-30 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-482:
--
Description: The old http write framework under 
`AbstractHttpWriter` and `AbstractHttpWriterBuilder` is deprecated! Use 
`AsyncHttpWriter` and `AsyncHttpWriterBuilder` instead  (was: The old 
http write framework under 
[AbstractHttpWriter|{color:#ffc66d}[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java]{color}]
 and 
{color:#287bde}[`AbstractHttpWriterBuilder`|{color}{color:#ffc66d}[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java]{color}]
 is deprecated! Use {color:#808080}`AsyncHttpWriter` {color}and 
{color:#808080}`AsyncHttpWriterBuilder` {color}instead)

> Add http write documentation
> 
>
> Key: GOBBLIN-482
> URL: https://issues.apache.org/jira/browse/GOBBLIN-482
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> The old http write framework under `AbstractHttpWriter` and 
> `AbstractHttpWriterBuilder` is deprecated! Use `AsyncHttpWriter` and 
> `AsyncHttpWriterBuilder` instead





[jira] [Updated] (GOBBLIN-482) Add http write documentation

2018-04-30 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-482:
--
Description: 
The old http write framework under 
[AbstractHttpWriter|{color:#ffc66d}[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java]{color}]
 and 
{color:#287bde}[`AbstractHttpWriterBuilder`|{color}{color:#ffc66d}[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java]{color}]
 is deprecated! Use {color:#808080}`AsyncHttpWriter` {color}and 
{color:#808080}`AsyncHttpWriterBuilder` {color}instead

  was:
The old http write framework under 
[`AbstractHttpWriter`|{color:#ffc66d}https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java{color}]
and 
{color:#287bde}[`AbstractHttpWriterBuilder`|{color}{color:#ffc66d}https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java{color}]
is deprecated! Use {color:#808080}`AsyncHttpWriter` {color}and 
{color:#808080}`AsyncHttpWriterBuilder` {color}instead


> Add http write documentation
> 
>
> Key: GOBBLIN-482
> URL: https://issues.apache.org/jira/browse/GOBBLIN-482
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> The old http write framework under 
> [AbstractHttpWriter|{color:#ffc66d}[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java]{color}]
>  and 
> {color:#287bde}[`AbstractHttpWriterBuilder`|{color}{color:#ffc66d}[https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java]{color}]
>  is deprecated! Use {color:#808080}`AsyncHttpWriter` {color}and 
> {color:#808080}`AsyncHttpWriterBuilder` {color}instead





[jira] [Created] (GOBBLIN-435) Fix data publisher created from job broker not closed

2018-03-22 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-435:
-

 Summary: Fix data publisher created from job broker not closed
 Key: GOBBLIN-435
 URL: https://issues.apache.org/jira/browse/GOBBLIN-435
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-430) Add lineage in SalesforceSource

2018-03-20 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-430:
--
Description: 
- Set source lineage info into work units generated by `SalesforceSource`
  - Full lineage events can be sent if `SalesforceSource` is used together with 
a writer/publisher that sets destination lineage info

  was:Add lineage in `SalesforceSource`


> Add lineage in SalesforceSource
> ---
>
> Key: GOBBLIN-430
> URL: https://issues.apache.org/jira/browse/GOBBLIN-430
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> - Set source lineage info into work units generated by `SalesforceSource`
>   - Full lineage events can be sent if `SalesforceSource` is used together 
> with a writer/publisher that sets destination lineage info





[jira] [Updated] (GOBBLIN-430) Add lineage in SalesforceSource

2018-03-20 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-430:
--
Summary: Add lineage in SalesforceSource  (was: Add lineage for salesforce 
source)

> Add lineage in SalesforceSource
> ---
>
> Key: GOBBLIN-430
> URL: https://issues.apache.org/jira/browse/GOBBLIN-430
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Add lineage in `SalesforceSource`





[jira] [Created] (GOBBLIN-430) Add lineage for salesforce source

2018-03-20 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-430:
-

 Summary: Add lineage for salesforce source
 Key: GOBBLIN-430
 URL: https://issues.apache.org/jira/browse/GOBBLIN-430
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Add lineage in `SalesforceSource`





[jira] [Created] (GOBBLIN-395) Add lineage for copying config based dataset

2018-01-30 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-395:
-

 Summary: Add lineage for copying config based dataset
 Key: GOBBLIN-395
 URL: https://issues.apache.org/jira/browse/GOBBLIN-395
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Set file system based source and destination datasets for `CopyableFile`s of 
`ConfigBasedDataset`s





[jira] [Closed] (GOBBLIN-374) GobblinMetrics failed to close event reporters

2018-01-30 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen closed GOBBLIN-374.
-
Resolution: Fixed

> GobblinMetrics failed to close event reporters
> --
>
> Key: GOBBLIN-374
> URL: https://issues.apache.org/jira/browse/GOBBLIN-374
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> A `GobblinMetrics` instance is cached as a soft value, so it can be 
> garbage-collected while the job is still running, even though it is needed 
> to close the event reporters when the job completes.





[jira] [Closed] (GOBBLIN-380) Add log about time elapsed for waiting services to be healthy

2018-01-30 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen closed GOBBLIN-380.
-
Resolution: Fixed

> Add log about time elapsed for waiting services to be healthy
> -
>
> Key: GOBBLIN-380
> URL: https://issues.apache.org/jira/browse/GOBBLIN-380
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> Logs are added for `QuartzJobSpecScheduler` and  
> `StandardGobblinInstanceDriver`





[jira] [Created] (GOBBLIN-380) Add log about time elapsed for waiting services to be healthy

2018-01-18 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-380:
-

 Summary: Add log about time elapsed for waiting services to be 
healthy
 Key: GOBBLIN-380
 URL: https://issues.apache.org/jira/browse/GOBBLIN-380
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Logs are added for `QuartzJobSpecScheduler` and  `StandardGobblinInstanceDriver`





[jira] [Created] (GOBBLIN-374) GobblinMetrics failed to close event reporters

2018-01-16 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-374:
-

 Summary: GobblinMetrics failed to close event reporters
 Key: GOBBLIN-374
 URL: https://issues.apache.org/jira/browse/GOBBLIN-374
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen


A `GobblinMetrics` instance is cached as a soft value, so it can be 
garbage-collected while the job is still running, even though it is needed 
to close the event reporters when the job completes.





[jira] [Assigned] (GOBBLIN-374) GobblinMetrics failed to close event reporters

2018-01-16 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen reassigned GOBBLIN-374:
-

Assignee: Zhixiong Chen

> GobblinMetrics failed to close event reporters
> --
>
> Key: GOBBLIN-374
> URL: https://issues.apache.org/jira/browse/GOBBLIN-374
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
>
> A `GobblinMetrics` instance is cached as a soft value, so it can be 
> garbage-collected while the job is still running, even though it is needed 
> to close the event reporters when the job completes.





[jira] [Created] (GOBBLIN-364) Exclude JobState from WorkUnit created by PartitionedFileSourceBase

2018-01-10 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-364:
-

 Summary: Exclude JobState from WorkUnit created by 
PartitionedFileSourceBase
 Key: GOBBLIN-364
 URL: https://issues.apache.org/jira/browse/GOBBLIN-364
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


Currently, each `WorkUnit` created by a `PartitionedFileSourceBase` source has 
a copy of the entire job configuration. We want to exclude the job 
configuration from `WorkUnit` for two reasons:
  - It is redundant, because `WorkUnitState`, the runtime counterpart of 
`WorkUnit`, already has a reference to the full job configuration.
  - Adding the job configuration to `WorkUnit` has the bad side effect of 
masking dynamic job-level configuration in the MR task runner.
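
The masking effect in the second point can be illustrated with a small sketch. It assumes a lookup order in which work-unit properties are consulted before job-level properties; that order, the class name `ConfigMaskingDemo`, and the `resolve` helper are assumptions made for illustration, not a description of Gobblin's actual classes.

```java
import java.util.Properties;

/** Hypothetical illustration of a stale work-unit copy hiding a job-level value. */
public class ConfigMaskingDemo {

  /** Looks a key up in the work-unit properties first, then falls back to the job level. */
  static String resolve(Properties workUnit, Properties job, String key) {
    String value = workUnit.getProperty(key);
    return value != null ? value : job.getProperty(key);
  }

  public static void main(String[] args) {
    Properties job = new Properties();
    job.setProperty("some.key", "static");

    // The work unit takes a full copy of the job configuration at creation time.
    Properties workUnitWithCopy = new Properties();
    workUnitWithCopy.putAll(job);

    // Later, the job-level value is changed dynamically at runtime.
    job.setProperty("some.key", "dynamic");

    // The stale copy wins, so the dynamic value is masked.
    System.out.println(resolve(workUnitWithCopy, job, "some.key"));
    // An empty work unit lets the job-level value through.
    System.out.println(resolve(new Properties(), job, "some.key"));
  }
}
```

Leaving the job configuration out of the work unit corresponds to the second lookup: the key is resolved from the job level, so runtime changes stay visible.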



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (GOBBLIN-344) Fix help method getResolver in LineageInfo is private

2017-12-11 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-344:
--
Summary: Fix help method getResolver in LineageInfo is private  (was: Fix 
getResolver help method in LineageInfo is private)

> Fix help method getResolver in LineageInfo is private
> -
>
> Key: GOBBLIN-344
> URL: https://issues.apache.org/jira/browse/GOBBLIN-344
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> In the PR https://github.com/apache/incubator-gobblin/pull/2187, I mistakenly 
> made the helper method `LineageInfo#getResolver` private. It should be `public`.





[jira] [Created] (GOBBLIN-344) Fix getResolver help method in LineageInfo is private

2017-12-11 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-344:
-

 Summary: Fix getResolver help method in LineageInfo is private
 Key: GOBBLIN-344
 URL: https://issues.apache.org/jira/browse/GOBBLIN-344
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


In the PR https://github.com/apache/incubator-gobblin/pull/2187, I mistakenly 
made the helper method `LineageInfo#getResolver` private. It should be `public`.





[jira] [Updated] (GOBBLIN-319) Add DatasetResolver to transform raw Gobblin dataset to application specific dataset

2017-11-28 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-319:
--
Description: 
- Add DatasetResolver to transform raw Gobblin dataset to application specific 
dataset
- Fix lineage info not set while publishing single task data in 
`BaseDataPublisher`

  was:
- Add `exampleDataDir` metadata for file system based datasets
- Fix lineage info not set while publishing single task data in 
`BaseDataPublisher`


> Add DatasetResolver to transform raw Gobblin dataset to application specific 
> dataset
> 
>
> Key: GOBBLIN-319
> URL: https://issues.apache.org/jira/browse/GOBBLIN-319
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> - Add DatasetResolver to transform raw Gobblin dataset to application 
> specific dataset
> - Fix lineage info not set while publishing single task data in 
> `BaseDataPublisher`





[jira] [Updated] (GOBBLIN-319) Add DatasetResolver to transform raw Gobblin dataset to application specific dataset

2017-11-28 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-319:
--
Summary: Add DatasetResolver to transform raw Gobblin dataset to 
application specific dataset  (was:  Add exampleDataDir when sending file 
system based dataset lineage)

> Add DatasetResolver to transform raw Gobblin dataset to application specific 
> dataset
> 
>
> Key: GOBBLIN-319
> URL: https://issues.apache.org/jira/browse/GOBBLIN-319
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> - Add `exampleDataDir` metadata for file system based datasets
> - Fix lineage info not set while publishing single task data in 
> `BaseDataPublisher`





[jira] [Updated] (GOBBLIN-319) Add exampleDataDir when sending file system based dataset lineage

2017-11-27 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-319:
--
Summary:  Add exampleDataDir when sending file system based dataset lineage 
 (was: Lineage event not sent for publishing single task)

>  Add exampleDataDir when sending file system based dataset lineage
> --
>
> Key: GOBBLIN-319
> URL: https://issues.apache.org/jira/browse/GOBBLIN-319
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> - Fix lineage info not set while publishing single task data in 
> `BaseDataPublisher`
> - Add `exampleDataDir` metadata for file system based datasets





[jira] [Updated] (GOBBLIN-319) Add exampleDataDir when sending file system based dataset lineage

2017-11-27 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-319:
--
Description: 
- Add `exampleDataDir` metadata for file system based datasets
- Fix lineage info not set while publishing single task data in 
`BaseDataPublisher`

  was:
- Fix lineage info not set while publishing single task data in 
`BaseDataPublisher`
- Add `exampleDataDir` metadata for file system based datasets


>  Add exampleDataDir when sending file system based dataset lineage
> --
>
> Key: GOBBLIN-319
> URL: https://issues.apache.org/jira/browse/GOBBLIN-319
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> - Add `exampleDataDir` metadata for file system based datasets
> - Fix lineage info not set while publishing single task data in 
> `BaseDataPublisher`





[jira] [Updated] (GOBBLIN-319) Lineage event not sent for publishing single task

2017-11-21 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-319:
--
Description: 
- Fix lineage info not set while publishing single task data in 
`BaseDataPublisher`
- Add `exampleDataDir` metadata for file system based datasets

> Lineage event not sent for publishing single task
> -
>
> Key: GOBBLIN-319
> URL: https://issues.apache.org/jira/browse/GOBBLIN-319
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> - Fix lineage info not set while publishing single task data in 
> `BaseDataPublisher`
> - Add `exampleDataDir` metadata for file system based datasets





[jira] [Created] (GOBBLIN-319) Lineage event not sent for publishing single task

2017-11-17 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-319:
-

 Summary: Lineage event not sent for publishing single task
 Key: GOBBLIN-319
 URL: https://issues.apache.org/jira/browse/GOBBLIN-319
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-315) Fix shaded avro is used in LineageEventBuilder

2017-11-14 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-315:
-

 Summary: Fix shaded avro is used in LineageEventBuilder
 Key: GOBBLIN-315
 URL: https://issues.apache.org/jira/browse/GOBBLIN-315
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-307) Implement lineage event in gobblin

2017-11-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-307:
--
Summary: Implement lineage event in gobblin  (was: Define lineage event in 
gobblin)

> Implement lineage event in gobblin
> --
>
> Key: GOBBLIN-307
> URL: https://issues.apache.org/jira/browse/GOBBLIN-307
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>






[jira] [Created] (GOBBLIN-307) Define lineage event in gobblin

2017-11-07 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-307:
-

 Summary: Define lineage event in gobblin
 Key: GOBBLIN-307
 URL: https://issues.apache.org/jira/browse/GOBBLIN-307
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-305) Add csv-kafka and kafka-hdfs template

2017-11-06 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-305:
-

 Summary: Add csv-kafka and kafka-hdfs template
 Key: GOBBLIN-305
 URL: https://issues.apache.org/jira/browse/GOBBLIN-305
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-296) Kafka json source and writer

2017-10-26 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-296:
--
Description: 
- Add a json source and writer for kafka 09: `Kafka09JsonSource` and 
`Kafka09JsonObjectWriterBuilder`
- Move common gson ser/de logic to gobblin-kafka-common module

> Kafka json source and writer
> 
>
> Key: GOBBLIN-296
> URL: https://issues.apache.org/jira/browse/GOBBLIN-296
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> - Add a json source and writer for kafka 09: `Kafka09JsonSource` and 
> `Kafka09JsonObjectWriterBuilder`
> - Move common gson ser/de logic to gobblin-kafka-common module





[jira] [Created] (GOBBLIN-296) Kafka json source and writer

2017-10-24 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-296:
-

 Summary: Kafka json source and writer
 Key: GOBBLIN-296
 URL: https://issues.apache.org/jira/browse/GOBBLIN-296
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Created] (GOBBLIN-283) Refactor EnvelopePayloadConverter to support multi fields conversion

2017-10-10 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-283:
-

 Summary: Refactor EnvelopePayloadConverter to support multi fields 
conversion
 Key: GOBBLIN-283
 URL: https://issues.apache.org/jira/browse/GOBBLIN-283
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen








[jira] [Updated] (GOBBLIN-278) Fix sending lineage event for KafkaSource

2017-10-06 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-278:
--
Description: 
1. Fix lineage event for KafkaSource not being sent, and avoid resending the 
events by removing configurations with key prefix `gobblin.lineage` from the state
2. Fix `KafkaWorkUnitPacker` disregarding existing configurations of work units

> Fix sending lineage event for KafkaSource
> -
>
> Key: GOBBLIN-278
> URL: https://issues.apache.org/jira/browse/GOBBLIN-278
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>
> 1. Fix lineage event for KafkaSource not being sent, and avoid resending the 
> events by removing configurations with key prefix `gobblin.lineage` from the state
> 2. Fix `KafkaWorkUnitPacker` disregarding existing configurations of work units
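
The prefix-based cleanup in item 1 could look roughly like the sketch below. It uses a plain `java.util.Properties` object as a stand-in for the Gobblin state, and `LineageStateCleaner` and `removeLineageKeys` are hypothetical names; only the `gobblin.lineage` prefix comes from the issue description.

```java
import java.util.Properties;

/** Hypothetical sketch: drops lineage keys so the same events are not emitted twice. */
public class LineageStateCleaner {
  static final String LINEAGE_PREFIX = "gobblin.lineage";

  /** Removes every property whose key starts with the lineage prefix; returns how many were removed. */
  static int removeLineageKeys(Properties state) {
    int before = state.size();
    // keySet() is a live view, so removing from it removes the entries from the Properties object.
    state.keySet().removeIf(
        key -> key instanceof String && ((String) key).startsWith(LINEAGE_PREFIX));
    return before - state.size();
  }
}
```

Calling this once the lineage event has been sent would leave all other keys untouched while ensuring a later pass over the same state finds no lineage information to resend.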





[jira] [Created] (GOBBLIN-278) Fix sending lineage event for KafkaSource

2017-10-06 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-278:
-

 Summary: Fix sending lineage event for KafkaSource
 Key: GOBBLIN-278
 URL: https://issues.apache.org/jira/browse/GOBBLIN-278
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen







