[jira] [Updated] (KAFKA-9546) Make FileStreamSourceTask extendable with generic streams

2020-02-18 Thread Csaba Galyo (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Galyo updated KAFKA-9546:
---
Labels: connect-api  (was: )

> Make FileStreamSourceTask extendable with generic streams
> -
>
> Key: KAFKA-9546
> URL: https://issues.apache.org/jira/browse/KAFKA-9546
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Csaba Galyo
>Assignee: Csaba Galyo
>Priority: Major
>  Labels: connect-api
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Use case: I want to read a ZIP-compressed text file with a file connector and 
> send it to Kafka.
> Currently, we have FileStreamSourceConnector, which reads a \n-delimited text 
> file. This connector always returns a task of type FileStreamSourceTask.
> The FileStreamSourceTask reads from stdin or opens a file InputStream. The 
> issue with this approach is that the input must be a plain text file; 
> otherwise it won't work. 
> The code should be modified so that users can replace the default 
> InputStream with e.g. ZipInputStream, or a stream for any other format. The 
> code is currently written in such a way that it cannot be extended: we cannot 
> use a different input stream. 
> See the example here, where the code was copy-pasted just so it could read 
> from a ZstdInputStream (which reads ZSTD-compressed files): 
> [https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file]
>  
> I suggest 2 changes:
>  # FileStreamSourceConnector should be extendable to return tasks of 
> different types. The user would specify the task type through the 
> configuration map.
>  # FileStreamSourceTask should be modified so that it can be extended and 
> child classes can define different input streams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9546) Make FileStreamSourceTask extendable with generic streams

2020-02-13 Thread Csaba Galyo (Jira)
Csaba Galyo created KAFKA-9546:
--

 Summary: Make FileStreamSourceTask extendable with generic streams
 Key: KAFKA-9546
 URL: https://issues.apache.org/jira/browse/KAFKA-9546
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Csaba Galyo


Use case: I want to read a ZIP-compressed text file with a file connector and 
send it to Kafka.

Currently, we have FileStreamSourceConnector, which reads a \n-delimited text 
file. This connector always returns a task of type FileStreamSourceTask.

The FileStreamSourceTask reads from stdin or opens a file InputStream. The 
issue with this approach is that the input must be a plain text file; 
otherwise it won't work. 

The code should be modified so that users can replace the default InputStream 
with e.g. ZipInputStream, or a stream for any other format. The code is 
currently written in such a way that it cannot be extended: we cannot use a 
different input stream. 

See the example here, where the code was copy-pasted just so it could read 
from a ZstdInputStream (which reads ZSTD-compressed files): 
[https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file]

 

I suggest 2 changes:
 # FileStreamSourceConnector should be extendable to return tasks of different 
types. The user would specify the task type through the configuration map.
 # FileStreamSourceTask should be modified so that it can be extended and 
child classes can define different input streams.
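
To illustrate the second suggestion, here is a minimal sketch of the extension point, not the actual Kafka Connect code. The class names LineReadingTask and ZipLineReadingTask and the hook wrapStream are hypothetical stand-ins for FileStreamSourceTask and a subclass of it; the sketch assumes only the JDK and that the first entry of the ZIP archive holds the text to read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipInputStream;

// Hypothetical stand-in for FileStreamSourceTask: reads newline-delimited text.
class LineReadingTask {
    // Hypothetical extension hook: subclasses decide how the raw bytes are decoded.
    protected InputStream wrapStream(InputStream raw) throws IOException {
        return raw; // default behaviour: treat the input as plain text
    }

    // The line-reading logic is written once and never needs to be copy-pasted.
    public final List<String> readLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(wrapStream(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}

// A child class that only swaps the input stream, as the issue proposes.
class ZipLineReadingTask extends LineReadingTask {
    @Override
    protected InputStream wrapStream(InputStream raw) throws IOException {
        ZipInputStream zip = new ZipInputStream(raw);
        // Assumption: the text lives in the first entry of the archive.
        if (zip.getNextEntry() == null) {
            throw new IOException("ZIP archive contains no entries");
        }
        return zip;
    }
}
```

With a hook like this, supporting ZSTD or any other format would mean overriding one method instead of duplicating the whole task.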


