[ 
https://issues.apache.org/jira/browse/KAFKA-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203563#comment-17203563
 ] 

Randall Hauch commented on KAFKA-9546:
--------------------------------------

[~galyo], thanks for the suggestion and the PR.

I've added the `needs-kip` label, because the `FileStreamSourceConnector` is 
part of the Connect API, even though it is intentionally just an example 
connector that helps demonstrate Connect. Because a KIP is required, I question 
whether changing this connector is really worth it. And because these file 
connectors are the only ones that ship with AK, extending them will undoubtedly 
create issues if your extension is installed into a different version of AK 
than the one with which it was compiled.

If you're providing a customized task class, could you not just provide your 
own `SourceConnector` class? You'd have a lot more control, and you'd have 
much more freedom to deploy your connector into nearly any version of a Kafka 
Connect cluster installation. (The only limitation would be which of the 
Connect APIs you chose to use, such as the use of headers.)

As such, I think this change is not worth the complication it adds, either to 
the examples or to your connector.

> Make FileStreamSourceTask extendable with generic streams
> ---------------------------------------------------------
>
>                 Key: KAFKA-9546
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9546
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Csaba Galyo
>            Assignee: Csaba Galyo
>            Priority: Major
>              Labels: connect-api, needs-kip
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Use case: I want to read a ZIP compressed text file with a file connector and 
> send it to Kafka.
> Currently, we have FileStreamSourceConnector which reads a \n delimited text 
> file. This connector always returns a task of type FileStreamSourceTask.
> The FileStreamSourceTask reads from stdin or opens a file InputStream. The 
> issue with this approach is that the input needs to be a text file, otherwise 
> it won't work. 
> The code should be modified so that users could change the default 
> InputStream to e.g. ZipInputStream, or any other format. The code is currently 
> written in such a way that it's not possible to extend it; we cannot use a 
> different input stream. 
> See example here where the code got copy-pasted just so it could read from a 
> ZstdInputStream (which reads ZSTD compressed files): 
> [https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file]
>  
> I suggest 2 changes:
>  # FileStreamSourceConnector should be extendable to return tasks of 
> different types. These types would be input by the user through the 
> configuration map
>  # FileStreamSourceTask should be modified so it could be extended and child 
> classes could define different input streams.
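
For illustration only, the second suggestion above (a task whose child classes
define a different input stream) could look roughly like the sketch below. It
uses only the JDK and is not the actual FileStreamSourceTask code: the class
names `LineSource`, `GzipLineSource` and `LineSourceDemo` and the `decorate`
hook are hypothetical, and GZIP stands in for ZIP or ZSTD because the JDK
ships a decoder for it. Wiring this into a `SourceConnector`/`SourceTask`
pair is left out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the line-reading part of a file source task.
class LineSource {
    // Extension hook (an assumption of this sketch): subclasses may wrap
    // the raw stream in a decoder. The default passes it through unchanged.
    protected InputStream decorate(InputStream raw) throws IOException {
        return raw;
    }

    // Reads every newline-delimited line from the (possibly decoded) stream.
    public final List<String> readLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(decorate(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}

// Child class that only changes how the stream is decoded.
class GzipLineSource extends LineSource {
    @Override
    protected InputStream decorate(InputStream raw) throws IOException {
        return new GZIPInputStream(raw);
    }
}

// Helper used only to produce compressed sample input in memory.
class LineSourceDemo {
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }
}
```

Calling `new GzipLineSource().readLines(new ByteArrayInputStream(LineSourceDemo.gzip("a\nb\n")))`
would then yield the two lines of the compressed input, while the base class
keeps reading plain text. The same hook pattern works equally well inside a
standalone connector that does not extend the example classes shipped with AK.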



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
