nevi-me commented on pull request #8968:
URL: https://github.com/apache/arrow/pull/8968#issuecomment-748500486


   Hi @kflansburg, this is some great work. I've just gone through the code 
briefly.
   
   > I really like your idea of using Kafka as a transport layer for Arrow 
Flight messages.
   
   I'd be interested in seeing how we could go about implementing this.
   
   > I was planning to try to implement some sort of JSON parsing -> Arrow 
StructArray for the Kafka payload field, but parsing it as Arrow flight would 
be very cool as well.
   
   Our JSON reader already has the building blocks needed to trivially do this, 
and after #8938, you should be able to read all nested JSON types.
   
   I played around with converting Avro messages from Kafka into Arrow data. 
This would also be an interesting fit for your streaming use case.
   
   ___
   
   There is a slight downside to having `arrow-kafka` live in this 
repository, which is that `librdkafka` isn't trivial to install on Windows (I 
use it in WSL instead). So from a development perspective, it might impose some 
load on developers (especially drive-by contributors).
   
   I'm a proponent of bundling crates into `arrow/rust` if they could benefit 
from us (i.e. the committers and regular contributors) making some changes to 
keep them compiling. We sometimes make breaking changes to our interfaces, so 
being able to fix the crates is very useful.
   
   With the above said, I think we should use this crate as an opportunity to 
have a bigger discussion about where additional modules should live. For 
example, I recently opened a draft RFC for `arrow-sql` (#8731). My main 
motivation for wanting to put it into `rust/arrow/arrow-sql` is that it could 
also benefit from the performance improvements that we're regularly making.
   
   We could try the `arrow-contrib` approach, where we maintain additional IO 
modules and other crates or projects in languages other than Rust.
   This would be similar to other projects like OpenTracing & OpenTelemetry 
where separate tracing libraries are maintained within the same organisation, 
but under different repos.
   This is probably a bigger mailing list discussion, but I'd like to hear your 
and @andygrove's thoughts first.

