JulianJaffePinterest commented on issue #9780: URL: https://github.com/apache/druid/issues/9780#issuecomment-744228708
No update from my side on this. I haven't had time to work on it, and there doesn't appear to be community appetite to _support_ direct Spark readers and writers (though there is certainly community desire for such a feature!). I haven't left this in a great state: the code works, and if you wanted to work backwards from the [DruidDataSourceOptionKeys](https://github.com/JulianJaffePinterest/druid/blob/spark_druid_connector/extensions-core/spark-extensions/src/main/scala/org/apache/druid/spark/utils/DruidDataSourceOptionKeys.scala) you could probably make it work for you, but the documentation lives mostly in code comments rather than in a useful readme. Even the usual first step of working backwards from the tests doesn't help much here, since the whole point of these connectors is to hide the various APIs behind the Spark DataSourceV2 API. Everything interesting happens in the options passed to the `.options()` call (e.g. `spark.read.format("druid").options(Map(DruidDataSourceOptionKeys.brokerHostKey -> "<my broker hostname>", ...))`), so the key piece for usability is the documentation.

The best pointers I can give you are that the reader is easier to get working, and its configuration can mostly be read off from `DruidDataSourceOptionKeys`. The writer is trickier, as discussed in the proposal and in the code. Basically, to use the output effectively in Druid you'll need to use a custom partitioner in Spark and pass a map along to the writer to work around the limited information Spark passes across. This pattern is pretty anti-user, so one of the aims of this proposal was to start a discussion on ways to improve the situation, but that hasn't happened yet.

@averma111, @mangrrua, and any others interested in working on this proposal, packaging the code, etc.: my code is licensed under the same license as the Druid project, as confirmed in the license headers on each file. I would love to see the community pick up where I left off.
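For anyone trying the reader, here is a minimal sketch of how the options map for `.options()` might be assembled. It is an illustration under stated assumptions, not the connector's documented usage. `SketchOptionKeys` is a hypothetical stand-in for `DruidDataSourceOptionKeys`. Only the name `brokerHostKey` comes from the comment above. The string values and the port and table keys are placeholders, so check the linked file for the real ones.

```scala
// Hypothetical stand-in for the connector's DruidDataSourceOptionKeys object.
// Only the name brokerHostKey appears in the comment above; the string values
// and the other two keys are placeholders, not the connector's real keys.
object SketchOptionKeys {
  val brokerHostKey: String = "broker.host" // assumed value
  val brokerPortKey: String = "broker.port" // hypothetical key
  val tableKey: String = "table"            // hypothetical key
}

object DruidReaderOptionsSketch {
  /** Collects reader settings into the Map[String, String] shape that `.options()` accepts. */
  def readerOptions(brokerHost: String, brokerPort: Int, dataSource: String): Map[String, String] =
    Map(
      SketchOptionKeys.brokerHostKey -> brokerHost,
      SketchOptionKeys.brokerPortKey -> brokerPort.toString,
      SketchOptionKeys.tableKey -> dataSource
    )

  def main(args: Array[String]): Unit = {
    val opts = readerOptions("broker.example.com", 8082, "my_datasource")
    // With Spark and the connector on the classpath, the map would be passed as:
    //   spark.read.format("druid").options(opts).load()
    opts.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Building the map in one helper keeps all connector-specific keys in a single place, which makes it easier to swap in the real constants once they have been read off from the source file.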