JulianJaffePinterest commented on issue #9780:
URL: https://github.com/apache/druid/issues/9780#issuecomment-744228708


   No update from my side on this. I haven't had time to work on this and it 
appears that there isn't community appetite to _support_ direct Spark readers 
and writers (there is certainly community desire for such a feature though!). I 
haven't left this in a great state: the code works, and I suppose if you 
wanted to work backwards from the 
[DruidDataSourceOptionKeys](https://github.com/JulianJaffePinterest/druid/blob/spark_druid_connector/extensions-core/spark-extensions/src/main/scala/org/apache/druid/spark/utils/DruidDataSourceOptionKeys.scala)
 you could even make it work for you, but the documentation is mostly in code 
comments instead of a useful readme. Even the usual first step of just working 
backwards from the tests doesn't help too much here since the whole point of 
these connectors is to hide the various APIs behind the Spark DataSourceV2 API. 
Everything interesting happens in the options passed along in the 
`.options()` call (e.g. 
`spark.read.format("druid").options(Map(DruidDataSourceOptionKeys.brokerHostKey -> "<my broker hostname>", ...))`), so 
the key piece for usability is the documentation. The best pointers I can give 
you there are that the reader is easier to get working, and can mostly be read 
off from the DruidDataSourceOptionKeys. The writer is trickier, as discussed 
in the proposal and in the code. Basically, in order to effectively use the 
output in Druid you'll need to use a custom partitioner in Spark and pass along 
a map to the writer to work around the limited information Spark passes across. 
This pattern is pretty anti-user and so one of the aims of this proposal was to 
start a discussion on ways to improve the situation, but that hasn't happened 
yet.
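
A minimal sketch of the reader pattern described above, assuming Spark SQL and the connector jar are both on the classpath. `OptionKeysSketch` is a hypothetical stand-in for `DruidDataSourceOptionKeys`, and its string value is a placeholder rather than the connector's real key. Only `brokerHostKey` is named in this comment, so a real read will likely need further options taken from that file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical stand-in for the connector's DruidDataSourceOptionKeys object.
// The constant name comes from the comment above; the value is a placeholder.
object OptionKeysSketch {
  val brokerHostKey: String = "<broker host option key>"
}

object DruidReadSketch {
  // Gather all reader settings in one map so they can be passed to .options().
  def readerOptions(brokerHost: String): Map[String, String] =
    Map(OptionKeysSketch.brokerHostKey -> brokerHost)

  // Assumes the connector registers the "druid" format with Spark.
  def readDruid(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("druid").options(options).load()
}
```

With a live `SparkSession` this would be called as `DruidReadSketch.readDruid(spark, DruidReadSketch.readerOptions("<my broker hostname>"))`. Keeping the map construction separate from the read makes it easy to inspect the options before any call reaches Druid.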
   
   @averma111, @mangrrua, and any others interested in working on this 
proposal/packaging the code/etc., my code is licensed under the same license as 
the Druid project, as confirmed in the license headers on each file. I would 
love to see the community pick up where I left off.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
