Thanks TD. I am going to try the Python-Scala hybrid approach by using Scala only for the custom Redis sink and Python for the rest of the app. I understand it might not be as efficient as writing the app purely in Scala, but unfortunately I am constrained on Scala resources. Have you come across other use cases where people have resorted to such a Python-Scala hybrid approach?
Regards,
Priyank

On Wed, Jul 26, 2017 at 1:46 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Hello Priyank
>
> Writing something purely in Scala/Java would be the most efficient. Even
> if we expose Python APIs that allow writing custom sinks in pure Python,
> it won't be as efficient as Scala/Java foreach, because the data would
> have to cross the JVM/PVM boundary, which has significant overheads. So
> Scala/Java foreach is always going to be the best option.
>
> TD
>
> On Tue, Jul 25, 2017 at 6:05 PM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
>
>> I am trying to write key-values to Redis using a DataStreamWriter object
>> with the PySpark Structured Streaming APIs. I am using Spark 2.2.
>>
>> Since the foreach sink is not supported for Python (see
>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach>),
>> I am trying to find alternatives.
>>
>> One alternative is to write a separate Scala module solely to push data
>> into Redis using foreach; ForeachWriter
>> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter>
>> is supported in Scala. But this doesn't seem like an efficient approach
>> and adds deployment overhead, because now I will have to support Scala
>> in my app.
>>
>> Another approach is obviously to use Scala instead of Python, which is
>> fine, but I want to make sure that I absolutely cannot use Python for
>> this problem before I take that path.
>>
>> I would appreciate feedback and alternative design approaches for this
>> problem.
>>
>> Thanks.
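For anyone finding this thread later: a minimal sketch of what the Scala side of such a hybrid could look like, i.e. a custom `ForeachWriter[Row]` that writes key-value pairs to Redis. The `ForeachWriter` methods (`open`, `process`, `close`) are the real Spark API; the use of the Jedis client, the host/port settings, and the column names `"key"` and `"value"` are assumptions for illustration, not anything prescribed by the thread.

```scala
// Sketch of a Scala Redis sink for Structured Streaming.
// Assumes the Jedis client library is on the classpath; column names
// ("key", "value") and connection settings here are hypothetical.
import org.apache.spark.sql.{ForeachWriter, Row}
import redis.clients.jedis.Jedis

class RedisForeachWriter(host: String, port: Int) extends ForeachWriter[Row] {
  // Marked transient: the writer object is serialized to executors,
  // and the connection itself is created per partition in open().
  @transient private var jedis: Jedis = _

  override def open(partitionId: Long, version: Long): Boolean = {
    jedis = new Jedis(host, port)
    true // true means process() will be invoked for this partition
  }

  override def process(row: Row): Unit = {
    // Write each streaming row as a Redis string key-value pair.
    jedis.set(row.getAs[String]("key"), row.getAs[String]("value"))
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (jedis != null) jedis.close()
  }
}
```

From Scala the writer would then be attached with something like `df.writeStream.foreach(new RedisForeachWriter("localhost", 6379)).start()`; in the hybrid design discussed above, a small Scala entry point like this would be the only Scala code the app needs to ship.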