Hello Priyank

Writing the sink purely in Scala/Java would be the most efficient. Even if
we expose Python APIs that allow writing custom sinks in pure Python, they
won't be as efficient as a Scala/Java foreach, because the data would have
to cross the JVM/PVM boundary, which has significant overhead. So a
Scala/Java foreach is always going to be the best option.
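To make the shape of such a sink concrete, here is a minimal sketch in Java. The `SimpleForeachWriter` interface below only mirrors the open/process/close lifecycle of Spark's `ForeachWriter`; it is not the real Spark API, and the in-memory `FakeRedis` map stands in for a real Redis client such as Jedis. All three names (`SimpleForeachWriter`, `FakeRedis`, `RedisKeyValueWriter`) are illustrative assumptions, not actual library types.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in mirroring the shape of org.apache.spark.sql.ForeachWriter<T>:
// Spark calls open() once per partition, process() per row, then close().
interface SimpleForeachWriter<T> {
    boolean open(long partitionId, long epochId);
    void process(T value);
    void close(Throwable errorOrNull);
}

// In-memory map standing in for a Redis server, so the example is self-contained.
class FakeRedis {
    static final Map<String, String> STORE = new HashMap<>();
}

// A key-value writer: each processed pair is stored under its key.
class RedisKeyValueWriter implements SimpleForeachWriter<String[]> {
    public boolean open(long partitionId, long epochId) {
        // A real sink would open a Redis connection (e.g. new Jedis(host, port)) here.
        return true;
    }

    public void process(String[] kv) {
        // A real sink would call something like jedis.set(kv[0], kv[1]).
        FakeRedis.STORE.put(kv[0], kv[1]);
    }

    public void close(Throwable errorOrNull) {
        // A real sink would release the connection here.
    }
}

public class Demo {
    public static void main(String[] args) {
        RedisKeyValueWriter writer = new RedisKeyValueWriter();
        if (writer.open(0L, 0L)) {
            writer.process(new String[]{"user:1", "alice"});
            writer.process(new String[]{"user:2", "bob"});
            writer.close(null);
        }
        System.out.println(FakeRedis.STORE.get("user:1")); // prints "alice"
    }
}
```

In a real deployment the writer would be registered via `DataStreamWriter.foreach(...)` on the Scala/Java side; since the data never crosses into Python, it avoids the JVM/PVM serialization cost described above.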

TD

On Tue, Jul 25, 2017 at 6:05 PM, Priyank Shrivastava <priy...@asperasoft.com
> wrote:

> I am trying to write key-value pairs to Redis from a DataStreamWriter
> using the PySpark Structured Streaming APIs. I am using Spark 2.2.
>
> Since the foreach sink is not supported for Python (see here
> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach>),
> I am trying to find some alternatives.
>
> One alternative is to write a separate Scala module solely to push data into
> Redis using foreach; ForeachWriter
> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter>
> is supported in Scala. But this doesn't seem like an efficient approach, and
> it adds deployment overhead because I would now have to support Scala in my
> app.
>
> Another approach is obviously to use Scala instead of python, which is
> fine but I want to make sure that I absolutely cannot use python for this
> problem before I take this path.
>
> Would appreciate some feedback and alternative design approaches for this
> problem.
>
> Thanks.
>
