Also, in your example, doesn't the temp view need to be accessed using the
same SparkSession on the Scala side? Since I am not using a notebook, how
can I get access to the same SparkSession in Scala?
On Fri, Jul 28, 2017 at 3:17 PM, Priyank Shrivastava wrote:
Thanks Burak.
In a streaming context, would I need to do any state management for the
temp views? For example, across sliding windows.
Priyank
On Fri, Jul 28, 2017 at 3:13 PM, Burak Yavuz wrote:
Hi Priyank,
You may register them as temporary tables to use across language boundaries.
Python:
df = spark.readStream...
# Python logic
df.createOrReplaceTempView("tmp1")
Scala:
val df = spark.table("tmp1")
df.writeStream
.foreach(...)
.start()
On Fri, Jul 28, 2017 at 3:06 PM, Priyank Shrivastava wrote:
TD,
For a hybrid Python-Scala approach, what's the recommended way of handing
off a DataFrame from Python to Scala? I would especially like to know in a
streaming context.
I am not using notebooks/Databricks. We are running it on our own Spark
2.1 cluster.
Priyank
On Wed, Jul 26, 2017 at
For built-in SQL functions, it does not matter which language you use, as
the engine will use the most optimized JVM code to execute. However, in
your case, you are asking for foreach in Python. My interpretation was that
you want to specify a Python function that processes the rows in Python.
Hi TD,
I thought Structured Streaming provides a similar concept of DataFrames,
where it does not matter which language I use to invoke the APIs, with the
exception of UDFs.
So, when I think of supporting the foreach sink in Python, I think of it as
just a wrapper API, with the data remaining in the JVM only.
We see that all the time. For example, in SQL, people can write their
user-defined functions in Scala/Java and use them from SQL/Python/anywhere.
That is the recommended way to get the best combination of performance and
ease-of-use from non-JVM languages.
On Wed, Jul 26, 2017 at 11:49 AM, Priyank
Thanks TD. I am going to try the Python-Scala hybrid approach by using
Scala only for the custom Redis sink and Python for the rest of the app. I
understand it might not be as efficient as writing the app purely in Scala,
but unfortunately I am constrained on Scala resources. Have you come
across
Hello Priyank,
Writing something purely in Scala/Java would be the most efficient. Even if
we expose Python APIs that allow writing custom sinks in pure Python, it
won't be as efficient as the Scala/Java foreach, as the data would have to
go through the JVM/PVM boundary, which has significant overheads. So