Re: Multiple Kafka Spark Streaming Dataframe Join query

Dylan Guedes Wed, 14 Mar 2018 11:58:02 -0700

Hi,

I've been using Kafka with PySpark since 2.1.
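
In case it helps, below is a minimal sketch of how the two-topic join could look from Python using Structured Streaming's built-in "kafka" source. Treat it as an illustration under assumptions, not a tested job: the helper names, the "a"/"b" column prefixes, the 30 minute watermark and the 15 minute lag are all made up for this example, and the join key is assumed to be the Kafka message key. Stream-stream joins need Spark 2.3 or later.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's "kafka" Structured Streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def read_topic(spark, bootstrap_servers, topic, prefix):
    """Read one topic as a streaming DataFrame with prefixed column names."""
    raw = (spark.readStream
           .format("kafka")
           .options(**kafka_source_options(bootstrap_servers, topic))
           .load())
    # Kafka delivers key and value as binary, so cast them to strings.
    # The record timestamp is kept as the event time for watermarking.
    return raw.selectExpr(
        "CAST(key AS STRING) AS {0}_key".format(prefix),
        "CAST(value AS STRING) AS {0}_value".format(prefix),
        "timestamp AS {0}_time".format(prefix),
    )


def join_condition(max_lag="15 minutes"):
    """SQL join condition: same key, event times within max_lag of each other."""
    return ("a_key = b_key "
            "AND b_time >= a_time - interval {0} "
            "AND b_time <= a_time + interval {0}").format(max_lag)


def join_topics(spark, bootstrap_servers, topic_a, topic_b):
    """Inner-join two Kafka topics on their message key (hypothetical helper)."""
    from pyspark.sql.functions import expr  # imported lazily; needs pyspark

    # The watermark (an assumed 30 minutes) bounds how long Spark keeps
    # unmatched rows in state while waiting for the other topic.
    a = (read_topic(spark, bootstrap_servers, topic_a, "a")
         .withWatermark("a_time", "30 minutes"))
    b = (read_topic(spark, bootstrap_servers, topic_b, "b")
         .withWatermark("b_time", "30 minutes"))
    return a.join(b, expr(join_condition()))
```

To try it, something like join_topics(spark, "localhost:9092", "table_a", "table_b").writeStream.format("console").outputMode("append").start() should print matches, with the spark-sql-kafka-0-10 package passed to spark-submit via --packages. The broker address and topic names are placeholders. Note that rows arriving further apart than the chosen lag, or later than the watermark allows, would not be joined, so those two values need to be sized to the real delay between the topics.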


On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu <aakash.spark....@gmail.com>
wrote:

> Hi,
>
> I haven't tried it yet.
>
> Just want to know: when will Spark 2.3 with the Kafka 0.10 package
> support Python? I read somewhere that, as of now, only Scala and Java
> can be used.
>
> Please correct me if I am wrong.
>
> Thanks,
> Aakash.
>
> On 14-Mar-2018 8:24 PM, "Georg Heiler" <georg.kf.hei...@gmail.com> wrote:
>
>> Did you try Spark 2.3 with Structured Streaming? Watermarking and
>> plain SQL there might be really interesting for you.
>> Aakash Basu <aakash.spark....@gmail.com> wrote on Wed, 14 Mar 2018 at
>> 14:57:
>>
>>> Hi,
>>>
>>>
>>>
>>> *Info (Using): Spark Streaming Kafka 0.8 package*
>>>
>>> *Spark 2.2.1*
>>> *Kafka 1.0.1*
>>>
>>> As of now, I am feeding paragraphs into the Kafka console producer, and
>>> my Spark job, acting as the receiver, prints the flattened words. This
>>> is purely an RDD operation.
>>>
>>> *My goal is to read two continuously updated tables as two distinct
>>> Kafka topics, load them as two Spark DataFrames, join them on a key,
>>> and produce the output.* (I come from a Spark SQL background, so
>>> pardon my Spark-SQL-ish writing.)
>>>
>>> *The first topic may receive new data 15 minutes before the second
>>> topic. How should I proceed in that scenario? I must not lose any
>>> data.*
>>>
>>> As of now, I simply want to pass paragraphs, read them as RDDs, convert
>>> them to DataFrames, and then join them to get the common keys as the
>>> output (just for R&D).
>>>
>>> I started using Spark Streaming and Kafka only today.
>>>
>>> Please help!
>>>
>>> Thanks,
>>> Aakash.
>>>
>>
