Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

Jacek Laskowski Thu, 04 May 2017 05:17:52 -0700

https://issues.apache.org/jira/browse/SPARK-20597


I'm going to send a PR soon.
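
For reference, here is a minimal sketch of the fallback order discussed in the thread below. It is an illustration only, not the code in KafkaSourceProvider or the eventual PR: `TopicResolution` and `resolveTopic` are hypothetical names, and plain Scala collections stand in for the data source options. The thread only fixes "path" as the last resort; placing the "topic" option ahead of the topic column is an assumption based on my reading of the Kafka integration guide.

```scala
// Hypothetical helper, not part of Spark: decides which Kafka topic a row goes to.
object TopicResolution {

  /**
   * Assumed precedence (only "path last" comes from the thread):
   *   1. the "topic" option
   *   2. the per-row topic column value, if any
   *   3. the "path" option, i.e. what save(path) / start(path) would fill in
   */
  def resolveTopic(
      rowTopic: Option[String],
      options: Map[String, String]): Either[String, String] =
    options.get("topic")
      .orElse(rowTopic)
      .orElse(options.get("path"))
      .toRight("No topic: set the 'topic' option, a topic column, or a path")
}
```

With a fallback like this, a call that only supplies a path would resolve to that path as the topic, while an explicit topic option or column would still win.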

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, May 1, 2017 at 8:26 PM, Cody Koeninger <[email protected]> wrote:
> Yeah, seems reasonable.
>
> On Mon, May 1, 2017 at 12:40 PM, Jacek Laskowski <[email protected]> wrote:
>> Hi,
>>
>> Thanks Cody and Michael! I didn't expect to get two answers so quickly and
>> from THE brains behind the Spark-Kafka integration. #impressed
>>
>> Yes, Michael has nailed it. Using save's path was so natural to me after
>> months with Spark that I was surprised not to see it supported in place of
>> the custom, and surely less obvious, topic option.
>>
>> Imagine my day today when I discovered that I could use KafkaSource in
>> batch queries and then suddenly found out that save does not support path.
>> I'm not faint-hearted so I survived :-)
>>
>> I think that change would make KafkaSource even cooler. Please add support
>> if possible (and make it part of the upcoming 2.2.0, too!)
>>
>> Thanks.
>>
>> Jacek
>>
>> On 1 May 2017 7:26 p.m., "Michael Armbrust" <[email protected]> wrote:
>>>
>>> He's just suggesting that since the DataStreamWriter start() method can
>>> fill in an option named "path", we should make that a synonym for "topic".
>>> Then you could do something like:
>>>
>>> df.writeStream.format("kafka").start("topic")
>>>
>>> Seems reasonable if people don't think that is confusing.
>>>
>>> On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <[email protected]> wrote:
>>>>
>>>> I'm confused about what you're suggesting.  Are you saying that a
>>>> Kafka sink should take a filesystem path as an option?
>>>>
>>>> On Mon, May 1, 2017 at 8:52 AM, Jacek Laskowski <[email protected]> wrote:
>>>> > Hi,
>>>> >
>>>> > I've just found out that KafkaSourceProvider supports a topic option
>>>> > that sets the Kafka topic to save a DataFrame to.
>>>> >
>>>> > You can also use a topic column to assign rows to topics.
>>>> >
>>>> > Given these features, I've been wondering why the "path" option is not
>>>> > supported, even with the lowest precedence, so that when neither a topic
>>>> > column nor a topic option is defined, save(path: String) would supply
>>>> > the topic as a last resort.
>>>> >
>>>> > WDYT?
>>>> >
>>>> > It looks pretty trivial to support --> see KafkaSourceProvider at
>>>> > lines [1] and [2] if I'm not mistaken.
>>>> >
>>>> > [1]
>>>> > https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145
>>>> > [2]
>>>> > https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163
>>>> >
>>>> > Pozdrawiam,
>>>> > Jacek Laskowski
>>>> > ----
>>>> > https://medium.com/@jaceklaskowski/
>>>> > Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
>>>> > Follow me at https://twitter.com/jaceklaskowski
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe e-mail: [email protected]
>>>> >
>>>>
>>>>
>>>
>>

