Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Cody Koeninger
Anastasios, it looks like you already identified the two lines that
need to change: the string interpolation that depends on
UUID.randomUUID and metadataPath.hashCode.

I'd factor that out into a function that returns the group id.  That
function would also need to take the "parameters" variable (the map of
user-provided options) and look for a prefix for the group id,
defaulting to the current behavior.
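
A minimal sketch of what such a function might look like, for illustration
only. The option key "groupIdPrefix", the object name, and passing
metadataPath in as a String are assumptions made for this example; the thread
does not settle on any of them, and this is not the actual Spark change.

```scala
import java.util.UUID

object GroupIdSketch {
  // Hypothetical option key; the name is an assumption, not taken from the thread.
  val GroupIdPrefixOption = "groupIdPrefix"

  // Prefix that is hard-coded in KafkaSourceProvider as of v2.4.0.
  val DefaultPrefix = "spark-kafka-source"

  /** Builds a unique consumer group id, using a user-supplied prefix when present. */
  def uniqueGroupId(parameters: Map[String, String], metadataPath: String): String = {
    // Fall back to the current behavior when the user gives no prefix.
    val prefix = parameters.getOrElse(GroupIdPrefixOption, DefaultPrefix)
    s"$prefix-${UUID.randomUUID}-${metadataPath.hashCode}"
  }
}
```

With something like this, a caller that passes Map("groupIdPrefix" -> "team-a")
would get ids of the form team-a-<uuid>-<hash>, while callers that pass nothing
keep the existing spark-kafka-source-<uuid>-<hash> form.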

If you have questions, feel free to ping me on the jira, or get as far
as you can and submit a PR for more discussion.
On Mon, Nov 19, 2018 at 2:38 PM Anastasios Zouzias  wrote:
>
> Hi Tom,
>
> I initiated an issue here: https://issues.apache.org/jira/browse/SPARK-26121
>
> Feel free to edit/update the ticket. If someone familiar with the codebase 
> has any suggestion on the proper way of fixing this, I could work on it.
>
> Best,
> Anastasios
>
> On Mon, Nov 19, 2018 at 4:31 PM Tom Graves  wrote:
>>
>> This makes sense to me, and I was going to propose something similar in order
>> to be able to use the Kafka ACLs more effectively as well. Can you file a
>> jira for it?
>>
>> Tom
>>
>> On Friday, November 9, 2018, 2:26:12 AM CST, Anastasios Zouzias wrote:
>>
>>
>> Hi all,
>>
>> I ran into the following situation with Spark Structured Streaming (SS) using
>> Kafka.
>>
>> In a project that I work on, there is already a secured Kafka setup where
>> ops can issue an SSL certificate per "group.id", which should be predefined
>> (or, hopefully, just its prefix).
>>
>> On the other hand, Spark SS fixes the group.id to
>>
>> val uniqueGroupId = 
>> s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
>>
>> see, i.e.,
>>
>> https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
>>
>> I guess Spark developers had a good reason to fix it, but is it possible to
>> make the prefix of the above uniqueGroupId ("spark-kafka-source")
>> configurable? If so, I could prepare a PR on it.
>>
>> The rationale is that we do not want all Spark jobs to use the same
>> certificate on group ids of the form (spark-kafka-source-*).
>>
>>
>> Best regards,
>> Anastasios Zouzias
>
>
>
> --
> -- Anastasios Zouzias

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Anastasios Zouzias
Hi Tom,

I initiated an issue here: https://issues.apache.org/jira/browse/SPARK-26121

Feel free to edit/update the ticket. If someone familiar with the codebase
has any suggestion on the proper way of fixing this, I could work on it.

Best,
Anastasios

On Mon, Nov 19, 2018 at 4:31 PM Tom Graves  wrote:

> This makes sense to me, and I was going to propose something similar in order
> to be able to use the Kafka ACLs more effectively as well. Can you file a
> jira for it?
>
> Tom
>
> On Friday, November 9, 2018, 2:26:12 AM CST, Anastasios Zouzias <
> zouz...@gmail.com> wrote:
>
>
> Hi all,
>
> I ran into the following situation with Spark Structured Streaming (SS) using
> Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where
> ops can issue an SSL certificate per "group.id", which should be
> predefined (or, hopefully, just its prefix).
>
> On the other hand, Spark SS fixes the group.id to
>
> val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
>
> see, i.e.,
>
>
> https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
>
> I guess Spark developers had a good reason to fix it, but is it possible
> to make the prefix of the above uniqueGroupId ("spark-kafka-source")
> configurable? If so, I could prepare a PR on it.
>
> The rationale is that we do not want all Spark jobs to use the same
> certificate on group ids of the form (spark-kafka-source-*).
>
>
> Best regards,
> Anastasios Zouzias
>


-- 
-- Anastasios Zouzias



Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Tom Graves
This makes sense to me, and I was going to propose something similar in order
to be able to use the Kafka ACLs more effectively as well. Can you file a jira
for it?

Tom

On Friday, November 9, 2018, 2:26:12 AM CST, Anastasios Zouzias wrote:

> Hi all,
>
> I ran into the following situation with Spark Structured Streaming (SS) using
> Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where ops
> can issue an SSL certificate per "group.id", which should be predefined (or,
> hopefully, just its prefix).
>
> On the other hand, Spark SS fixes the group.id to
>
> val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
>
> see, i.e.,
>
> https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
>
> I guess Spark developers had a good reason to fix it, but is it possible to
> make the prefix of the above uniqueGroupId ("spark-kafka-source")
> configurable? If so, I could prepare a PR on it.
>
> The rationale is that we do not want all Spark jobs to use the same
> certificate on group ids of the form (spark-kafka-source-*).
>
> Best regards,
> Anastasios Zouzias

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-09 Thread Cody Koeninger
That sounds reasonable to me
On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias  wrote:
>
> Hi all,
>
> I ran into the following situation with Spark Structured Streaming (SS) using
> Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where ops
> can issue an SSL certificate per "group.id", which should be predefined (or,
> hopefully, just its prefix).
>
> On the other hand, Spark SS fixes the group.id to
>
> val uniqueGroupId = 
> s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
>
> see, i.e.,
>
> https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
>
> I guess Spark developers had a good reason to fix it, but is it possible to
> make the prefix of the above uniqueGroupId ("spark-kafka-source")
> configurable? If so, I could prepare a PR on it.
>
> The rationale is that we do not want all Spark jobs to use the same
> certificate on group ids of the form (spark-kafka-source-*).
>
>
> Best regards,
> Anastasios Zouzias
