Re: [Structured Streaming] Kafka group.id is fixed
Anastasios, it looks like you have already identified the two lines that need to change: the string interpolation that depends on UUID.randomUUID and metadataPath.hashCode. I'd factor that out into a function that returns the group id. That function would also need to take the "parameters" variable (the map of user-provided options) and look for a prefix for the group id, defaulting to the current behavior.

If you have questions, feel free to ping me on the jira, or get as far as you can and submit a PR for more discussion.

On Mon, Nov 19, 2018 at 2:38 PM Anastasios Zouzias wrote:
>
> Hi Tom,
>
> I initiated an issue here: https://issues.apache.org/jira/browse/SPARK-26121
>
> Feel free to edit/update the ticket. If someone familiar with the codebase
> has any suggestion on the proper way of fixing this, I could work on it.
>
> Best,
> Anastasios
>
> On Mon, Nov 19, 2018 at 4:31 PM Tom Graves wrote:
>>
>> This makes sense to me and was going to propose something similar in order
>> to be able to use the kafka acls more effectively as well, can you file a
>> jira for it?
>>
>> Tom
>>
>> On Friday, November 9, 2018, 2:26:12 AM CST, Anastasios Zouzias
>> wrote:
>>
>>
>> Hi all,
>>
>> I ran into the following situation with Spark Structured Streaming (SS)
>> using Kafka.
>>
>> In a project that I work on, there is already a secured Kafka setup where
>> ops can issue an SSL certificate per "group.id", which should be predefined
>> (or hopefully its prefix to be predefined).
>>
>> On the other hand, Spark SS fixes the group.id to
>>
>> val uniqueGroupId =
>> s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
>>
>> see, e.g.,
>>
>> https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
>>
>> I guess Spark developers had a good reason to fix it, but is it possible to
>> make the prefix of the above uniqueGroupId ("spark-kafka-source")
>> configurable? If so, I could prepare a PR on it.
>>
>> The rationale is that we do not want all spark-jobs to use the same
>> certificate on group-ids of the form (spark-kafka-source-*).
>>
>>
>> Best regards,
>> Anastasios Zouzias
>
>
>
> --
> -- Anastasios Zouzias

To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
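
The refactoring suggested in the reply above (a function that takes the user-provided options and returns the group id, falling back to the current behavior) might look roughly like the sketch below. This is only an illustration, not the code that was merged: the object name GroupIdSketch, the option key "groupidprefix", and the choice to pass metadataPath as a plain String are all assumptions made for the example.

```scala
import java.util.UUID

// Sketch only: names below are hypothetical and not taken from the Spark codebase.
object GroupIdSketch {
  // Hypothetical option key that a user could set to override the prefix.
  val GroupIdPrefixKey = "groupidprefix"

  // The prefix hard-coded in the v2.4.0 line quoted in the thread.
  val DefaultPrefix = "spark-kafka-source"

  // Builds the group id from the user-provided options ("parameters"),
  // defaulting to the current behavior when no usable prefix is supplied.
  def uniqueGroupId(parameters: Map[String, String], metadataPath: String): String = {
    val prefix = parameters
      .get(GroupIdPrefixKey)
      .map(_.trim)
      .filter(_.nonEmpty)
      .getOrElse(DefaultPrefix)
    // Same random and path-dependent suffix as the original interpolation.
    s"$prefix-${UUID.randomUUID}-${metadataPath.hashCode}"
  }
}
```

With something like this, a call such as GroupIdSketch.uniqueGroupId(Map("groupidprefix" -> "team-a"), path) would yield an id starting with "team-a-", while an empty map keeps the "spark-kafka-source-" prefix. The random suffix is retained so that each query still gets its own consumer group.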
Re: [Structured Streaming] Kafka group.id is fixed
Hi Tom,

I initiated an issue here: https://issues.apache.org/jira/browse/SPARK-26121

Feel free to edit/update the ticket. If someone familiar with the codebase has any suggestion on the proper way of fixing this, I could work on it.

Best,
Anastasios
Re: [Structured Streaming] Kafka group.id is fixed
This makes sense to me and was going to propose something similar in order to be able to use the kafka acls more effectively as well, can you file a jira for it?

Tom
Re: [Structured Streaming] Kafka group.id is fixed
That sounds reasonable to me