[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2023-03-02 Thread Gustavo Martin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695828#comment-17695828
 ] 

Gustavo Martin commented on SPARK-26314:


My team just stumbled upon this problem :(

I was hoping Spark would make use of Avro's capabilities for finding the right schema 
associated with an event when a Schema Registry is in use.
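
A minimal sketch of that flow (asking the registry for the schema rather than hard-coding one), using plain Spark plus Confluent's schema-registry client, is below. The broker address, topic name, and registry URL are placeholders, and the sketch assumes all in-flight messages were written with the latest schema registered for the topic, since Spark's built-in {{from_avro}} takes one fixed reader schema and does no per-message lookup.

{code:scala}
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.from_avro // org.apache.spark.sql.avro.from_avro on Spark 2.4
import org.apache.spark.sql.functions.{col, expr}

object RegistryDrivenDecode {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("registry-driven-decode").getOrCreate()

    val registryUrl = "http://schema-registry:8081" // placeholder
    val topic       = "events"                      // placeholder

    // Ask the registry for the latest value schema registered under the default
    // TopicNameStrategy subject ("<topic>-value") and use it as the reader schema.
    val registry         = new CachedSchemaRegistryClient(registryUrl, 128)
    val readerSchemaJson = registry.getLatestSchemaMetadata(s"$topic-value").getSchema

    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", topic)
      .load()

    // Confluent-framed values carry a 5-byte header (magic byte + schema id) before the
    // Avro body, which from_avro does not expect, so the header is stripped first.
    val decoded = kafkaDf
      .select(expr("substring(value, 6, length(value) - 5)").as("payload"))
      .select(from_avro(col("payload"), readerSchemaJson).as("event"))

    decoded.writeStream.format("console").start().awaitTermination()
  }
}
{code}

Running this needs the Spark Kafka source, spark-avro, and Confluent's kafka-schema-registry-client on the classpath (the latter from Confluent's Maven repository).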

 

> support Confluent encoded Avro in Spark Structured Streaming
> 
>
> Key: SPARK-26314
> URL: https://issues.apache.org/jira/browse/SPARK-26314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: David Ahern
>Priority: Major
>
> As Avro has now been added as a first-class citizen,
> [https://spark.apache.org/docs/latest/sql-data-sources-avro.html]
> please make Confluent-encoded Avro work out of the box with Spark Structured Streaming.
> As described in the link below, Avro messages on Kafka encoded with the Confluent 
> serializer also need to be decoded with Confluent. It would be great if this worked 
> out of the box.
> [https://developer.ibm.com/answers/questions/321440/ibm-iidr-cdc-db2-to-kafka.html?smartspace=blockchain]
> Here are details on the Confluent encoding:
> [https://www.sderosiaux.com/articles/2017/03/02/serializing-data-efficiently-with-apache-avro-and-dealing-with-a-schema-registry/#encodingdecoding-the-messages-with-the-schema-id]
> It's been a year since I worked on anything to do with Avro and Spark Structured 
> Streaming, but I had to take an approach such as this to get it working. This is 
> what I used as a reference at that time:
> [https://github.com/tubular/confluent-spark-avro]
> Also, here is another project someone has built in the meantime:
> [https://github.com/AbsaOSS/ABRiS]
>  
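
For concreteness, the encoding described in the links above frames every record as a 1-byte magic marker, a 4-byte big-endian schema id, and then the raw Avro binary body with no embedded schema. A small illustrative Scala helper for pulling those pieces apart (not part of Spark's or Confluent's API):

{code:scala}
import java.nio.ByteBuffer

object ConfluentWireFormat {
  // Confluent wire format: byte 0 is a magic byte (always 0), bytes 1-4 hold the schema id
  // as a big-endian int, and the Avro binary payload follows from byte 5 onwards.
  final case class ConfluentFrame(schemaId: Int, avroPayload: Array[Byte])

  def parseConfluentFrame(message: Array[Byte]): ConfluentFrame = {
    require(message.length >= 5 && message(0) == 0, "not a Confluent-framed Avro message")
    ConfluentFrame(ByteBuffer.wrap(message, 1, 4).getInt, message.drop(5))
  }
}
{code}

The schema id is what a registry-aware consumer sends back to the Schema Registry to fetch the writer schema before decoding the payload.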






[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-15 Thread Jordan Moore (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722326#comment-16722326
 ] 

Jordan Moore commented on SPARK-26314:
--

It is powered by Avro, yes. But I would say it effectively became the Schema Registry purely 
because of the issue creator, who is now the CEO of the company that maintains and distributes 
the source for the most popular Avro Schema Registry used with Kafka.

Putting that aside, how about we meet in the middle: append a section to the official 
Spark-Avro documentation on how to build such a UDF or Encoder/Decoder for other forms of 
binary serialization? (Note: this is no longer just about Avro, since, as mentioned in the 
Structured Streaming Kafka docs, the key and value are always returned as binary.)

The {{cast(value) as STRING}} and {{from_json}} functions seem well documented from what I 
have seen, but how to actually implement such functions for other non-string, non-primitive 
datatypes seems to be missing. To quote the documentation: "Use DataFrame operations to 
explicitly deserialize the keys". Personally, I don't know what those operations are, or where 
to find them quickly. Does one then go to the Spark SQL page or the Java/Scala/Python docs for 
the DataFrame object, only to get flooded with functions that don't do what is actually 
needed, which is deserializing a simple message?
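
As one hedged illustration of what such a documentation section could show: an executor-side wrapper around Confluent's KafkaAvroDeserializer that turns the registry-framed bytes into a JSON string, which the already documented {{from_json}} can then parse. The registry URL and topic name below are placeholders, and this is a sketch rather than anything Spark ships today.

{code:scala}
import java.util
import io.confluent.kafka.serializers.KafkaAvroDeserializer
import org.apache.avro.generic.GenericRecord
import org.apache.spark.sql.functions.udf

// One lazily created deserializer per executor JVM; the Confluent client resolves and
// caches writer schemas by the id embedded in each message.
object ConfluentValueDecoder extends Serializable {
  @transient private lazy val deserializer: KafkaAvroDeserializer = {
    val d    = new KafkaAvroDeserializer()
    val conf = new util.HashMap[String, AnyRef]()
    conf.put("schema.registry.url", "http://schema-registry:8081") // placeholder
    d.configure(conf, false) // false = configured for record values, not keys
    d
  }

  // Decode Confluent-framed bytes to a GenericRecord and render it as a JSON string,
  // so the result can be parsed further with from_json and a matching struct schema.
  val decodeValue = udf { bytes: Array[Byte] =>
    if (bytes == null) null
    else deserializer.deserialize("placeholder-topic", bytes) // schema is resolved from the embedded id
      .asInstanceOf[GenericRecord]
      .toString
  }
}
{code}

Used roughly as {{df.select(ConfluentValueDecoder.decodeValue(col("value")).as("json"))}}, after which {{from_json}} applies exactly as documented.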

Let's go through a scenario where, as a developer, all I know is that I have Avro data in 
Kafka. In this scenario, I am new to Spark, maybe new to Kafka as well, and let's say I do not 
know whether I am using "Confluent encoded Avro" or not.
Naturally, one might go to the [SparkSQL > 
DataSources|https://spark.apache.org/docs/2.4.0/sql-data-sources.html] docs. Immediately, and 
confusingly, we see that the page says "Avro *Files*"; but we are using Spark Structured 
Streaming with Kafka, and that doesn't sound like what is needed. We venture on anyway and 
find the page that lists the to/from Avro functions. To our surprise, it shows Kafka examples! 
We copy-paste some code into our application, then are confused when we either get an error or 
just null returned to us, because the deserializer, which expected an Avro schema as part of 
the message, failed. Reading up and down this Avro SparkSQL page, I see nothing about why that 
might be, and so we venture off into the depths of Google, StackOverflow, JIRA, GitHub issues, 
etc., still not really understanding what the root problem is -- that our Avro is encoded 
differently.

I hope I've highlighted the issue here: make this less error-prone and more beginner-friendly, 
rather than losing the community's trust that the product will solve their use cases.




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-15 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722322#comment-16722322
 ] 

Dongjoon Hyun commented on SPARK-26314:
---

Sorry, but I totally disagree with the first point.

bq. AVRO-1124 effectively became the Confluent Schema Registry

We could just as well say the opposite: the `Confluent Schema Registry` is powered by Apache 
Avro.

For the second point, `--packages` allows any library. Again, Apache Spark's built-in Avro 
support works the same way, and there has been no change from previous Apache Spark behavior. 
If a company's product had problems before and still has them now, the company should fix 
those problems inside its own libraries rather than in the Apache Spark community.

My suggestion is to file a general Apache Spark issue if there are specific features or bugs 
around UDFs or Encoders/Decoders in general.




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-15 Thread Jordan Moore (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722314#comment-16722314
 ] 

Jordan Moore commented on SPARK-26314:
--

[~dongjoon], I feel, however, that AVRO-1124 won't get any traction. Even Avro releases 
themselves have been nearly non-existent over the last few years. My point was that AVRO-1124 
effectively _became_ the Confluent Schema Registry, with Jay now being CEO of Confluent.

On my second point: while I understand that Apache projects are maintained by different 
developers, each with their own direction for their project, those other projects _do support_ 
it. You are effectively losing a large portion of developers by not making it easily available 
as a Spark-offered library, or at the very least showing how to integrate it in the Spark 
documentation.

Simply adding --packages is not enough, as it still requires a SparkSQL UDF or Encoder/Decoder 
to wrap Confluent's KafkaAvroSerializer classes.




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-15 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722310#comment-16722310
 ] 

Dongjoon Hyun commented on SPARK-26314:
---

[~cricket007], thank you for mentioning AVRO-1124 and the related work. :)
So, I believe Apache Spark will automatically support what you requested (`Confluent encoded 
Avro`) if AVRO-1124 and the related work are resolved and released as part of Apache Avro.




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-15 Thread Jordan Moore (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722240#comment-16722240
 ] 

Jordan Moore commented on SPARK-26314:
--

Hi [~dongjoon], 

I'm sure you are speaking for the Spark community, and perhaps you think that "Confluent 
Platform" is a specific, customized version of Kafka, and that the Confluent Schema Registry 
isn't using Apache Avro? Neither is true.

The Schema Registry is an HTTP service that sits alongside Kafka (in Confluent's 
implementation) and acts as a kind of key-value database of Apache Avro schemas. As a product, 
it branched out from Jay Kreps's initial proposal at 
https://issues.apache.org/jira/browse/AVRO-1124
From a streaming perspective, it is more performant because every message is not required to 
contain the full Avro schema, only a reference to an external source. Therefore, I think the 
proposal here is to open up or extend the Avro encoders in Spark to allow for these other ways 
of getting data in that format.
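
To make the "reference instead of full schema" point concrete at the DataFrame level, the id each message carries can be inspected with built-in SQL functions alone; {{kafkaDf}} here stands for the usual Kafka source with its binary {{value}} column:

{code:scala}
import org.apache.spark.sql.functions.expr

// SQL substring is 1-based: positions 2-5 hold the 4-byte big-endian writer-schema id,
// so counting distinct ids shows how many writer schemas are currently in flight.
val schemaIdCounts = kafkaDf
  .select(expr("cast(conv(hex(substring(value, 2, 4)), 16, 10) as int)").as("schema_id"))
  .groupBy("schema_id")
  .count()
{code}

A fixed-schema decoder can silently misread data as soon as more than one writer schema shows up, which is the gap a registry-aware decoder in Spark would close.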

Why can't Spark add support for it like other projects in this streaming space? When people 
see these projects, they may feel Spark is lagging behind.
- Flink - 
https://github.com/apache/flink/tree/master/flink-formats/flink-avro-confluent-registry
- NiFi - 
https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-confluent-platform-bundle
- Streamsets - 
https://streamsets.com/blog/evolving-avro-schemas-apache-kafka-streamsets-data-collector/

Other than Hortonworks and HDInsight-derived products using their own registry, I feel the 
Confluent Schema Registry is the only similar, well-known Avro-on-Kafka implementation out 
there.

Otherwise, we keep seeing GitHub issues/repos and StackOverflow questions pop up where people 
are basically baking in the Confluent Schema Registry HTTP client Java class themselves, 
fetching a schema, then parsing it in Spark and continuing on their way using Avro as normal. 
However, as I mentioned above about storing the whole schema as part of every message, this is 
not as performant as it could be, and the Kafka serializers and deserializers within Spark 
shouldn't need to care whether the schema is part of the message content or not.

Thoughts?




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-09 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714133#comment-16714133
 ] 

Dongjoon Hyun commented on SPARK-26314:
---

One more thing: Apache Spark supports Apache Kafka, not any single vendor's Kafka 
distribution. That is the simplest way to support all Apache Kafka-powered distributions.




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-09 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714131#comment-16714131
 ] 

Dongjoon Hyun commented on SPARK-26314:
---

Hi, [~davidahern].
Apache Spark supports Apache Avro. I don't see any specific reason why Spark should support a 
specific vendor's library.
If you look at the release notes, Spark supports Avro via `--packages`, which is the same way 
third-party libraries are supported.
I'm not sure what the problem with Confluent-encoded Avro is. But, in general, if a 
third-party library has a problem out of the box, it's a problem of that library.
The creator should fix it and release that fix in their repository.




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-09 Thread David Ahern (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713986#comment-16713986
 ] 

David Ahern commented on SPARK-26314:
-

https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html




[jira] [Commented] (SPARK-26314) support Confluent encoded Avro in Spark Structured Streaming

2018-12-09 Thread David Ahern (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713985#comment-16713985
 ] 

David Ahern commented on SPARK-26314:
-

https://stackoverflow.com/questions/48882723/integrating-spark-structured-streaming-with-the-kafka-schema-registry
