Hi Becket, It is a very useful proposal, thanks for driving it. +1. I'd like to ask some questions to make sure I understand your thoughts correctly:
1. "For the batch cases, currently the BulkFormat for DataStream is missing" - true, and there is another option to leverage StreamFormatAdapter[1] 2. "The following two interfaces should probably be marked as Public for now and Deprecated once we deprecate the InputFormat / OutputFormat" - would you like to share some background info of the deprecation of the InputFormat / OutputFormat? It is for me a little bit weird to mark APIs as public that are now known to be deprecated. 3. "Remove the PublicEvolving annotation for the following deprecated classes. It does not make sense for an API to be PublicEvolving and Deprecated at the same time" - this is very common in the Flink code base to have PublicEvolving and Deprecated at the same time. APIs that do not survive the PublicEvolving phase will be marked as deprecated in addition. Removing PublicEvolving in this case will break Flink API graduation rule. Best regards, Jing [1] https://github.com/apache/flink/blob/1d1247d4ae6d4313f7d952c4b2d66351314c9432/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/impl/StreamFormatAdapter.java#L61 On Thu, Aug 31, 2023 at 4:16 PM Becket Qin <becket....@gmail.com> wrote: > Hi Ryan, thanks for the reply. > > Verifying the component with the schemas you have would be super helpful. > > I think enum is actually a type that is generally useful. Although it is > not a part of ANSI SQL, MySQL and some other databases have this type. > BTW, ENUM_STRING proposed in this FLIP is actually not a type by itself. > The ENUM_STRING is just a syntax sugar which actually creates a "new > AtomicDataType(new VarCharType(Integer.MAX_VALUE), ENUM_CLASS)". So, we > are not really introducing a new type here. However, in order to make the > VARCHAR to ENUM conversion work, the ENUM class has to be considered as a > ConversionClass of the VARCHAR type, and a StringToEnum converter is > required. > > And yes, AvroSchemaUtils should be annotated as @PublicEvolving. > > Thanks, > > Jiangjie (Becket) Qin > > > > On Thu, Aug 31, 2023 at 5:22 PM Ryan Skraba <ryan.skr...@aiven.io.invalid> > wrote: > > > Hey -- I have a certain knowledge of Avro, and I'd be willing to help > > out with some of these enhancements, writing tests and reviewing. I > > have a *lot* of Avro schemas available for validation! > > > > The FLIP looks pretty good and covers the possible cases pretty > > rigorously. I wasn't aware of some of the gaps you've pointed out > > here! > > > > How useful do you think the new ENUM_STRING DataType would be outside > > of the Avro use case? It seems like a good enough addition that would > > solve the problem here. > > > > A small note: I assume the AvroSchemaUtils is meant to be annotated > > @PublicEvolving as well. > > > > All my best, Ryan > > > > > > On Tue, Aug 29, 2023 at 4:35 AM Becket Qin <becket....@gmail.com> wrote: > > > > > > Hi folks, > > > > > > I would like to start the discussion about FLIP-158[1] which proposes > to > > > clean up and enhance the Avro support in Flink. More specifically, it > > > proposes to: > > > > > > 1. Make it clear what are the public APIs in flink-avro components. > > > 2. Fix a few buggy cases in flink-avro > > > 3. Add more supported Avro use cases out of the box. > > > > > > Feedbacks are welcome! > > > > > > Thanks > > > > > > Jiangjie (Becket) Qin > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-358%3A+flink-avro+enhancement+and+cleanup > > >