Hi Peter! I think this is a good addition to Flink's serialization utilities that may benefit different data formats / connectors in the future.
+1

Cheers,
Gyula

On Mon, Jan 22, 2024 at 8:04 PM Steven Wu <stevenz...@gmail.com> wrote:

> I think this is a reasonable extension to `DataOutputSerializer`. Although
> 64 KB is not small, it is still possible to have long strings over that
> limit. There are already precedents of extended APIs in
> `DataOutputSerializer`. E.g.
>
>     public void setPosition(int position) {
>         Preconditions.checkArgument(
>                 position >= 0 && position <= this.position, "Position out of bounds.");
>         this.position = position;
>     }
>
>     public void setPositionUnsafe(int position) {
>         this.position = position;
>     }
>
> On Fri, Jan 19, 2024 at 2:51 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>
> > Hi Team,
> >
> > During the root cause analysis of an Iceberg serialization issue [1], we
> > have found that *DataOutputSerializer.writeUTF* has a hard limit on the
> > length of the string (64k). This is inherited from the *DataOutput.writeUTF*
> > method, where the JDK specifically defines this limit [2].
> >
> > For our use case we need to enable the possibility to serialize longer UTF
> > strings, so we will need to define a *writeLongUTF* method with a similar
> > specification to *writeUTF*, but without the length limit.
> >
> > My question is:
> > - Is this something which would be useful for every Flink user? Shall we add
> >   this method to *DataOutputSerializer*?
> > - Is it very specific to Iceberg, and should we keep it in the Iceberg
> >   connector code?
> >
> > Thanks,
> > Peter
> >
> > [1] - https://github.com/apache/iceberg/issues/9410
> > [2] - https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
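For context on the 64 KB limit being discussed: *DataOutput.writeUTF* encodes the string's byte length in an unsigned 2-byte prefix, which caps it at 65535 bytes. A length-unlimited variant just needs a wider length field. Below is a minimal, self-contained sketch of what such a *writeLongUTF* could look like; the names `writeLongUTF`/`readLongUTF` are hypothetical here (not existing Flink API), and for simplicity it uses standard UTF-8 via `StandardCharsets.UTF_8` rather than the JDK's modified UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LongUtfDemo {

    // Hypothetical writeLongUTF: a 4-byte int length prefix followed by the
    // UTF-8 bytes, instead of writeUTF's 2-byte prefix (the 64 KB limit).
    static void writeLongUTF(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // supports strings far beyond 64 KB
        out.write(bytes);
    }

    // Matching reader: read the int length, then exactly that many bytes.
    static String readLongUTF(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 100,000 chars -- over the limit that would make writeUTF throw
        // a UTFDataFormatException.
        String big = "x".repeat(100_000);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongUTF(new DataOutputStream(buf), big);

        String back = readLongUTF(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.equals(big)); // round-trip succeeds
    }
}
```

A production version inside *DataOutputSerializer* would write directly into its internal buffer rather than allocating an intermediate `byte[]`, but the wire-format idea (wider length prefix, same UTF payload) is the same.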