Hi Peter!

I think this would be a good additional serialization utility for Flink that
may benefit different data formats / connectors in the future.

+1
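
For illustration, a minimal sketch of what such a *writeLongUTF* could look
like, assuming a 4-byte length prefix and standard UTF-8 encoding (the class
name `LongUtf` and the use of plain `DataOutputStream`/`DataInputStream` are
placeholders for this sketch, not Flink's actual `DataOutputSerializer` API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: DataOutput.writeUTF uses a 2-byte (unsigned short)
// length prefix, which caps the encoded string at 65535 bytes. Using a
// 4-byte int prefix removes that limit.
public final class LongUtf {

    public static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // 4-byte length prefix instead of writeUTF's 2-byte one
        out.write(bytes);
    }

    public static String readLongUTF(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes); // read exactly len bytes
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Note this writes standard UTF-8 rather than the modified UTF-8 that
`DataOutput.writeUTF` specifies, so the two formats are not wire-compatible;
the real method would have to decide which encoding to keep.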

Cheers,
Gyula

On Mon, Jan 22, 2024 at 8:04 PM Steven Wu <stevenz...@gmail.com> wrote:

> I think this is a reasonable extension to `DataOutputSerializer`. Although
> 64 KB is not small, it is still possible to have long strings over that
> limit. There are already precedents of extending the `DataOutputSerializer`
> API, e.g.
>
> public void setPosition(int position) {
>     Preconditions.checkArgument(
>             position >= 0 && position <= this.position, "Position out of bounds.");
>     this.position = position;
> }
>
> public void setPositionUnsafe(int position) {
>     this.position = position;
> }
>
>
> On Fri, Jan 19, 2024 at 2:51 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
> > Hi Team,
> >
> > During the root cause analysis of an Iceberg serialization issue [1], we
> > have found that *DataOutputSerializer.writeUTF* has a hard limit on the
> > length of the string (64k). This is inherited from the
> > *DataOutput.writeUTF*
> > method, where the JDK specifically defines this limit [2].
> >
> > For our use case we need to be able to serialize longer UTF strings, so
> > we will need to define a *writeLongUTF* method with a specification
> > similar to that of *writeUTF*, but without the length limit.
> >
> > My questions are:
> > - Is this something that would be useful for every Flink user? Shall we
> > add this method to *DataOutputSerializer*?
> > - Or is it very specific to Iceberg, so that we should keep it in the
> > Iceberg connector code?
> >
> > Thanks,
> > Peter
> >
> > [1] - https://github.com/apache/iceberg/issues/9410
> > [2] -
> >
> >
> https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
> >
>
