Re: Bulk storage of protobuf records in files

2023-06-05 Thread Shammon FY
Hi Ryan,

What I usually encounter is writing Protobuf format data to systems such as
Kafka, and I have never encountered writing to a file yet.

Best,
Shammon FY


On Mon, Jun 5, 2023 at 10:50 PM Martijn Visser wrote:

> Hey Ryan,
>
> I've never encountered a use case for writing Protobuf encoded files to a
> filesystem.
>
> Best regards,
>
> Martijn
>
> On Fri, May 26, 2023 at 6:39 PM Ryan Skraba via user <
> user@flink.apache.org> wrote:
>
>> Hello all!
>>
>> I discovered while investigating FLINK-32008[1] that we can write to the
>> filesystem connector with the protobuf format, but today, the resulting
>> file is pretty unlikely to be useful or rereadable.
>>
>> There's no real standard for storing many protobuf messages in a single
>> file container, although the documentation mentions writing size-delimited
>> messages sequentially[2].  In practice, I've never encountered protobuf
>> binaries stored on filesystems without using some other sort of "framing"
>> (like how Parquet can be accessed with either an Avro or a protobuf
>> oriented API).
>>
>> Does anyone have any use cases for bulk storage of protobuf messages on a
>> filesystem?  Should these files just be considered temporary storage for
>> Flink jobs, or do they need to be compatible with other systems?  Is there
>> a splittable / compressible file format?
>>
>> The alternative might be to just forbid file storage for protobuf
>> messages!  Any opinions?
>>
>> All my best, Ryan Skraba
>>
>> [1]: https://issues.apache.org/jira/browse/FLINK-32008
>> [2]: https://protobuf.dev/programming-guides/techniques/#streaming
>>
>


Re: Bulk storage of protobuf records in files

2023-06-05 Thread Martijn Visser
Hey Ryan,

I've never encountered a use case for writing Protobuf encoded files to a
filesystem.

Best regards,

Martijn

On Fri, May 26, 2023 at 6:39 PM Ryan Skraba via user wrote:

> Hello all!
>
> I discovered while investigating FLINK-32008[1] that we can write to the
> filesystem connector with the protobuf format, but today, the resulting
> file is pretty unlikely to be useful or rereadable.
>
> There's no real standard for storing many protobuf messages in a single
> file container, although the documentation mentions writing size-delimited
> messages sequentially[2].  In practice, I've never encountered protobuf
> binaries stored on filesystems without using some other sort of "framing"
> (like how Parquet can be accessed with either an Avro or a protobuf
> oriented API).
>
> Does anyone have any use cases for bulk storage of protobuf messages on a
> filesystem?  Should these files just be considered temporary storage for
> Flink jobs, or do they need to be compatible with other systems?  Is there
> a splittable / compressible file format?
>
> The alternative might be to just forbid file storage for protobuf
> messages!  Any opinions?
>
> All my best, Ryan Skraba
>
> [1]: https://issues.apache.org/jira/browse/FLINK-32008
> [2]: https://protobuf.dev/programming-guides/techniques/#streaming
>
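
For readers unfamiliar with the "size-delimited messages" technique that
Ryan's message points to in [2], here is a minimal sketch of what such
framing could look like. It is not what the Flink filesystem connector
does today, and it is not taken from FLINK-32008. Each record is treated
as opaque bytes standing in for an already-serialized protobuf message,
and the length prefix is assumed to be a base-128 varint (the same kind of
prefix protobuf's Java writeDelimitedTo emits). The function names are
hypothetical.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Optional


def write_varint(out: BinaryIO, value: int) -> None:
    """Write a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("length must be non-negative")
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            out.write(bytes([byte | 0x80]))
        else:
            out.write(bytes([byte]))
            return


def read_varint(src: BinaryIO) -> Optional[int]:
    """Read one varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        chunk = src.read(1)
        if not chunk:
            if shift == 0:
                return None  # no more records
            raise EOFError("stream ended inside a length prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def write_delimited(out: BinaryIO, records: Iterable[bytes]) -> None:
    """Write each record as <varint length><payload bytes>."""
    for payload in records:
        write_varint(out, len(payload))
        out.write(payload)


def read_delimited(src: BinaryIO) -> Iterator[bytes]:
    """Yield payloads written by write_delimited, in order."""
    while True:
        size = read_varint(src)
        if size is None:
            return
        payload = src.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a record")
        yield payload


if __name__ == "__main__":
    # Placeholder payloads; real ones would come from SerializeToString().
    buf = io.BytesIO()
    write_delimited(buf, [b"first", b"", b"third"])
    buf.seek(0)
    for record in read_delimited(buf):
        print(len(record), record)
```

A file written this way can only be read front to back: it has no sync
markers, no embedded schema, and no block compression. That is why it
does not answer the question about a splittable / compressible format.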