Hi Wes,
As I said, I think this approach looks good. What I'm looking for
is a little more documentation, with examples, on how additional types would be
handled.  I think Tensor would be a good example. We also had questions
about INET addresses previously, so maybe that would be another good
illustrative example.  Providing examples of serialized metadata in the
docs would also be useful (clarifying that these are opaque binary blobs that
will be passed along to extension type factories?).
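
To make the request concrete, here is a rough sketch of the kind of example
I have in mind. It is not the C++ or Java API. The metadata key names, the
"example.tensor" name, the JSON encoding of the blob, and the helper
functions are all assumptions for illustration only.

```python
import json

# Assumed key names under the reserved "ARROW:" prefix (illustrative only).
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical registry: extension name -> factory(storage_type, metadata_bytes).
_factories = {}


def register_extension(name, factory):
    """Associate an extension type name with the factory that rebuilds it."""
    _factories[name] = factory


def tensor_factory(storage_type, metadata):
    # Only the factory interprets the blob; this sketch assumes it is JSON.
    params = json.loads(metadata.decode("utf-8"))
    return {
        "kind": "tensor",
        "storage": storage_type,
        "shape": tuple(params["shape"]),
    }


def resolve_field(storage_type, custom_metadata):
    """Return the extension type if its name is registered, else the storage type."""
    name = custom_metadata.get(EXT_NAME_KEY)
    factory = _factories.get(name)
    if factory is None:
        # Unknown extension: keep working with the underlying storage type.
        return storage_type
    return factory(storage_type, custom_metadata.get(EXT_META_KEY, b""))


register_extension("example.tensor", tensor_factory)

# A Binary field carrying tensor values, with the shape in the opaque blob.
tensor_metadata = {
    EXT_NAME_KEY: "example.tensor",
    EXT_META_KEY: json.dumps({"shape": [2, 3]}).encode("utf-8"),
}
resolved = resolve_field("binary", tensor_metadata)

# An INET address type that this reader has never registered.
inet_metadata = {EXT_NAME_KEY: "example.inet_address", EXT_META_KEY: b""}
unresolved = resolve_field("fixed_size_binary[16]", inet_metadata)
```

In this sketch the reader never looks inside the blob itself. A registered
name hands the bytes to its factory, and an unregistered name falls back to
the storage type while the metadata dictionary stays available to pass on.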

In this regard, I think it might be good to provide further
recommendations for naming extension types.  What do you think about
recommending that organizations/projects namespace them according to some
convention, so that there are no conflicts and extensions can be shared?
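
As one possible shape for such a convention (purely hypothetical, nothing
here is proposed spec text), names could follow a dotted
"<organization>.<project>.<type>" pattern, and a registry could reject
unqualified names or a second owner claiming the same name:

```python
import re

# Hypothetical convention: at least three lowercase, dot-separated parts,
# e.g. "acme.net.inet_address". This pattern is an assumption, not a rule.
_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*){2,}$")


def is_namespaced(name):
    """Check whether a name follows the assumed organization.project.type form."""
    return _NAME_PATTERN.match(name) is not None


class ExtensionNameRegistry:
    """Toy registry that tracks which owner claimed which extension name."""

    def __init__(self):
        self._owners = {}

    def register(self, name, owner):
        if not is_namespaced(name):
            raise ValueError("extension name is not namespaced: %r" % name)
        existing = self._owners.get(name)
        if existing is not None and existing != owner:
            raise KeyError("extension name already claimed: %r" % name)
        self._owners[name] = owner


registry = ExtensionNameRegistry()
registry.register("acme.net.inet_address", owner="acme")

try:
    registry.register("tensor", owner="acme")
    bare_name_rejected = False
except ValueError:
    bare_name_rejected = True

try:
    registry.register("acme.net.inet_address", owner="other")
    conflict_rejected = False
except KeyError:
    conflict_rejected = True
```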

Thanks,
Micah



On Sat, May 18, 2019 at 12:00 PM Wes McKinney <wesmck...@gmail.com> wrote:

>
>
> On Sat, May 18, 2019, 1:58 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Hi Micah,
>>
>> The use cases I'm aware of are mostly coming from proprietary
>> applications. My idea was for the extension metadata to be as unobtrusive
>> as possible. The only alternative as I see it would be to have an Extension
>> value in the Type union which would be more intrusive to applications
>> handling data for which they have no special handling. That doesn't seem
>> desirable if there are alternatives.
>>
>
> The other (3rd) option would be to add an extra member to Field. This is
> also a bit more intrusive than having fields in the custom_metadata
> dictionary.
>
>
>> As an immediate use case we could use extension types to embed Tensor
>> values in Binary arrays.
>>
>> Wes
>>
>> On Sat, May 18, 2019, 12:19 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Hi Wes,
>>> This approach seems reasonable to me.  I'm a little concerned we haven't
>>> validated many use-cases against the approach (but I don't see any
>>> obvious flaws).
>>>
>>> Thanks,
>>> Micah
>>>
>>> On Fri, May 17, 2019 at 5:16 AM Wes McKinney <wesmck...@gmail.com>
>>> wrote:
>>>
>>> > As Micah brought up, as part of this we would like to formalize the
>>> > use of "ARROW:" as a reserved metadata key prefix. This is similar to
>>> > Apache Avro which uses "avro." as a reserved prefix [1]. If someone
>>> > has a different idea about what the prefix should be, I'm open to other
>>> > ideas.
>>> >
>>> > [1]: https://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files
>>> >
>>> > On Thu, May 16, 2019 at 7:29 PM Wes McKinney <wesmck...@gmail.com>
>>> wrote:
>>> > >
>>> > > hi folks,
>>> > >
>>> > > In a prior mailing list thread from February [1] I brought up some
>>> > > work I'd done in C++ to create an API to define custom data types that
>>> > > can be embedded in built-in Arrow logical types. These are serialized
>>> > > through IPC by adding special fields to the `custom_metadata` member
>>> > > of Field in the Flatbuffers metadata [2]. The idea is that if an
>>> > > implementation does not understand the custom type, then they can
>>> > > still interact with the underlying data if need be, or pass on the
>>> > > extension metadata in subsequent IPC messages.
>>> > >
>>> > > David Li has put up a WIP PR to implement this for Java [4], so to
>>> > > help the project move forward I think it's a good time to formalize
>>> > > this, and if there are disagreements to hash them out now. I have just
>>> > > opened a PR to the Arrow specification documents [3] that describes
>>> > > the current state of C++ and also the WIP Java PR.
>>> > >
>>> > > Any thoughts about this? If there is consensus about this solution
>>> > > approach then I can hold a vote.
>>> > >
>>> > > Thanks
>>> > > Wes
>>> > >
>>> > > [1]: https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
>>> > > [2]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L291
>>> > > [3]: https://github.com/apache/arrow/pull/4332
>>> > > [4]: https://github.com/apache/arrow/pull/4251
>>> >
>>>
>>
