Re: [Format] Pointer types / span types

Brian Hulette Wed, 02 May 2018 14:03:56 -0700

List also references another (data) array, which can be a different size, but rather than requiring it to be represented with a second schema, we make it a child of the List type. We could do the same thing for a Span type, and give it a new type of buffer that contains start/stop indices rather than offsets.
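To make the comparison concrete, here is a rough sketch using plain Python lists as stand-ins for Arrow buffers. The List part follows the existing offsets layout. The Span part is only an assumption about what a start/stop buffer could look like (the names `starts`, `stops`, and `span_slots` are hypothetical, and an exclusive `stop` is assumed); nothing like it exists in the format today.

```python
# Plain Python lists stand in for Arrow buffers; this is not the Arrow API.
child = ["a", "b", "c", "d", "e"]

# List layout: a single offsets buffer of length n + 1.
# Slot i covers child[offsets[i]:offsets[i + 1]], so slots are contiguous
# and appear in the same order as the child values.
offsets = [0, 2, 2, 5]


def list_slots(offsets, child):
    return [child[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


# Hypothetical Span layout: explicit start/stop indices per slot
# (stop assumed exclusive). Slots may overlap or appear in any order.
starts = [3, 0, 1]
stops = [5, 2, 4]


def span_slots(starts, stops, child):
    return [child[s:e] for s, e in zip(starts, stops)]


list_result = list_slots(offsets, child)
span_result = span_slots(starts, stops, child)
```

In both cases the child array has a different length (5) than the parent (3), which is the point: the child relationship is what lets a single schema describe both.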

To Antoine's point, maybe there's not enough demand to justify defining this type right now. I definitely agree that it would be good to see an example dataset before adding something like this.


Brian

On 05/02/2018 03:54 PM, Wes McKinney wrote:

Perhaps that could be an argument for making span a core logical type?

I think if anything, this argues that it should not be. Because "span"
references another array, which can be a different size, you need two
schemas to be able to make sense of it.

In either case, I would be interested to see what modifications would
be proposed to Schema.fbs and an example dataset described with such a
schema (that is a single array, instead of multiple -- i.e. a
non-composite representation).

For the record, if there are sufficiently common "composite" data
representations, I don't see a problem with developing community
standards based on the building blocks we already have

- Wes

On Wed, May 2, 2018 at 3:38 PM, Brian Hulette <brian.hule...@ccri.com> wrote:

If this were accomplished at the application level, how would it work with
the IPC formats? I'd think you'd need to have two separate files (or
streams), since array 1 and array 2 will be different lengths. Perhaps that
could be an argument for making span a core logical type?

Brian



On 05/02/2018 03:34 PM, Antoine Pitrou wrote:

On Wed, 2 May 2018 10:12:37 -0400
Wes McKinney <wesmck...@gmail.com> wrote:

It sounds like the "span" type could be implemented as a composite of
multiple Arrow arrays / schemas:

array 1 (data)
any schema

array 2 (view)
struct <
    start: int64,
    stop: int64
>

Unless I'm missing something, this feels like an application-level
concern rather than something that needs to be addressed in the
columnar format / metadata.

Well, couldn't the same theoretically be said about list arrays?
In the end, I suppose it all depends whether there's enough demand to
make it a core logical type inside Arrow, rather than something people
write custom code for in their application.

Regards

Antoine.
