On Tue, Aug 17, 2021 at 10:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
> Looking at Ben's alternate PR [1], having an IR that leans heavily on memory references to an out-of-band data sidecar seems like an approach that would substantially ratchet up the implementation complexity, as producing the IR would then have the level of complexity of producing the Arrow IPC format — when producing the "root" Plan message, you must accumulate a list of the dependent serialized submessages, placing the appropriate Buffer memory offset in the Plan message, like we do when producing the RecordBatch.buffers field. This seems complicated to me, as you must devise a custom binary protocol to concatenate the serialized Plan and the messages it depends on into a single binary payload:
>
> <ROOT PLAN>
> <padding>
> <Buffer 0>
> <padding>
> <Buffer 1>
> <padding>
> ...
> <Buffer N - 1>
>
> (One purpose of FlatBufferBuilder is to spare you having to do this yourself — some reasons we do it for the Arrow IPC format are that appending Arrow memory buffers directly to a FlatBufferBuilder would be inefficient — internal realloc calls — and that Flatbuffers are limited to 2GB. Neither of these things is a problem here.)
>
> In general, I believe the output created by an IR producer should be a single serialized object without any out-of-band data sidecar — this is much simpler for implementers, and we can still provide an "escape hatch" for user-defined operators and functions where the required function/operator is passed opaquely as embedded binary data.
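For illustration, here is a minimal sketch of the sidecar layout described above — a root message followed by 8-byte-aligned dependent buffers, with the producer recording each buffer's (offset, length). All message contents are placeholder bytes, not a real ComputeIR serialization:

```python
# Sketch of the out-of-band sidecar layout: a "root" Plan message followed by
# aligned dependent buffers. The producer must track every (offset, length)
# pair to embed back into the Plan message, which is the bookkeeping burden
# being objected to above. "PLAN" and the buffer contents are made up.
ALIGN = 8

def pad_to(buf: bytearray, align: int = ALIGN) -> None:
    # Zero-pad the payload up to the next alignment boundary.
    buf.extend(b"\x00" * (-len(buf) % align))

def build_payload(root_plan: bytes, buffers: list) -> tuple:
    payload = bytearray(root_plan)
    offsets = []
    for b in buffers:
        pad_to(payload)
        offsets.append((len(payload), len(b)))  # to be embedded in the Plan
        payload.extend(b)
    return bytes(payload), offsets

payload, offsets = build_payload(b"PLAN", [b"buffer-0", b"buffer-one"])
# Every dependent buffer starts on an 8-byte boundary.
assert all(off % ALIGN == 0 for off, _ in offsets)
```

This is roughly the bookkeeping a FlatBufferBuilder already does internally, which is the point of the parenthetical above.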
> The serialization format (whether it is Flatbuffers or JSON, or something else) should allow for data memoization, so if there is a user-defined operator/function, or a relation that is used multiple times throughout the query (potentially with a large schema), then we should ensure that the data need not be duplicated in the serialization format unnecessarily — in Flatbuffers, you can achieve this by reusing offsets, but we could devise the data structures to make the memoization of "expensive" objects more explicit.

I think this is something that would need to be explicitly encoded in the structures themselves if it's a hard requirement. I don't think this should block a prototype producer/consumer. Is there something in the second PR/design that precludes the reuse of offsets? To my eye, the Flatbuffers offset-reuse mechanism works just as well there.

> I additionally think that it is important to provide as much built-in support for "canonical" operators/functions (such as the ones implemented commonly by SQL engines) as possible, and to liberally expand the catalogue of "built-in" capabilities. I would still prefer to have large unions/enums of built-in operators/functions and to expand those unions/enums to accommodate new things when it is demonstrated that there is a need to standardize things between producers/consumers.

I think there's a middle ground where we add a bit of structure (something like a descriptor from the first PR) to indicate whether a thing is built-in vs. user-defined. It looks like Ben has pushed something like this to his PR. With that scheme, we have both flexibility and a small set of special builtins that make up a statically typed set for expressions and relational operators.

I would really like to vet this PR with a prototype this week, to see whether we need to revisit any major choices. I don't think we'll be able to anticipate all the consequences until we write some code.
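The memoization idea above can be sketched independently of any particular serialization library: "expensive" sub-objects (say, a large schema referenced by several relations) are serialized once and referenced by a slot index thereafter, which is what reusing a Flatbuffers offset accomplishes. A purely illustrative sketch:

```python
# Illustrative memoization pool (not a real ComputeIR structure): each
# distinct serialized sub-message is stored once; later uses get back the
# same slot index, analogous to reusing a Flatbuffers offset.
class Pool:
    def __init__(self):
        self._index = {}   # serialized bytes -> slot
        self.slots = []    # deduplicated sub-messages

    def intern(self, data: bytes) -> int:
        if data not in self._index:
            self._index[data] = len(self.slots)
            self.slots.append(data)
        return self._index[data]

pool = Pool()
big_schema = b"schema: a:int64, b:utf8, ..."  # imagine hundreds of fields
left = pool.intern(big_schema)
right = pool.intern(big_schema)   # second use stores no duplicate bytes
assert left == right and len(pool.slots) == 1
```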
> One of the beneficial properties of the Union/Enum approach for the operator/function catalogues is that when there are additions to those enums, the generated Flatbuffers files will cause many language compilers to warn or error on unhandled enum cases. If all function/operator names are strings, then you are essentially reimplementing the functionality provided by enums by hand. I initially used strings for all function references in my original prototype, but I now think that using an enum for "built-ins" would be superior (because of the code-generated enum interface) and not a premature optimization.
>
> [1]: https://github.com/apache/arrow/pull/10934
>
> On Fri, Aug 13, 2021 at 11:26 PM Phillip Cloud <cpcl...@gmail.com> wrote:
> >
> > Hey all,
> >
> > Just wanted to give an update on the effort here.
> >
> > Ben Kietzman has created an alternative proposal to the initial design [1]. It largely overlaps with the original, but differs in a few important ways:
> >
> > * A big focus of the design is on flexibility, allowing producers, consumers, and ultimately end users of those systems the ability to define custom operations in the graph.
> > * There are very few predefined relational operations (project, filter, join, and a handful of others).
> > * There are only 3 types of value expressions: literals, field references, and function calls.
> > * The model of evaluation is one that requires a final sink node, to indicate where the record batches from child relations end up (a file, a socket, an in-memory buffer, etc.).
> >
> > I've added notes [2] to the original Google doc (under the Alternative Design Notes subheading), and a few pseudocode examples. Unfortunately, these went out of date as soon as Ben pushed the PR [3], so I need to update those to reflect his changes. Regardless, the design is broadly the same, so it should still give a good indication of the details of the design.
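The enum-exhaustiveness argument quoted above can be approximated even in a dynamic language. A sketch (the enum members and dispatch table are hypothetical, not the actual ComputeIR catalogue): when a new built-in is added to the enum, the completeness check fails loudly, approximating the unhandled-case warnings that compilers emit for switches over generated Flatbuffers enums, whereas with plain strings nothing forces a consumer to notice new functions.

```python
from enum import Enum, auto

# Hypothetical built-in function catalogue, standing in for a generated
# Flatbuffers enum.
class BuiltinFunction(Enum):
    ADD = auto()
    SUBTRACT = auto()
    SUM = auto()

# A consumer's dispatch table over the catalogue.
DISPATCH = {
    BuiltinFunction.ADD: lambda a, b: a + b,
    BuiltinFunction.SUBTRACT: lambda a, b: a - b,
    BuiltinFunction.SUM: lambda xs: sum(xs),
}

# Fails at import time if a newly added built-in is unhandled — the
# dynamic-language analogue of a compiler's exhaustiveness warning.
assert set(DISPATCH) == set(BuiltinFunction)
```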
> > There are a decent number of review comments on the original PR that I plan to port over where they are still relevant. I also plan on adding support for window functions either tonight or on Monday.
> >
> > Please review this design at your earliest convenience. Since there's a fairly concrete set of types in flatbuffers that we can look at, ideally we can center discussion around the details in the PR.
> >
> > Thanks!
> >
> > [1]: https://github.com/apache/arrow/pull/10856
> > [2]: https://docs.google.com/document/d/1C_XVOG7iFkl6cgWWMyzUoIjfKt-X2UxqagPJrla0bAE/edit#heading=h.4tfbbtaqzu13
> > [3]: https://github.com/apache/arrow/pull/10934
> >
> > On Thu, Aug 12, 2021 at 3:55 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > >
> > > Wes wrote:
> > >
> > > > Supporting this kind of intra-application engine
> > > > heterogeneity is one of the motivations for the project.
> > >
> > > +1
> > >
> > > The data format is the natural interface between tasks. (Defining “task” here as “something that is programmed using the IR”.) That is Arrow’s strength.
> > >
> > > So I think the IR should describe what each task should do, and tasks should be fairly small. Not whole relational operators, operating on whole tables, but pieces of relational operators, operating on batches or sequences of batches.
> > >
> > > Elsethread, someone mentioned the LoLePop concept and the Kohn/Leis/Neumann paper [1]. The LoLePop concept sounds good for our purposes.
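Julian's batch-sized-fragment point can be made concrete with a small sketch (plain Python, purely illustrative): the hash-partition step of a hybrid hash join expressed as a standalone task over a single batch, rather than as part of a whole join operator. Column names and the partition count are made up.

```python
# One "fragment"-sized task: hash-partition a single batch (a dict of
# equal-length columns) by a key column into n output buffers. This is
# just the first step of a distributed hybrid hash join — building and
# probing hash tables would be separate, similarly small tasks.
def hash_partition(batch: dict, key: str, n_partitions: int) -> list:
    parts = [{col: [] for col in batch} for _ in range(n_partitions)]
    for row in range(len(batch[key])):
        p = hash(batch[key][row]) % n_partitions
        for col, values in batch.items():
            parts[p][col].append(values[row])
    return parts

batch = {"k": [1, 2, 3, 4], "v": ["a", "b", "c", "d"]}
parts = hash_partition(batch, "k", 2)
# No rows are lost: partition sizes sum to the batch size.
assert sum(len(p["k"]) for p in parts) == len(batch["k"])
```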
> > > Julian
> > >
> > > [1] https://db.in.tum.de/~kohn/papers/lolepops-sigmod21.pdf
> > >
> > > On Aug 12, 2021, at 5:19 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 11, 2021 at 11:22 PM Phillip Cloud <cpcl...@gmail.com> wrote:
> > > >>
> > > >> On Wed, Aug 11, 2021 at 4:48 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > > >>>
> > > >>> Couple of questions:
> > > >>>
> > > >>> 1. Is the goal that IRs have equal semantics, i.e. given (IR, data), the operation "(IR, data) - engine -> result" MUST be the same for all "engine"?
> > > >>
> > > >> I think that might be a non-starter for mundane reasons: there are probably at least two engines that disagree on the result of sum(x) where x is a floating point column.
> > > >>
> > > >>> 2. If yes, IMO we may need to worry about:
> > > >>> * a definition of equality that implementations agree on.
> > > >>> * agreement over what the semantics look like. For example, do we use Kleene logic for AND and OR?
> > > >>
> > > >> WRT Kleene logic, my thoughts are that the IR should support both Kleene and non-Kleene, and producers can choose their desired semantics.
> > > >>
> > > >> Ibis, for example, would override the `&` operator in `a & b` to produce `KleeneAnd(Column(a), Column(b))`.
> > > >>
> > > >>> To try to understand the gist, let's pick an aggregated count over a column: engines often do partial counts over partitions followed by a final "sum" over the partial counts. Is the idea that the query engine would communicate with the compute engine via 2 IRs, where one is "count me these" and the other is "sum me these"?
> > > >>>
> > > >>> Best,
> > > >>> Jorge
> > > >>
> > > >> Not in its current incarnation.
> > > >> The idea is that the IR producer communicates a desire to count(x) to a consumer, and it's up to the consumer to figure out how to turn that count into something that makes sense for itself. In your example, that's a series of partial counts followed by a sum.
> > > >
> > > > That said, I think there is a valid use case here where a system might make use of different engines to execute different (composable) layers of a complex query.
> > > >
> > > > For example:
> > > >
> > > > * suppose you want to scan and do predicate pushdown on an unusual data source that is only accessible from one particular engine, but
> > > > * you need to do some analytical operation with the scan results that is only supported by another engine
> > > >
> > > > You could decompose the query into two stages with an IR relational expression for each stage and then use the engines together to accomplish what you need (of course, you would need an orchestration layer to handle plumbing the query engine inputs and outputs together as Arrow streams). Supporting this kind of intra-application engine heterogeneity is one of the motivations for the project.
> > > >
> > > >>> On Wed, Aug 11, 2021 at 6:10 PM Phillip Cloud <cpcl...@gmail.com> wrote:
> > > >>>>
> > > >>>> Thanks Wes,
> > > >>>>
> > > >>>> Great to be back working on Arrow again and engaging with the community. I am really excited about this effort.
> > > >>>>
> > > >>>> I think there are a number of concerns I see as important to address in the compute IR proposal:
> > > >>>>
> > > >>>> 1. Requirement for output types.
> > > >>>> > > > >>>> I think that so far there's been many reasons for requiring > > > conforming IR > > > >>>> producers and consumers to adhere to output types, but I haven't > seen > > > a > > > >>>> strong rationale for keeping them optional (in the semantic > sense, not > > > >>> WRT > > > >>>> any particular serialization format's representation of > optionality). > > > >>>> > > > >>>> I think a design that includes unambiguous semantics for output > types > > > (a > > > >>>> consumer must produce a value of the requested type or it's an > > > >>>> error/non-conforming) is simpler to reason about for producers, > and > > > >>>> provides a strong guarantee for end users (humans or machines > > > >>> constructing > > > >>>> IR from an API and expecting the thing they ask for back from the > IR > > > >>>> consumer). > > > >>>> > > > >>>> 2. Flexibility > > > >>>> > > > >>>> The current PR is currently unable to support what I think are > killer > > > >>>> features of the IR: custom operators (relational or column) and > UDFs. > > > In > > > >>> my > > > >>>> mind, on top of the generalized compute description that the IR > > > offers, > > > >>> the > > > >>>> ability for producers and consumers of IR to extend the IR without > > > >>> needing > > > >>>> to modify Arrow or depend on anything except the format is itself > > > >>> something > > > >>>> that is necessary to gain adoption. > > > >>>> > > > >>>> Developers will need to build custom relational operators (e.g., > > > scans of > > > >>>> backends that don't exist anywhere for which a user has code to > > > >>> implement) > > > >>>> and custom functions (anything operating on a column that doesn't > > > already > > > >>>> exist, really). Furthermore, I think we can actually drive > building an > > > >>>> Arrow consumer using the same API that an end user would use to > extend > > > >>> the > > > >>>> IR. > > > >>>> > > > >>>> 3. 
Window Functions > > > >>>> > > > >>>> Window functions are, I think, an important part of the IR value > > > >>>> proposition, as they are one of the more complex operators in > > > databases. > > > >>> I > > > >>>> think we need to have something in the initial IR proposal to > support > > > >>> these > > > >>>> operations. > > > >>>> > > > >>>> 4. Non relational Joins > > > >>>> > > > >>>> Things like as-of join and window join operators aren't yet > fleshed > > > out > > > >>> in > > > >>>> the IR, and I'm not sure they should be in scope for the initial > > > >>> prototype. > > > >>>> I think once we settle on a design, we can work the design of > these > > > >>>> particular operators out during the initial prototype. I think the > > > >>>> specification of these operators should basically be PR #2 after > the > > > >>>> initial design lands. > > > >>>> > > > >>>> # Order of Work > > > >>>> > > > >>>> 1. Nail down the design. Anything else is a non-starter. > > > >>>> > > > >>>> 2. Prototype an IR producer using Ibis > > > >>>> > > > >>>> Ibis is IMO a good candidate for a first IR producer as it has a > > > number > > > >>> of > > > >>>> desirable properties that make prototyping faster and allow for > us to > > > >>>> refine the design of the IR as needed based on how the > implementation > > > >>> goes: > > > >>>> * It's written in Python so it has native support for nearly all > of > > > >>>> flatbuffers' features without having to creating bindings to C++. > > > >>>> * There's already a set of rules for type checking, as well as > APIs > > > for > > > >>>> constructing expression trees, which means we don't need to worry > > > about > > > >>>> building a type checker for the prototype. > > > >>>> > > > >>>> 3. Prototype an IR consumer in C++ > > > >>>> > > > >>>> I think in parallel to the producer prototype we can further > inform > > > the > > > >>>> design from the consumer side by prototyping an IR consumer in > C++ . 
I > > > >>> know > > > >>>> Ben Kietzman has expressed interest in working on this. > > > >>>> > > > >>>> Very interested to hear others' thoughts. > > > >>>> > > > >>>> -Phillip > > > >>>> > > > >>>> On Tue, Aug 10, 2021 at 10:56 AM Wes McKinney < > wesmck...@gmail.com> > > > >>> wrote: > > > >>>> > > > >>>>> Thank you for all the feedback and comments on the document. I'm > on > > > >>>>> vacation this week, so I'm delayed responding to everything, but > I > > > >>>>> will get to it as quickly as I can. I will be at VLDB in > Copenhagen > > > >>>>> next week if anyone would like to chat in person about it, and > we can > > > >>>>> relay the content of any discussions back to the > document/PR/e-mail > > > >>>>> thread. > > > >>>>> > > > >>>>> I know that Phillip Cloud expressed interest in working on the > PR and > > > >>>>> helping work through many of the details, so I'm glad to have the > > > >>>>> help. If there are others who would like to work on the PR or dig > > > into > > > >>>>> the details, please let me know. We might need to figure out how > to > > > >>>>> accommodate "many cooks" by setting up the ComputeIR project > > > somewhere > > > >>>>> separate from the format/ directory to permit it to exist in a > > > >>>>> Work-In-Progress status for a period of time until we work > through > > > the > > > >>>>> various details and design concerns. > > > >>>>> > > > >>>>> Re Julian's comment > > > >>>>> > > > >>>>>> The biggest surprise is that this language does full relational > > > >>>>> operations. I was expecting that it would do fragments of the > > > >>> operations. > > > >>>>> > > > >>>>> There's a related but different (yet still interesting and > worthy of > > > >>>>> analysis) problem of creating an "engine language" that describes > > > more > > > >>>>> mechanically the constituent parts of implementing the relational > > > >>>>> operators. 
> > > >>>>> To create a functional computation language with concrete Arrow data structures as a top-level primitive sounds like an interesting research area where I could see something developing eventually.
> > > >>>>>
> > > >>>>> The main problem I'm interested in solving right now is enabling front ends that have sufficient understanding of relational algebra and data frame operations to talk to engines without having to go backwards from their logical query plans to SQL. So as mentioned in the document, being able to faithfully carry the relational operator node information generated by Calcite or Ibis or another system would be super useful. Defining the semantics of various kinds of user-defined functions would also be helpful to standardize the engine-to-user-language UDF/extension interface.
> > > >>>>>
> > > >>>>> On Tue, Aug 10, 2021 at 2:36 PM Dimitri Vorona <alen...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> Hi Wes,
> > > >>>>>>
> > > >>>>>> Cool initiative! Reminded me of "Building Advanced SQL Analytics From Low-Level Plan Operators" from SIGMOD 2021 (http://db.in.tum.de/~kohn/papers/lolepops-sigmod21.pdf), which proposes a set of building blocks for advanced aggregation.
> > > >>>>>>
> > > >>>>>> Cheers,
> > > >>>>>> Dimitri.
> > > >>>>>>
> > > >>>>>> On Thu, Aug 5, 2021 at 7:59 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>> Wes,
> > > >>>>>>>
> > > >>>>>>> Thanks for this. I’ve added comments to the doc and to the PR.
> > > >>>>>>>
> > > >>>>>>> The biggest surprise is that this language does full relational operations. I was expecting that it would do fragments of the operations. Consider join.
> > > >>>>>>> A distributed hybrid hash join needs to partition rows into output buffers based on a hash key, build hash tables, probe into hash tables, scan hash tables for untouched “outer” rows, and so forth.
> > > >>>>>>>
> > > >>>>>>> I see Arrow’s compute as delivering each of those operations, working on perhaps a batch at a time, or a sequence of batches, with some other system coordinating those tasks. So I would expect to see Arrow’s compute language mainly operating on batches rather than a table abstraction.
> > > >>>>>>>
> > > >>>>>>> Julian
> > > >>>>>>>
> > > >>>>>>> On Aug 2, 2021, at 5:16 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>> hi folks,
> > > >>>>>>>>
> > > >>>>>>>> This idea came up in passing in the past -- given that there are multiple independent efforts to develop Arrow-native query engines (and surely many more to come), it seems like it would be valuable to have a way to enable user languages (like Java, Python, R, or Rust, for example) to communicate with backend computing engines (like DataFusion, or new computing capabilities being built in the Arrow C++ library) in a fashion that is "lower-level" than SQL and specialized to Arrow's type system.
> > > >>>>>>>> So rather than leaving it to a SQL parser / analyzer framework to generate an expression tree of relational operators and then translate that to an Arrow-native compute engine's internal grammar, a user framework could provide the desired Arrow-native expression tree / data manipulations directly and skip the SQL altogether.
> > > >>>>>>>>
> > > >>>>>>>> The idea of creating a "serialized intermediate representation (IR)" for Arrow compute operations would be to serve use cases large and small -- from the most complex TPC-* or time series database query to the most simple array predicate/filter sent with an RPC request using Arrow Flight. It is deliberately language- and engine-agnostic, with the only commonality across usages being the Arrow columnar format (schemas and array types). This would be better than leaving it to each application to develop its own bespoke expression representations for its needs.
> > > >>>>>>>>
> > > >>>>>>>> I spent a while thinking about this and wrote up a brain dump RFC document [1] and accompanying pull request [2] that makes the possibly controversial choice of using Flatbuffers to represent the serialized IR. I discuss the rationale for the choice of Flatbuffers in the RFC document. This PR is obviously deficient in many regards (incomplete, hacky, or unclear in places), and will need some help from others to flesh out. I suspect that actually implementing the IR will be necessary to work out many of the low-level details.
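As a purely illustrative picture of what providing an Arrow-native expression tree directly (rather than SQL text) might look like, here is a toy predicate tree using the three value-expression kinds from the alternative proposal — literals, field references, and function calls — with a row-wise evaluator. The class names, the `FUNCTIONS` registry, and the evaluator are invented for this sketch; they are not the ComputeIR schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union

# Toy expression kinds (illustrative names, not the real schema).
@dataclass
class Literal:
    value: Any

@dataclass
class FieldRef:
    name: str

@dataclass
class Call:
    function: str
    args: List["Expr"]

Expr = Union[Literal, FieldRef, Call]

# A consumer-side function registry (hypothetical).
FUNCTIONS: Dict[str, Callable] = {"greater": lambda a, b: a > b}

def evaluate(expr: Expr, batch: Dict[str, list]) -> list:
    """Evaluate an expression row-wise against a batch of columns."""
    if isinstance(expr, Literal):
        n = len(next(iter(batch.values())))
        return [expr.value] * n
    if isinstance(expr, FieldRef):
        return batch[expr.name]
    if isinstance(expr, Call):
        cols = [evaluate(a, batch) for a in expr.args]
        return [FUNCTIONS[expr.function](*vals) for vals in zip(*cols)]
    raise TypeError(expr)

# The predicate x > 4, as a tree a producer could serialize directly.
pred = Call("greater", [FieldRef("x"), Literal(4)])
assert evaluate(pred, {"x": [3, 5, 7]}) == [False, True, True]
```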
> > > >>>>>>>> > > > >>>>>>>> Note that this IR is intended to be more of a "superset" > project > > > >>>> than > > > >>>>>>>> a "lowest common denominator". So there may be things that it > > > >>>>> includes > > > >>>>>>>> which are only available in some engines (e.g. engines that > have > > > >>>>>>>> specialized handling of time series data). > > > >>>>>>>> > > > >>>>>>>> As some of my personal motivation for the project, concurrent > > > >>> with > > > >>>>> the > > > >>>>>>>> genesis of Apache Arrow, I started a Python project called > Ibis > > > >>> [3] > > > >>>>>>>> (which is similar to R's dplyr project) which serves as a > "Python > > > >>>>>>>> analytical query IR builder" that is capable of generating > most > > > >>> of > > > >>>>> the > > > >>>>>>>> SQL standard, targeting many different SQL dialects and other > > > >>>>> backends > > > >>>>>>>> (like pandas). Microsoft ([4]) and Google ([5]) have used this > > > >>>>> library > > > >>>>>>>> as a "many-SQL" middleware. As such, I would like to be able > to > > > >>>>>>>> translate between the in-memory "logical query" data > structures > > > >>> in > > > >>>> a > > > >>>>>>>> library like Ibis to a serialized format that can be executed > by > > > >>>> many > > > >>>>>>>> different Arrow-native query engines. The expression > primitives > > > >>>>>>>> available in Ibis should serve as helpful test cases, too. > > > >>>>>>>> > > > >>>>>>>> I look forward to the community's comments on the RFC document > > > >>> [1] > > > >>>>> and > > > >>>>>>>> pull request [2] -- I realize that arriving at consensus on a > > > >>>> complex > > > >>>>>>>> and ambitious project like this can be challenging so I > recommend > > > >>>>>>>> spending time on the "non-goals" section in the RFC and ask > > > >>>> questions > > > >>>>>>>> if you are unclear about the scope of what problems this is > > > >>> trying > > > >>>> to > > > >>>>>>>> solve. 
> > > >>>>>>>> I would be happy to give Edit access on the RFC document to others and would consider ideas about how to move forward with something that is able to be implemented by different Arrow libraries in the reasonably near future.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> Wes
> > > >>>>>>>>
> > > >>>>>>>> [1]: https://docs.google.com/document/d/1C_XVOG7iFkl6cgWWMyzUoIjfKt-X2UxqagPJrla0bAE/edit#
> > > >>>>>>>> [2]: https://github.com/apache/arrow/pull/10856
> > > >>>>>>>> [3]: https://ibis-project.org/
> > > >>>>>>>> [4]: http://cidrdb.org/cidr2021/papers/cidr2021_paper08.pdf
> > > >>>>>>>> [5]: https://cloud.google.com/blog/products/databases/automate-data-validation-with-dvt