Hi Weston,
I’d be happy to donate something like this to Substrait if that’s useful. I was thinking of proving out a design here before going there. However, we could also just go straight there :)

Regarding infix operators and such, the edge case I was thinking of is that a user could potentially add a kernel to the registry called e.g. “+”. Would the parser implicitly convert any instances of “+” to “add” and break that? Implicit typing for literals and parameters can probably also be added without issues to the current scheme. Would the parameters be passed as a std::unordered_map?

> Does a field_ref have to be a field name or can it be a field index?

It can be a field index or even a field path. The field ref is parsed using FieldRef::FromDotPath ([1] in my original message), which can express any FieldRef.

Sasha

> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
>
> Currently Substrait only has a binary (protobuf) serialization (and a
> protobuf JSON one but that's not really human writable and barely
> human readable). Substrait does not have a text serialization. I
> believe there is some desire for one (maybe Sasha wants to give it a
> try?). A text format for Substrait would solve this problem because
> you could go "text expression" -> "substrait expression" -> "arrow
> expression".
>
> Since no text format exists for Substrait I think that Substrait does
> not currently solve this problem or overlap with your work. However,
> at some point (hopefully), it will.
>
> There was also a fairly recent proposal for a parser for gandiva
> expressions[1].
>
> Compared with [1] I think this proposal is simpler to parse but lacks
> some of the shortcut conveniences (e.g. implicit types for literals,
> support for common infix operators (+, -, /, ...)).
>
> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I
> think would be useful to have, as one could then do something like `auto
> arrow_expr = Parse(my_expr, threshold)`.
>
> Does a field_ref have to be a field name or can it be a field index?
> The latter is quite useful when the schema has duplicate field names.
>
> I'm +0.5 on this change.
> I worry a bit about having (eventually)
> three different syntaxes. However, at the moment we have zero.
>
> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
>
>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
>> <krassovskysa...@gmail.com> wrote:
>>
>> Hi David,
>> Could you elaborate on which part of my proposal overlaps with Substrait? I
>> don’t see anything in Substrait that allows me to do something along the
>> lines of
>>
>> Expression e = Expression::FromString("(add !.a $int32:1)");
>>
>> in the code.
>>
>> Sasha
>>
>>>> On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID>
>>>> wrote:
>>>
>>> I believe this is what substrait.io is trying to accomplish.
>>>
>>> Here's some additional info:
>>> https://substrait.io/
>>>
>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
>>>
>>> -----Original Message-----
>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
>>> Sent: Wednesday, October 5, 2022 11:29 AM
>>> To: dev@arrow.apache.org
>>> Subject: Parser for expressions
>>>
>>> Hi everyone,
>>> I’ve noticed on the mailing list a few times people asking for a more
>>> convenient way to construct an Expression, namely using a string of some
>>> sort. I’ve found myself wishing for something like this too when
>>> constructing ExecPlans, and so I’ve gone ahead and implemented a parser
>>> [0]. I was wondering if anyone had any thoughts about the design of the
>>> language?
>>>
>>> The current implementation parses a lisp-like language. This language has
>>> three types of expressions (mirroring the current Expression API):
>>>
>>> - A call is a normal s-expression: it has the name of the kernel and the
>>> list of arguments. Its arguments can be any expression.
>>> - A literal (i.e.
>>> scalar) starts with a $ and specifies a type and a value,
>>> separated by a colon. For example, `$decimal(12,2):10.01` specifies a
>>> literal of type decimal(12, 2) and a value of 10.01.
>>> - A field_ref starts with a ! and is an identifier in the schema following
>>> the DotPath syntax we already have [1].
>>>
>>> So for example, the expression
>>>
>>> (add $int32:1 (multiply !.a !.b))
>>>
>>> computes a*b+1 given a batch with columns named a and b.
>>>
>>> The reason I chose a lisp-like language is that it very directly translates
>>> to the current Expression API and that it feels more natural to use a
>>> prefix notation for a language where all functions have a name (i.e. no +,
>>> -, *, etc.).
>>>
>>> I’m currently working on a followup PR for specifying ExecPlans from a
>>> string (mainly for easier testing), and would like that language to be an
>>> extension of this one. Looking forward to hearing everyone’s thoughts!
>>>
>>> Thanks,
>>> Sasha Krassovsky
>>>
>>> [0] https://github.com/apache/arrow/pull/14287
>>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
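
To make the three expression forms from the original proposal concrete (call, literal, field_ref), here is a minimal standalone sketch of a recursive-descent parser for them. It is not the code from the PR and does not use the Arrow API: `Node`, `SketchParser` and `ToString` are hypothetical names, literal values are kept as unparsed text, and splitting a literal at its first colon is a simplifying assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node for illustration; NOT arrow::compute::Expression.
struct Node {
  enum Kind { kCall, kLiteral, kFieldRef };
  Kind kind;
  std::string name;        // kernel name, literal type, or field path
  std::string value;       // literal value text; empty for other kinds
  std::vector<Node> args;  // call arguments; empty for other kinds
};

class SketchParser {
 public:
  explicit SketchParser(std::string text) : text_(std::move(text)) {}

  Node Parse() {
    Node n = ParseExpr();
    SkipSpace();
    if (pos_ != text_.size()) throw std::runtime_error("trailing input");
    return n;
  }

 private:
  void SkipSpace() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
  }

  // Reads until whitespace or ')' at nesting depth 0. Balanced parentheses
  // stay inside the token so a type such as decimal(12,2) is kept whole.
  std::string ReadToken() {
    std::string tok;
    int depth = 0;
    while (pos_ < text_.size()) {
      char c = text_[pos_];
      bool space = std::isspace(static_cast<unsigned char>(c)) != 0;
      if (depth == 0 && (space || c == ')')) break;
      if (c == '(') ++depth;
      if (c == ')') --depth;
      tok += c;
      ++pos_;
    }
    if (depth != 0 || tok.empty()) throw std::runtime_error("expected a token");
    return tok;
  }

  Node ParseExpr() {
    SkipSpace();
    if (pos_ >= text_.size()) throw std::runtime_error("unexpected end of input");
    char c = text_[pos_];
    if (c == '(') {  // call: (kernel_name arg...)
      ++pos_;
      SkipSpace();
      Node call{Node::kCall, ReadToken(), "", {}};
      for (;;) {
        SkipSpace();
        if (pos_ >= text_.size()) throw std::runtime_error("missing ')'");
        if (text_[pos_] == ')') { ++pos_; return call; }
        call.args.push_back(ParseExpr());
      }
    }
    if (c == '$') {  // literal: $type:value
      ++pos_;
      std::string tok = ReadToken();
      std::size_t colon = tok.find(':');  // assumption: split at first colon
      if (colon == std::string::npos) throw std::runtime_error("literal needs ':'");
      return Node{Node::kLiteral, tok.substr(0, colon), tok.substr(colon + 1), {}};
    }
    if (c == '!') {  // field_ref: !dotpath (path kept as raw text here)
      ++pos_;
      return Node{Node::kFieldRef, ReadToken(), "", {}};
    }
    throw std::runtime_error("expected '(', '$' or '!'");
  }

  std::string text_;
  std::size_t pos_ = 0;
};

// Renders the tree in a function-call style, for inspection only.
std::string ToString(const Node& n) {
  if (n.kind == Node::kLiteral) return "literal<" + n.name + ">(" + n.value + ")";
  if (n.kind == Node::kFieldRef) return "field(" + n.name + ")";
  std::string out = n.name + "(";
  for (std::size_t i = 0; i < n.args.size(); ++i) {
    if (i > 0) out += ", ";
    out += ToString(n.args[i]);
  }
  return out + ")";
}
```

With this sketch, `ToString(SketchParser("(add $int32:1 (multiply !.a !.b))").Parse())` gives `add(literal<int32>(1), multiply(field(.a), field(.b)))`. Because the kernel name is read as an opaque token, an input such as `(+ !.a !.b)` parses with "+" as the kernel name and no special casing, which is the property of prefix notation discussed at the top of this thread.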