Yes, that makes a lot of sense! I’d agree that it would probably be fine to have two different syntaxes, seeing as the use cases are a bit different.
Did anyone else have any thoughts, either on the lisp-style syntax for Arrow’s Expressions or on having two different syntaxes? (Weston or Antoine?)

Sasha

> On Oct 9, 2022, at 5:38 AM, Jin Shang <shangjin1...@gmail.com> wrote:
>
> Hi Sasha,
>
> I agree with your points. However, Gandiva specializes in computing arithmetic expressions and offers few, if any, non-arithmetic operations, so it is very helpful if its parser understands natural math expressions.
>
> Considering that Gandiva is a relatively independent component within the Arrow project, and that it’s only a math expression compiler rather than a fully functional compute engine, maybe it’s acceptable for Gandiva to have its own grammar, different from compute/Acero/Substrait etc.
>
> Best,
> Jin
>
>> On Oct 8, 2022, at 03:01, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>
>> Hi Jin,
>> I agree it would be good to standardize on a syntax. To me, the advantages of the lisp-style syntax are:
>> - don’t have to define/implement any kind of precedence rules
>> - has a uniform syntax (no distinction between prefix and infix operators)
>> - avoids having “special” functions that have an associated arithmetic symbol
>> - translates directly to the underlying Expression infrastructure.
>>
>> The advantage of the Python-style syntax is that it’s more natural to use for arithmetic expressions. However, I think for non-arithmetic expressions this syntax would be more cumbersome.
>>
>> Either would work of course; I guess it just depends on the goal. I was thinking the string representation wouldn’t represent any significant level of abstraction; it is just a convenience to save on clutter when typing out expressions.
>>
>> Sasha
>>
>>> On Oct 6, 2022, at 22:20, Jin Shang <shangjin1...@gmail.com> wrote:
>>>
>>> Hi Sasha and Weston,
>>>
>>> I'm the author of the mentioned Gandiva parser. I agree that having one unified syntax is ideal.
>>> I think one critical divergence between Sasha's and my proposals is that mine uses a C++/Python imperative style (foo(x, y, z), a+b…) and Sasha's uses a Lisp functional style ((foo x y z), (+ a b)…). I feel it would be better for us to settle on one of the styles before we start implementing the parsers.
>>>
>>> Best,
>>> Jin
>>>
>>>> On Friday, October 7, 2022, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>>>
>>>> Hi Weston,
>>>> I’d be happy to donate something like this to Substrait if that’s useful; I was thinking of proving out a design here before going there. However, we could also just go straight there :)
>>>>
>>>> Regarding infix operators and such, the edge case I was thinking of is that a user could potentially add a kernel to the registry called e.g. “+”. Would the parser implicitly convert any instances of “+” to “add” and break that?
>>>>
>>>> Implicit typing for literals and parameters can probably also be added without issues to the current scheme. Would the parameters be passed as an std::unordered_map?
>>>>
>>>>> Does a field_ref have to be a field name or can it be a field index?
>>>>
>>>> It can be a field index or even a field path. The field ref is parsed using FieldRef::FromDotPath ([1] in my original message), which can express any FieldRef.
>>>>
>>>> Sasha
>>>>
>>>>>> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
>>>>>
>>>>> Currently Substrait only has a binary (protobuf) serialization (and a protobuf JSON one, but that's not really human writable and barely human readable). Substrait does not have a text serialization. I believe there is some desire for one (maybe Sasha wants to give it a try?). A text format for Substrait would solve this problem because you could go "text expression" -> "substrait expression" -> "arrow expression".
>>>>>
>>>>> Since no text format exists for Substrait, I think that Substrait does not currently solve this problem or overlap with your work. However, at some point (hopefully), it will.
>>>>>
>>>>> There was also a fairly recent proposal for a parser for gandiva expressions [1].
>>>>>
>>>>> Compared with [1], I think this proposal is simpler to parse but lacks some of the shortcut conveniences (e.g. implicit types for literals, support for common infix operators (+, -, /, ...)).
>>>>>
>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I think would be useful to have, as one could then do something like `auto arrow_expr = Parse(my_expr, threshold)`.
>>>>>
>>>>> Does a field_ref have to be a field name or can it be a field index? The latter is quite useful when the schema has duplicate field names.
>>>>>
>>>>> I'm +0.5 on this change. I worry a bit about having (eventually) three different syntaxes. However, at the moment we have zero.
>>>>>
>>>>> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
>>>>>
>>>>>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi David,
>>>>>> Could you elaborate on which part of my proposal overlaps with Substrait? I don’t see anything in Substrait that allows me to do something along the lines of
>>>>>>
>>>>>> Expression e = Expression::FromString("(add !.a $int32:1)");
>>>>>>
>>>>>> in the code.
>>>>>>
>>>>>> Sasha
>>>>>>
>>>>>>>> On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID> wrote:
>>>>>>>
>>>>>>> I believe this is what substrait.io <http://substrait.io/> is trying to accomplish.
>>>>>>>
>>>>>>> Here's some additional info:
>>>>>>> https://substrait.io/
>>>>>>>
>>>>>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
>>>>>>> Sent: Wednesday, October 5, 2022 11:29 AM
>>>>>>> To: dev@arrow.apache.org
>>>>>>> Subject: Parser for expressions
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>> I’ve noticed on the mailing list a few times people asking for a more convenient way to construct an Expression, namely using a string of some sort. I’ve found myself wishing for something like this too when constructing ExecPlans, and so I’ve gone ahead and implemented a parser [0]. I was wondering if anyone had any thoughts about the design of the language?
>>>>>>>
>>>>>>> The current implementation parses a lisp-like language. This language has three types of expressions (mirroring the current Expression API):
>>>>>>>
>>>>>>> - A call is a normal s-expression: it has the name of the kernel and the list of arguments. Its arguments can be any expression.
>>>>>>> - A literal (i.e. scalar) starts with a $ and specifies a type and a value, separated by a colon. For example, `$decimal(12,2):10.01` specifies a literal of type decimal(12, 2) and a value of 10.01.
>>>>>>> - A field_ref starts with a ! and is an identifier in the schema following the DotPath syntax we already have [1].
>>>>>>>
>>>>>>> So for example, the expression
>>>>>>>
>>>>>>> (add $int32:1 (multiply !.a !.b))
>>>>>>>
>>>>>>> computes a*b+1 given a batch with columns named a and b.
>>>>>>>
>>>>>>> The reason I chose a lisp-like language is that it translates very directly to the current Expression API and that it feels more natural to use a prefix notation for a language where all functions have a name (i.e. no +, -, *, etc.).
>>>>>>>
>>>>>>> I’m currently working on a follow-up PR for specifying ExecPlans from a string (mainly for easier testing), and would like that language to be an extension of this one. Looking forward to hearing everyone’s thoughts!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sasha Krassovsky
>>>>>>>
>>>>>>> [0] https://github.com/apache/arrow/pull/14287
>>>>>>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
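
For readers following the thread, the three expression forms from the original proposal (a call, a `$type:value` literal, and a `!` field_ref) can be illustrated with a small recursive-descent parser. This is only a sketch under stated assumptions, not the parser from the PR [0]: `Node`, `Parser`, `ParseExpression`, and `CheckThreadExample` are hypothetical names, the tree merely stands in for Arrow's Expression, and literal values and field paths are kept as raw strings rather than being converted to scalars or handed to FieldRef::FromDotPath.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree standing in for Arrow's Expression (not Arrow's real types).
struct Node {
  enum Kind { kCall, kLiteral, kFieldRef };
  Kind kind;
  std::string name;        // kernel name, literal type, or field dot path
  std::string value;       // literal value text; empty for calls and field refs
  std::vector<Node> args;  // call arguments; empty otherwise
};

class Parser {
 public:
  explicit Parser(std::string text) : text_(std::move(text)) {}

  Node Parse() {
    Node node = ParseExpr();
    SkipSpace();
    if (pos_ != text_.size()) throw std::invalid_argument("trailing input");
    return node;
  }

 private:
  bool AtEnd() const { return pos_ >= text_.size(); }

  void SkipSpace() {
    while (!AtEnd() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
  }

  // Reads a token up to whitespace, an unmatched ')', or `stop`. Parentheses
  // inside the token are balanced so that types like decimal(12,2) stay whole.
  std::string ReadToken(char stop) {
    std::string out;
    int depth = 0;
    while (!AtEnd()) {
      const char c = text_[pos_];
      const bool space = std::isspace(static_cast<unsigned char>(c)) != 0;
      if (depth == 0 && (c == stop || c == ')' || space)) break;
      if (c == '(') ++depth;
      if (c == ')') --depth;
      out += c;
      ++pos_;
    }
    return out;
  }

  Node ParseExpr() {
    SkipSpace();
    if (AtEnd()) throw std::invalid_argument("unexpected end of input");
    const char c = text_[pos_++];
    if (c == '$') {  // literal: $type:value
      std::string type = ReadToken(':');
      if (AtEnd() || text_[pos_] != ':') throw std::invalid_argument("expected ':'");
      ++pos_;
      std::string value = ReadToken('\0');
      return Node{Node::kLiteral, type, value, {}};
    }
    if (c == '!') {  // field_ref: !<dot path>
      return Node{Node::kFieldRef, ReadToken('\0'), "", {}};
    }
    if (c != '(') throw std::invalid_argument("expected '(', '$' or '!'");
    SkipSpace();
    Node call{Node::kCall, ReadToken('\0'), "", {}};  // call: (kernel arg...)
    if (call.name.empty()) throw std::invalid_argument("missing kernel name");
    for (;;) {
      SkipSpace();
      if (AtEnd()) throw std::invalid_argument("missing ')'");
      if (text_[pos_] == ')') {
        ++pos_;
        return call;
      }
      call.args.push_back(ParseExpr());
    }
  }

  std::string text_;
  std::size_t pos_ = 0;
};

Node ParseExpression(const std::string& text) { return Parser(text).Parse(); }

// Usage check with the example expression from the original message.
inline void CheckThreadExample() {
  const Node e = ParseExpression("(add $int32:1 (multiply !.a !.b))");
  assert(e.kind == Node::kCall && e.name == "add" && e.args.size() == 2);
  assert(e.args[0].kind == Node::kLiteral && e.args[0].value == "1");
  assert(e.args[1].args[1].kind == Node::kFieldRef && e.args[1].args[1].name == ".b");
}
```

Because every call is fully parenthesized with the kernel name first, the sketch needs no precedence table, and a kernel registered under a name such as “+” is read as an ordinary name, e.g. `(+ !.a !.b)`, rather than being rewritten to “add”.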