Yes, that makes a lot of sense! I agree that it would probably be fine to have
two different syntaxes, since the use cases are somewhat different.

Does anyone else have thoughts, either on the lisp-style syntax for Arrow’s
Expressions or on having two different syntaxes? (Weston or Antoine?)

Sasha

> On Oct 9, 2022, at 5:38 AM, Jin Shang <shangjin1...@gmail.com> wrote:
> 
> Hi Sasha,
> 
> I agree with your points. However, Gandiva is fairly specialized in computing
> arithmetic expressions and offers few, if any, non-arithmetic
> operations, so it is very helpful if its parser understands natural math
> expressions.
> 
> Considering that Gandiva is a relatively independent component within the
> Arrow project, and that it’s only a math expression compiler rather than a
> fully functional compute engine, maybe it’s acceptable for Gandiva to have
> its own grammar, different from compute/Acero/Substrait etc.
> 
> Best,
> Jin
> 
>> On Oct 8, 2022, at 03:01, Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>> 
>> Hi Jin,
>> I agree it would be good to standardize on a syntax. To me, the advantages 
>> of the lisp-style syntax are:
>> - don’t have to define/implement any kind of precedence rules 
>> - has a uniform syntax (no distinction between prefix and infix operators)
>> - avoids having “special” functions that have an associated arithmetic 
>> symbol 
>> - translates directly to the underlying Expression infrastructure. 
>> 
>> The advantage of the Python-style syntax is that it’s more natural to use 
>> for arithmetic expressions. However, I think for non-arithmetic expressions 
>> this syntax would be more cumbersome. 
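For comparison, here is a rough sketch of the extra machinery the Python-style syntax implies: a small precedence-climbing parser, written for this note rather than taken from either PR, that rewrites infix arithmetic into the lisp-style form. The class name `InfixToLisp` and the operator table (`+` to `add`, `-` to `subtract`, `*` to `multiply`, `/` to `divide`) are assumptions for illustration, and operands are limited to plain identifiers.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch: rewrite an infix arithmetic string such as "a+b*c"
// into the lisp-style form. Only identifiers, + - * / and parentheses are
// supported; the operator-to-function mapping below is an assumption.
class InfixToLisp {
 public:
  explicit InfixToLisp(std::string src) : src_(std::move(src)) {}

  std::string Parse() {
    std::string result = ParseExpr(1);
    SkipSpaces();
    if (pos_ != src_.size()) throw std::invalid_argument("trailing input");
    return result;
  }

 private:
  // Binding strength of each binary operator; -1 means "not an operator".
  static int Precedence(char op) {
    switch (op) {
      case '+': case '-': return 1;
      case '*': case '/': return 2;
      default: return -1;
    }
  }

  // Name of the function each operator is rewritten to.
  static std::string FunctionName(char op) {
    switch (op) {
      case '+': return "add";
      case '-': return "subtract";
      case '*': return "multiply";
      default: return "divide";
    }
  }

  void SkipSpaces() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_]))) {
      ++pos_;
    }
  }

  // An operand is an identifier or a parenthesized sub-expression.
  std::string ParsePrimary() {
    SkipSpaces();
    if (pos_ < src_.size() && src_[pos_] == '(') {
      ++pos_;
      std::string inner = ParseExpr(1);
      SkipSpaces();
      if (pos_ >= src_.size() || src_[pos_] != ')') {
        throw std::invalid_argument("expected ')'");
      }
      ++pos_;
      return inner;
    }
    std::size_t start = pos_;
    while (pos_ < src_.size() &&
           (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_')) {
      ++pos_;
    }
    if (start == pos_) throw std::invalid_argument("expected operand");
    return src_.substr(start, pos_ - start);
  }

  // Precedence climbing: fold operators whose precedence is >= min_prec.
  std::string ParseExpr(int min_prec) {
    std::string lhs = ParsePrimary();
    for (;;) {
      SkipSpaces();
      if (pos_ >= src_.size()) break;
      char op = src_[pos_];
      int prec = Precedence(op);
      if (prec < min_prec) break;  // also stops at ')' and non-operators
      ++pos_;
      // prec + 1 makes operators of equal precedence left-associative.
      std::string rhs = ParseExpr(prec + 1);
      lhs = "(" + FunctionName(op) + " " + lhs + " " + rhs + ")";
    }
    return lhs;
  }

  std::string src_;
  std::size_t pos_ = 0;
};
```

`InfixToLisp("a+b*c").Parse()` returns `"(add a (multiply b c))"`. The precedence table and the left-associativity rule are exactly the pieces the prefix syntax never needs.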
>> 
>> Either would work, of course; I guess it just depends on the goal. I was
>> thinking the string representation wouldn’t represent any significant level
>> of abstraction; it’s just a convenience to save on clutter when typing out
>> expressions.
>> 
>> Sasha 
>> 
>>> On Oct 6, 2022, at 22:20, Jin Shang <shangjin1...@gmail.com> wrote:
>>> 
>>> Hi Sasha and Weston,
>>> 
>>> I'm the author of the mentioned Gandiva parser. I agree that having one
>>> unified syntax is ideal. I think one critical divergence between Sasha's
>>> proposal and mine is that mine uses a C++/Python imperative style
>>> (foo(x, y, z), a+b, …) while Sasha's uses a Lisp functional style
>>> ((foo x y z), (+ a b), …). I feel it'll be better for us to settle on one
>>> of the styles before we start implementing the parsers.
>>> 
>>> Best,
>>> Jin
>>> 
>>>> On Friday, October 7, 2022, Sasha Krassovsky <krassovskysa...@gmail.com>
>>>> wrote:
>>>> 
>>>> Hi Weston,
>>>> I’d be happy to donate something like this to Substrait if that’s useful.
>>>> I was thinking of proving out a design here before going there, but we
>>>> could also just go straight there :)
>>>> 
>>>> Regarding infix operators and such, the edge case I was thinking of is that
>>>> a user could potentially add a kernel to the registry called, e.g., “+”.
>>>> Would the parser implicitly convert every instance of “+” to “add” and
>>>> break that?
>>>> 
>>>> Implicit typing for literals and parameters can probably also be added to
>>>> the current scheme without issues. Would the parameters be passed as a
>>>> std::unordered_map?
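Implicit typing for literals could plausibly be layered on as a rewrite step applied to bare tokens. The following is a hypothetical sketch, not part of either proposal: `InferLiteral` turns an untyped numeric token into the explicit `$type:value` form, and the defaults it picks (`int64` for integers, `double` for anything with a decimal point) are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper for implicit literal typing: rewrite an untyped numeric
// token such as "1" or "-2.5" into the explicit "$type:value" form. Defaulting
// integers to int64 and decimals to double is an assumption for illustration.
std::optional<std::string> InferLiteral(const std::string& token) {
  if (token.empty()) return std::nullopt;
  if (token[0] == '$') return token;  // already explicitly typed

  std::size_t i = (token[0] == '-') ? 1 : 0;  // optional leading minus sign
  bool seen_digit = false;
  bool seen_dot = false;
  for (; i < token.size(); ++i) {
    const char c = token[i];
    if (std::isdigit(static_cast<unsigned char>(c))) {
      seen_digit = true;
    } else if (c == '.' && !seen_dot) {
      seen_dot = true;
    } else {
      return std::nullopt;  // not a plain number, e.g. a field ref like "!.a"
    }
  }
  if (!seen_digit) return std::nullopt;  // rejects "-" and "."
  return std::string(seen_dot ? "$double:" : "$int64:") + token;
}
```

If the parser called this on bare tokens, a user could write `(add !.a 1)` and have the `1` treated as `$int64:1`, while an explicit `$int32:1` passes through unchanged. Whether int64 is the right default is an open design question.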
>>>> 
>>>>> Does a field_ref have to be a field name or can it be a field index?
>>>> 
>>>> It can be a field index or even a field path. The field ref is parsed
>>>> using FieldRef::FromDotPath ([1] in my original message), which can express
>>>> any FieldRef.
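To illustrate what the DotPath syntax covers, here is a toy parser that splits a path into name and index steps. It only sketches the syntax's shape under simplifying assumptions (no escape sequences, indices limited to `int`); real code should call `FieldRef::FromDotPath` from `arrow/type.h` rather than anything like this hypothetical `ParseDotPath`.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// One step of a field reference: a field name or a positional index.
using PathStep = std::variant<std::string, int>;

// Toy parser for dot paths such as ".a", "[0]" or ".a[1].b". It only
// illustrates the shape of the syntax: escape sequences and other details
// handled by the real FieldRef::FromDotPath are ignored.
std::vector<PathStep> ParseDotPath(const std::string& path) {
  std::vector<PathStep> steps;
  std::size_t pos = 0;
  while (pos < path.size()) {
    if (path[pos] == '.') {
      // A name runs until the next '.' or '['.
      std::size_t end = path.find_first_of(".[", pos + 1);
      if (end == std::string::npos) end = path.size();
      steps.emplace_back(path.substr(pos + 1, end - pos - 1));
      pos = end;
    } else if (path[pos] == '[') {
      // An index is a run of digits between '[' and ']'.
      std::size_t end = path.find(']', pos);
      if (end == std::string::npos) throw std::invalid_argument("unterminated '['");
      std::string digits = path.substr(pos + 1, end - pos - 1);
      if (digits.empty()) throw std::invalid_argument("empty index");
      for (char c : digits) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
          throw std::invalid_argument("index must be a non-negative integer");
        }
      }
      steps.emplace_back(std::stoi(digits));
      pos = end + 1;
    } else {
      throw std::invalid_argument("each step must start with '.' or '['");
    }
  }
  return steps;
}
```

So `!.a` would refer to the column named `a`, while a positional form such as `![0]` (assuming the parser accepts any DotPath after `!`) would select the first column, which covers the duplicate-name case raised below.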
>>>> 
>>>> Sasha
>>>> 
>>>>>> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
>>>>> 
>>>>> Currently Substrait only has a binary (protobuf) serialization (and a
>>>>> protobuf JSON one but that's not really human writable and barely
>>>>> human readable).  Substrait does not have a text serialization.  I
>>>>> believe there is some desire for one (maybe Sasha wants to give it a
>>>>> try?).  A text format for Substrait would solve this problem because
>>>>> you could go "text expression" -> "substrait expression" -> "arrow
>>>>> expression".
>>>>> 
>>>>> Since no text format exists for Substrait I think that Substrait does
>>>>> not currently solve this problem or overlap with your work.  However,
>>>>> at some point (hopefully), it will.
>>>>> 
>>>>> There was also a fairly recent proposal for a parser for Gandiva
>>>>> expressions [1].
>>>>> 
>>>>> Compared with [1], I think this proposal is simpler to parse but lacks
>>>>> some of the shortcut conveniences (e.g. implicit types for literals,
>>>>> support for common infix operators (+, -, /, ...)).
>>>>> 
>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I
>>>>> think would be useful to have, as one could then do something like
>>>>> `auto arrow_expr = Parse(my_expr, threshold)`.
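The `Parse(my_expr, threshold)` idea, combined with passing parameters as a `std::unordered_map`, can be sketched as a simple textual pre-pass. `BindParameters` is hypothetical and the `%name%` placeholder syntax is taken from the example above; splicing text is just the simplest option, and binding typed scalars inside the parser would avoid quoting and re-parsing concerns.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical pre-pass: replace every %name% placeholder with the literal
// text bound to `name`. Unknown or unterminated placeholders are errors.
std::string BindParameters(
    const std::string& expr,
    const std::unordered_map<std::string, std::string>& params) {
  std::string out;
  std::size_t pos = 0;
  while (pos < expr.size()) {
    const std::size_t open = expr.find('%', pos);
    if (open == std::string::npos) {
      out.append(expr, pos, std::string::npos);  // copy the rest verbatim
      break;
    }
    const std::size_t close = expr.find('%', open + 1);
    if (close == std::string::npos) {
      throw std::invalid_argument("unterminated parameter");
    }
    out.append(expr, pos, open - pos);  // text before the placeholder
    const std::string name = expr.substr(open + 1, close - open - 1);
    auto it = params.find(name);
    if (it == params.end()) {
      throw std::invalid_argument("unbound parameter: " + name);
    }
    out += it->second;
    pos = close + 1;
  }
  return out;
}
```

Binding `threshold` to an already-typed literal such as `$int32:10` turns `(equal !.x %threshold%)` into `(equal !.x $int32:10)`, so the expression parser itself does not need to change.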
>>>>> 
>>>>> Does a field_ref have to be a field name or can it be a field index?
>>>>> The latter is quite useful when the schema has duplicate field names.
>>>>> 
>>>>> I'm +0.5 on this change.  I worry a bit about having (eventually)
>>>>> three different syntaxes.  However, at the moment we have zero.
>>>>> 
>>>>> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
>>>>> 
>>>>>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
>>>>>> <krassovskysa...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi David,
>>>>>> Could you elaborate on which part of my proposal overlaps with
>>>>>> Substrait? I don’t see anything in Substrait that allows me to do
>>>>>> something along the lines of
>>>>>>
>>>>>> Expression e = Expression::FromString("(add !.a $int32:1)");
>>>>>>
>>>>>> in the code.
>>>>>> 
>>>>>> Sasha
>>>>>> 
>>>>>>>> On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID> wrote:
>>>>>>> 
>>>>>>> I believe this is what substrait.io is trying to accomplish.
>>>>>>>
>>>>>>> Here's some additional info:
>>>>>>> https://substrait.io/
>>>>>>>
>>>>>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
>>>>>>> Sent: Wednesday, October 5, 2022 11:29 AM
>>>>>>> To: dev@arrow.apache.org
>>>>>>> Subject: Parser for expressions
>>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> I’ve noticed people on the mailing list asking a few times for a more
>>>>>>> convenient way to construct an Expression, namely from a string of some
>>>>>>> sort. I’ve found myself wishing for something like this too when
>>>>>>> constructing ExecPlans, so I’ve gone ahead and implemented a parser
>>>>>>> [0]. Does anyone have any thoughts about the design of the language?
>>>>>>> 
>>>>>>> The current implementation parses a lisp-like language. This language
>>>>>>> has three types of expressions (mirroring the current Expression API):
>>>>>>> 
>>>>>>> - A call is a normal s-expression: it has the name of the kernel and
>>>>>>>   the list of arguments. Its arguments can be any expression.
>>>>>>> - A literal (i.e. scalar) starts with a $ and specifies a type and a
>>>>>>>   value, separated by a colon. For example, `$decimal(12,2):10.01`
>>>>>>>   specifies a literal of type decimal(12, 2) and a value of 10.01.
>>>>>>> - A field_ref starts with a ! and is an identifier in the schema
>>>>>>>   following the DotPath syntax we already have [1].
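A grammar this small fits comfortably in a recursive-descent parser. The sketch below is not the code from the PR; it parses into a hypothetical `Expr` struct standing in for `arrow::compute::Expression`, and keeps literal types and values as strings instead of building real scalars. The one subtlety it handles is that a literal's type may itself contain parentheses, as in `$decimal(12,2):10.01`.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy AST with the three expression kinds from the proposal. It stands in
// for arrow::compute::Expression; literal types and values stay as strings.
struct Expr {
  enum class Kind { kCall, kLiteral, kFieldRef };
  Kind kind;
  std::string name;        // call: function name; field_ref: dot path
  std::string type;        // literal only, e.g. "int32"
  std::string value;       // literal only, e.g. "1"
  std::vector<Expr> args;  // call only
};

class SexprParser {
 public:
  explicit SexprParser(std::string src) : src_(std::move(src)) {}

  Expr Parse() {
    Expr e = ParseExpr();
    SkipSpaces();
    if (pos_ != src_.size()) throw std::invalid_argument("trailing input");
    return e;
  }

 private:
  void SkipSpaces() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_]))) {
      ++pos_;
    }
  }

  // Reads a bare token. Parentheses end a token, except inside a literal's
  // type such as $decimal(12,2):10.01.
  std::string ReadAtom() {
    const std::size_t start = pos_;
    const bool is_literal = pos_ < src_.size() && src_[pos_] == '$';
    int depth = 0;
    while (pos_ < src_.size()) {
      const char c = src_[pos_];
      if (depth == 0 && std::isspace(static_cast<unsigned char>(c))) break;
      if (c == '(') {
        if (!is_literal) break;
        ++depth;
      } else if (c == ')') {
        if (depth == 0) break;
        --depth;
      }
      ++pos_;
    }
    if (start == pos_) throw std::invalid_argument("expected a token");
    return src_.substr(start, pos_ - start);
  }

  Expr ParseExpr() {
    SkipSpaces();
    if (pos_ >= src_.size()) throw std::invalid_argument("unexpected end of input");
    if (src_[pos_] == '(') {
      // Call: (name arg1 arg2 ...), where each argument is any expression.
      ++pos_;
      SkipSpaces();
      Expr call{Expr::Kind::kCall, ReadAtom(), "", "", {}};
      SkipSpaces();
      while (pos_ < src_.size() && src_[pos_] != ')') {
        call.args.push_back(ParseExpr());
        SkipSpaces();
      }
      if (pos_ >= src_.size()) throw std::invalid_argument("expected ')'");
      ++pos_;
      return call;
    }
    const std::string atom = ReadAtom();
    if (atom[0] == '!') {
      // Field ref: everything after '!' is a DotPath such as ".a".
      return Expr{Expr::Kind::kFieldRef, atom.substr(1), "", "", {}};
    }
    if (atom[0] == '$') {
      // Literal: $type:value, split at the first colon.
      const std::size_t colon = atom.find(':');
      if (colon == std::string::npos) throw std::invalid_argument("literal must be $type:value");
      return Expr{Expr::Kind::kLiteral, "", atom.substr(1, colon - 1),
                  atom.substr(colon + 1), {}};
    }
    throw std::invalid_argument("unexpected token: " + atom);
  }

  std::string src_;
  std::size_t pos_ = 0;
};
```

Parsing `(add $int32:1 (multiply !.a !.b))` yields a call named `add` whose arguments are a literal (type `int32`, value `1`) and a nested `multiply` call over the field refs `.a` and `.b`.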
>>>>>>> 
>>>>>>> So for example, the expression
>>>>>>> 
>>>>>>> (add $int32:1 (multiply !.a !.b))
>>>>>>> 
>>>>>>> computes a*b+1 given a batch with columns named a and b.
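Because each s-expression is a node whose children are its arguments, evaluating the example is a bottom-up walk of the tree. The toy evaluator below assumes every column is `int64`, ignores the literal's declared `int32` type, and supports only `add` and `multiply`; `FieldNode`, `LiteralNode` and `CallNode` are hypothetical names. In Arrow's C++ API the same tree would be built roughly as `call("add", {literal(1), call("multiply", {field_ref("a"), field_ref("b")})})`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Column = std::vector<int64_t>;
using Batch = std::map<std::string, Column>;

// A toy expression node: given a batch, produce one output column.
using Node = std::function<Column(const Batch&)>;

// !name : look up a column by name.
Node FieldNode(std::string name) {
  return [name = std::move(name)](const Batch& batch) { return batch.at(name); };
}

// $type:value : broadcast one value to the batch length (types are ignored).
Node LiteralNode(int64_t value) {
  return [value](const Batch& batch) {
    const std::size_t length = batch.empty() ? 0 : batch.begin()->second.size();
    return Column(length, value);
  };
}

// (fn lhs rhs) : evaluate both children, then combine them element-wise.
Node CallNode(const std::string& fn, Node lhs, Node rhs) {
  if (fn != "add" && fn != "multiply") {
    throw std::invalid_argument("unsupported function: " + fn);
  }
  const bool is_add = (fn == "add");
  return [is_add, lhs = std::move(lhs), rhs = std::move(rhs)](const Batch& batch) {
    const Column x = lhs(batch);
    const Column y = rhs(batch);
    if (x.size() != y.size()) throw std::invalid_argument("length mismatch");
    Column out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      out[i] = is_add ? x[i] + y[i] : x[i] * y[i];
    }
    return out;
  };
}
```

On a batch with `a = {1, 2, 3}` and `b = {4, 5, 6}`, `CallNode("add", LiteralNode(1), CallNode("multiply", FieldNode("a"), FieldNode("b")))` produces `{5, 11, 19}`, i.e. `a*b+1` row by row.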
>>>>>>> 
>>>>>>> The reason I chose a lisp-like language is that it translates very
>>>>>>> directly to the current Expression API, and that prefix notation feels
>>>>>>> more natural for a language where all functions have a name (i.e. no
>>>>>>> +, -, *, etc.).
>>>>>>> 
>>>>>>> I’m currently working on a follow-up PR for specifying ExecPlans from a
>>>>>>> string (mainly for easier testing), and would like that language to be
>>>>>>> an extension of this one. Looking forward to hearing everyone’s thoughts!
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sasha Krassovsky
>>>>>>> 
>>>>>>> [0] https://github.com/apache/arrow/pull/14287
>>>>>>> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L1726
>>>>>> 
>>>> 
> 
