Hi Weston,
I’d be happy to donate something like this to Substrait if that’s useful. I was 
thinking of proving out a design here before going there. However, we could also 
just go straight there :)

Regarding infix operators and such, the edge case I was thinking of is that a 
user could potentially add a kernel to the registry called e.g. “+”. Would the 
parser implicitly convert any instances of “+” to “add” and break that?
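
To make the concern concrete, here is a minimal sketch of one possible lookup order. It is not what the parser in the PR does; `kInfixAliases` and `ResolveName` are hypothetical names, and the registry is modeled as a plain set of kernel names rather than Arrow's real function registry.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical alias table a parser could use for infix-style names.
// Not part of Arrow or of the PR; for illustration only.
const std::map<std::string, std::string> kInfixAliases = {
    {"+", "add"}, {"-", "subtract"}, {"*", "multiply"}, {"/", "divide"}};

// Resolve a function name as written in the expression string.
// An exact match in the registry wins, so a user kernel registered
// as "+" is not silently rewritten to "add".
std::string ResolveName(const std::set<std::string>& registry,
                        const std::string& name) {
  if (registry.count(name) > 0) return name;  // user-registered kernel
  auto it = kInfixAliases.find(name);
  return it == kInfixAliases.end() ? name : it->second;
}
```

With a registry containing both "add" and "+", `ResolveName(registry, "+")` returns "+"; with only "add" registered it returns "add". A parser that rewrote "+" before consulting the registry would make the user kernel unreachable, which is the edge case I mean.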

Implicit typing for literals and parameters can probably also be added without 
issues to the current scheme. Would the parameters be passed as an 
std::unordered_map? 
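
As a rough sketch of what I have in mind, assuming the `%name%` placeholder syntax from your example: the function below substitutes parameters textually before parsing. `BindParameters` is a hypothetical helper, not an existing Arrow API, and a real implementation would more likely bind typed scalars during parsing instead of splicing strings.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical helper: replace each %name% placeholder in `expr` with the
// literal text bound to `name` in `params`. Throws on unterminated or
// unbound placeholders.
std::string BindParameters(
    const std::string& expr,
    const std::unordered_map<std::string, std::string>& params) {
  std::string out;
  size_t pos = 0;
  while (pos < expr.size()) {
    size_t open = expr.find('%', pos);
    if (open == std::string::npos) {
      out.append(expr, pos, std::string::npos);  // no more placeholders
      break;
    }
    size_t close = expr.find('%', open + 1);
    if (close == std::string::npos) {
      throw std::invalid_argument("unterminated parameter");
    }
    out.append(expr, pos, open - pos);  // text before the placeholder
    std::string name = expr.substr(open + 1, close - open - 1);
    auto it = params.find(name);
    if (it == params.end()) {
      throw std::invalid_argument("unbound parameter: " + name);
    }
    out += it->second;
    pos = close + 1;
  }
  return out;
}
```

For example, binding `threshold` to `$int32:5` turns `(greater !.x %threshold%)` into `(greater !.x $int32:5)`, which the existing parser could then handle unchanged.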

> Does a field_ref have to be a field name or can it be a field index?

It can be a field index or even a field path. The field ref is parsed using 
FieldRef::FromDotPath ([1] in my original message), which can express any 
FieldRef. 
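
For illustration, here is a small standalone sketch of how a dot path breaks down into name and index steps. It assumes the basic grammar of `.name` and `[index]` steps, ignores escaping, and is not Arrow's implementation; `PathStep` and `ParseDotPath` are made-up names.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// One step of a path: either a field name or a field index.
using PathStep = std::variant<std::string, int>;

// Split a dot path such as ".a[2].b" into its steps.
// Simplified sketch: no support for escaped characters in names.
std::vector<PathStep> ParseDotPath(const std::string& path) {
  std::vector<PathStep> steps;
  size_t i = 0;
  while (i < path.size()) {
    if (path[i] == '.') {
      // A name runs until the next '.' or '[' (or the end of the string).
      size_t end = path.find_first_of(".[", i + 1);
      if (end == std::string::npos) end = path.size();
      steps.emplace_back(path.substr(i + 1, end - i - 1));
      i = end;
    } else if (path[i] == '[') {
      size_t end = path.find(']', i + 1);
      if (end == std::string::npos || end == i + 1) {
        throw std::invalid_argument("bad index in dot path");
      }
      for (size_t j = i + 1; j < end; ++j) {
        if (!std::isdigit(static_cast<unsigned char>(path[j]))) {
          throw std::invalid_argument("index must be digits only");
        }
      }
      steps.emplace_back(std::stoi(path.substr(i + 1, end - i - 1)));
      i = end + 1;
    } else {
      throw std::invalid_argument("step must start with '.' or '['");
    }
  }
  return steps;
}
```

So `.a` is a single name step, `[2]` is a single index step, and `.a[2].b` is a nested path of three steps, which covers the duplicate-field-name case you mentioned.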

Sasha

> On Oct 6, 2022, at 16:08, Weston Pace <weston.p...@gmail.com> wrote:
> 
> Currently Substrait only has a binary (protobuf) serialization (and a
> protobuf JSON one but that's not really human writable and barely
> human readable).  Substrait does not have a text serialization.  I
> believe there is some desire for one (maybe Sasha wants to give it a
> try?).  A text format for Substrait would solve this problem because
> you could go "text expression" -> "substrait expression" -> "arrow
> expression".
> 
> Since no text format exists for Substrait I think that Substrait does
> not currently solve this problem or overlap with your work.  However,
> at some point (hopefully), it will.
> 
> There was also a fairly recent proposal for a parser for gandiva 
> expressions[1].
> 
> Compared with [1] I think this proposal is simpler to parse but lacks
> some of the shortcut conveniences (e.g. implicit types for literals,
> support for common infix operators (+, -, /, ...)).
> 
> Both are lacking parameters (e.g. "(equals(!x, %threshold%))" which I think
> would be useful to have as one could then do something like `auto
> arrow_expr = Parse(my_expr, threshold)`.
> 
> Does a field_ref have to be a field name or can it be a field index?
> The latter is quite useful when the schema has duplicate field names.
> 
> I'm +0.5 on this change.  I worry a bit about having (eventually)
> three different syntaxes.  However, at the moment we have zero.
> 
> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
> 
>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
>> <krassovskysa...@gmail.com> wrote:
>> 
>> Hi David,
>> Could you elaborate on which part of my proposal overlaps with Substrait? I 
>> don’t see anything in Substrait that allows me to do something along the 
>> lines of
>> 
>> Expression e = Expression::FromString("(add !.a $int32:1)");
>> 
>> in the code.
>> 
>> Sasha
>> 
>>>> On Oct 5, 2022, at 1:35 PM, Lee, David <david....@blackrock.com.INVALID> 
>>>> wrote:
>>> 
>>> I believe this is what substrait.io is trying to accomplish.
>>> 
>>> Here's some additional info:
>>> https://substrait.io/
>>> 
>>> https://www.youtube.com/watch?v=5JjaB7p3Sjk
>>> 
>>> -----Original Message-----
>>> From: Sasha Krassovsky <krassovskysa...@gmail.com>
>>> Sent: Wednesday, October 5, 2022 11:29 AM
>>> To: dev@arrow.apache.org
>>> Subject: Parser for expressions
>>> 
>>> 
>>> Hi everyone,
>>> I’ve noticed on the mailing list a few times people asking for a more 
>>> convenient way to construct an Expression, namely using a string of some 
>>> sort. I’ve found myself wishing for something like this too when 
>>> constructing ExecPlans, and so I’ve gone ahead and implemented a parser 
>>> [0]. I was wondering if anyone had any thoughts about the design of the 
>>> language?
>>> 
>>> The current implementation parses a lisp-like language. This language has 
>>> three types of expressions (mirroring the current Expression API):
>>> 
>>> - A call is a normal s-expression: it has the name of the kernel and the 
>>> list of arguments. Its arguments can be any expression.
>>> - A literal (i.e. scalar) starts with a $ and specifies a type and a value, 
>>> separated by a colon. For example, `$decimal(12,2):10.01` specifies a 
>>> literal of type decimal(12, 2) and a value of 10.01.
>>> - A field_ref starts with a ! and is an identifier in the schema following 
>>> the DotPath syntax we already have [1].
>>> 
>>> So for example, the expression
>>> 
>>> (add $int32:1 (multiply !.a !.b))
>>> 
>>> computes a*b+1 given a batch with columns named a and b.
>>> 
>>> The reason I chose a lisp-like language is that it very directly translates 
>>> to the current Expression API and that it feels more natural to use a 
>>> prefix notation for a language where all functions have a name (i.e. no +, 
>>> -, *, etc.).
>>> 
>>> I’m currently working on a followup PR for specifying ExecPlans from a 
>>> string (mainly for easier testing), and would like that language to be an 
>>> extension of this one. Looking forward to hearing everyone’s thoughts!
>>> 
>>> Thanks,
>>> Sasha Krassovsky
>>> 
>>> [0] 
>>> https://urldefense.com/v3/__https://github.com/apache/arrow/pull/14287__;!!KSjYCgUGsB4!enYRTooMrwyJKJzgTlQMdMhpfT7ys3Ol8a8HcHUvxRYRN-a-Up_axLfPGOpUtEDCDs0ee7lHPAzVdz-dooULG_6oZdDk$
>>> [1] 
>>> https://urldefense.com/v3/__https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h*L1726__;Iw!!KSjYCgUGsB4!enYRTooMrwyJKJzgTlQMdMhpfT7ys3Ol8a8HcHUvxRYRN-a-Up_axLfPGOpUtEDCDs0ee7lHPAzVdz-dooULG0GkL0Mn$
>>> 
>> 
