Looking at the code, I am still not sure without trying it.

But I am now more inclined to think that this specific combination, A'B
with both A and B having non-int row keys, is not supported.

As a general principle, we followed where our guinea pigs led us and did
not try to fill every possible gap and hole, in the belief that this would
get us the 80/20 capabilities in the shortest time.

As for the rest, we wait for somebody to ask for it because they need it.

But that example is legal, and a patch to handle this case should be
fundamentally possible and easy enough within this architecture.
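
To make the rewriting idea from the quoted messages below a bit more
concrete, here is a toy sketch of case-class matching over an expression
tree. Every name in it (Source, Times, AtA, AtB, plan, ...) is made up for
illustration and is not the actual optimizer code; it also does a single
pass rather than three, and the rule that rejects A'B with non-int keys
only mirrors my guess above, not verified behavior.

```scala
object RewriteSketch {
  // Hypothetical, simplified logical operators (not the real plan elements).
  sealed trait Expr { def intKeyed: Boolean }
  final case class Source(name: String, intKeyed: Boolean) extends Expr
  final case class Transpose(a: Expr) extends Expr {
    // Rows of A' are indexed by the columns of A, which are always ints.
    val intKeyed = true
  }
  final case class Times(a: Expr, b: Expr) extends Expr {
    // Row keys of a product come from the left operand.
    def intKeyed: Boolean = a.intKeyed
  }
  final case class AtA(a: Expr) extends Expr { val intKeyed = true }
  final case class AtB(a: Expr, b: Expr) extends Expr { val intKeyed = true }

  // One rewriting pass: match a fragment (node plus immediate children)
  // and replace it with a fused operator where one applies.
  def rewrite(e: Expr): Expr = e match {
    case Times(Transpose(a), b) if a == b => AtA(rewrite(a))
    case Times(Transpose(a), b)           => AtB(rewrite(a), rewrite(b))
    case Times(a, b)                      => Times(rewrite(a), rewrite(b))
    case Transpose(a)                     => Transpose(rewrite(a))
    case AtA(a)                           => AtA(rewrite(a))
    case AtB(a, b)                        => AtB(rewrite(a), rewrite(b))
    case s: Source                        => s
  }

  private def children(e: Expr): List[Expr] = e match {
    case Source(_, _) => Nil
    case Transpose(a) => List(a)
    case AtA(a)       => List(a)
    case Times(a, b)  => List(a, b)
    case AtB(a, b)    => List(a, b)
  }

  // Find the first fragment that would need non-int keys to denote columns.
  private def firstError(e: Expr): Option[String] = {
    val here: Option[String] = e match {
      case Transpose(a) if !a.intKeyed =>
        Some("transpose of a non-int-keyed operand")
      case AtB(a, b) if !(a.intKeyed && b.intKeyed) =>
        Some("A'B with non-int row keys (assumed unsupported)")
      case Times(_, b) if !b.intKeyed =>
        Some("right operand of a product must be int-keyed")
      case _ => None
    }
    here.orElse(children(e).map(firstError).collectFirst { case Some(m) => m })
  }

  // Either a rewritten plan, or a rejection message.
  def plan(e: Expr): Either[String, Expr] = {
    val rewritten = rewrite(e)
    firstError(rewritten).toLeft(rewritten)
  }
}
```

With a string-keyed A and B, plan(Times(Transpose(A), A)) comes back as
Right(AtA(A)), while plan(Times(Transpose(B), A)) comes back as a Left,
i.e. a rejection. Covering a miss would then amount to adding or relaxing
one case in rewrite and in firstError.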




On Wed, Jun 18, 2014 at 6:29 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Also, if something is not supported (such as your example, if it indeed
> is not supported), the optimizer will simply say so by rejecting it. But
> if it accepts the expression, then I am pretty sure it will do the right
> job (or at least there is a unit test for that case, asserted on a
> trivial example).
>
> Here, by trivial I mean local pipelines for 2-split inputs; that is the
> general rule I used.
>
>
> On Wed, Jun 18, 2014 at 6:26 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
>> A little additional information: in the rewriting-rules stage, the
>> optimizer makes 3 passes over the semantic tree, each pass matching a
>> tree fragment using Scala case class matching and rewriting it. This
>> allows matching and rewriting pretty elaborate tree fragments, although
>> at the moment I don't think we dig farther than immediate children, and
>> perhaps some of their known attributes, in most cases.
>>
>> A more detailed description than that is, I think, only available by
>> reading the source.
>>
>>
>> On Wed, Jun 18, 2014 at 6:19 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>> wrote:
>>
>>> E.g. I know for sure that A %.% B is legal where A is string-keyed and
>>> B is int-keyed.
>>>
>>> This is kind of beside the point. The point is that you can easily
>>> modify rewriting rules and operators to cover misses. (There shouldn't
>>> be many, since we've already written quite a few expressions out there.)
>>>
>>>
>>> On Wed, Jun 18, 2014 at 6:15 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>> wrote:
>>>
>>>> I am not sure. There are more rewriting rules than I can remember, and
>>>> I did not write an algorithm (I think) that would involve this
>>>> combination. I guess the best thing is to try it in a shell or a unit
>>>> test. If it falls through, perhaps a new plan element needs to be added
>>>> (although I am not very sure there isn't one already). I know that
>>>> there are join-based multiplicative operators there.
>>>>
>>>>
>>>> On Wed, Jun 18, 2014 at 6:11 PM, Ted Dunning <ted.dunn...@gmail.com>
>>>> wrote:
>>>>
>>>>> On Wed, Jun 18, 2014 at 6:07 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > In simple terms, if non-integer row keying is used anywhere, it
>>>>> > tries to rewrite pipelines so that product orientations never
>>>>> > require non-int keys to denote columns. If the pipeline makes that
>>>>> > impossible, the optimizer will refuse to produce a plan.
>>>>> >
>>>>> > e.g. suppose A is distributed string-keyed.
>>>>> >
>>>>> > (A.t %.% A) collect  // ok
>>>>> >
>>>>>
>>>>> What happens with the important case of B.t %.% A where both A and B
>>>>> are string-keyed?
>>>>>
>>>>
>>>>
>>>
>>
>
