Hi Roman,

Thanks a lot for the answer (and the pull request). As I said initially, I was 
under the impression that the reason was the lack of affinity.
I understand the reason and the current design and I think we all agreed that 
this is not optimal and that it should be reworked in the new design. 
Especially the sort of silent behavior. That being said, more than a warning : 
having joins in // inter partitions would be very helpful but I understand that 
it is not straightforward.

As always you guys are very reactive and helpful. Keep up the great work. 
Appreciate it.

-----Original Message-----
From: Roman Kondakov <kondako...@mail.ru.INVALID> 
Sent: Monday, November 18, 2019 11:04 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try to 
explain.

When query parallelism = N is turned on, it means that each cache is divided 
into N parts from the SQL point of view. Every SQL query is executed 
independently over each particular part, and then results are merged together 
during the reducer step.

This is absolutely identical to the distributed query execution, where instead 
of a single node with query parallelism = N, we have N nodes with query 
parallelism = 1. SQL query is executed over each partition of data on all nodes 
and then results are merged on reducer.

As we can see, query parallelism is equivalent to the distributed query 
execution. When we do joins over distributed tables, we need to think about the 
collocation of data [1]. If data is not collocated, we get a wrong result. This 
happens silently, which is not good, IMO.

I reworked your example a bit in order to impose collocation on the joining key 
and now join returns correct result [2].

Current approach in configuration and query execution looks very uncomfortable 
and should be completely redesigned in the new engine.

[1] 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapacheignite-sql.readme.io%2Fdocs%2Fdistributed-joins&amp;data=02%7C01%7CSteve.Hostettler%40wolterskluwer.com%7C68a93ad417fc4e70ed1808d76c0e9f53%7C8ac76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096682368420072&amp;sdata=82bDWI1PHUOzNz95A5F%2Flyiqlrb9aQ2vadxhE%2FK47LM%3D&amp;reserved=0

[2] 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhostettler%2FigniteParallelQueries%2Fpull%2F1&amp;data=02%7C01%7CSteve.Hostettler%40wolterskluwer.com%7C68a93ad417fc4e70ed1808d76c0e9f53%7C8ac76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096682368420072&amp;sdata=QCvNEKqGGyZYOXQbF0sG0DUCzYJCnKoWleFTMtngcsc%3D&amp;reserved=0


--
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostett...@gmail.com wrote:
> Actually I am now wondering whether this is not just a bug and that I 
> should record it as such. As the behavior is different with and 
> without the parallelism and there is no warning during execution or in the 
> api.
>
> Any thought?
>
>
>
> --
> Sent from: 
> https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapach
> e-ignite-developers.2346864.n4.nabble.com%2F&amp;data=02%7C01%7CSteve.
> Hostettler%40wolterskluwer.com%7C68a93ad417fc4e70ed1808d76c0e9f53%7C8a
> c76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096682368420072&amp;sdata=
> LzUii%2BuNqHhS1YbFLNwpe7cn6XRRpKrrSO6wS5zNlSU%3D&amp;reserved=0

Reply via email to