I did some basic testing of multi-source queries with the most recent Spark: https://github.com/GavinRay97/spark-playground/blob/44a756acaee676a9b0c128466e4ab231a7df8d46/src/main/scala/Application.scala#L46-L115
The output of `spark.time()` surprised me:

```sql
SELECT p.id, p.name, t.id, t.title
FROM db1.public.person p
JOIN db2.public.todos t ON p.id = t.person_id
WHERE p.id = 1
```

```
+---+----+---+------+
| id|name| id| title|
+---+----+---+------+
|  1| Bob|  1|Todo 1|
|  1| Bob|  2|Todo 2|
+---+----+---+------+

Time taken: 168 ms
```

```sql
SELECT p.id, p.name, t.id, t.title
FROM db1.public.person p
JOIN db2.public.todos t ON p.id = t.person_id
WHERE p.id = 2
LIMIT 1
```

```
+---+-----+---+------+
| id| name| id| title|
+---+-----+---+------+
|  2|Alice|  3|Todo 3|
+---+-----+---+------+

Time taken: 228 ms
```

Calcite and Teiid manage to do this on the order of 5-50 ms for basic queries, so I'm curious about the technical specifics of why Spark appears to be so much slower here.
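In case it helps narrow things down, here's a sketch of how the per-query overhead could be split into planning versus execution time. This is an assumption-laden example, not my actual test code: it uses a plain local `SparkSession` with an in-memory DataFrame instead of the two JDBC catalogs above, and relies on the fact that accessing `queryExecution.executedPlan` forces analysis, optimization, and physical planning without actually running the job.

```scala
import org.apache.spark.sql.SparkSession

object PlanVsExecTiming {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; the real test uses JDBC-backed catalogs.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("plan-vs-exec-timing")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "Bob"), (2, "Alice"))
      .toDF("id", "name")
      .filter($"id" === 1)

    // Forces analysis + optimization + physical planning only (lazy val).
    spark.time(df.queryExecution.executedPlan)

    // Full run: planning (already cached), whole-stage codegen,
    // task scheduling, and the actual data fetch.
    spark.time(df.collect())

    spark.stop()
  }
}
```

If the planning step alone accounts for most of the ~170-230 ms, that would point at Catalyst/codegen overhead rather than the JDBC round-trips.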