Hi Imran,

It seems that you are not caching your underlying DataFrame. I would suggest forcing a cache with tweets.cache() and then materializing it with tweets.count(). Let us know if the problem persists.
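For example, a minimal sketch along the lines of the blog post you followed (the zkhost, collection, and view names below are assumptions, adjust them to your setup):

// Load the Solr collection as a DataFrame via the spark-solr data source
val options = Map(
  "zkhost" -> "localhost:9983",   // assumption: your ZooKeeper connect string
  "collection" -> "socialdata"    // assumption: your Solr collection name
)
val tweets = spark.read.format("solr").options(options).load()

// Cache and force materialization so later queries hit the cached data
// instead of going back to Solr every time
tweets.cache()
tweets.count()

// Register a temp view; SQL against it now reuses the cached DataFrame
tweets.createOrReplaceTempView("tweets")
spark.sql("SELECT COUNT(*) FROM tweets").show()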
Best,
Anastasios

On Wed, Jul 19, 2017 at 2:49 PM, Imran Rajjad <raj...@gmail.com> wrote:
> Greetings,
>
> We are trying out Spark 2 + ThriftServer to join multiple collections
> from a Solr Cloud (6.4.x). I have followed this blog:
> https://lucidworks.com/2015/08/20/solr-spark-sql-datasource/
>
> I understand that initially Spark populates the temporary table with
> 18633014 records and takes its due time; however, any subsequent SQL
> queries on the temporary table take the same amount of time. It seems
> the temporary table is not being re-used or cached. The fields in the
> Solr collection do not have docValues enabled; could that be the reason?
> Apparently I have missed a trick.
>
> regards,
> Imran
>
> --
> I.R

--
Anastasios Zouzias <a...@zurich.ibm.com>