Hi Tomas,
One option is to cache your table as Parquet files into Alluxio (which can
serve as an in-memory distributed caching layer for Spark in your case).
The code on the Spark side would look like:
> df.write.parquet("alluxio://master:19998/data.parquet")
> df = spark.read.parquet("alluxio://master:19998/data.parquet")
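A minimal end-to-end sketch of that pattern (a hedged example: the Alluxio
master address master:19998, the source table name, and the temp-view name
are assumptions for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Write the hot table to Alluxio as Parquet; Alluxio keeps the files
    # in its in-memory tier, so repeated reads avoid the slow backing store.
    spark.table("event_jan_01") \
        .write.mode("overwrite") \
        .parquet("alluxio://master:19998/data.parquet")

    # Re-read through the Alluxio path and expose it to SQL clients.
    df = spark.read.parquet("alluxio://master:19998/data.parquet")
    df.createOrReplaceTempView("event_jan_01_cached")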
I believe the in-memory solution misses the storage indexes that Parquet /
ORC have.
The in-memory solution is more suitable if you iterate over the whole data
set frequently.
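To make that trade-off concrete, here is a hedged sketch (the path and
column names are hypothetical): a selective filter over Parquet can skip
whole row groups using the min/max statistics in the file footer, while an
in-memory cache pays off when you repeatedly scan most of the data.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    events = spark.read.parquet("/data/events.parquet")

    # Selective query: Parquet row-group min/max stats let Spark skip
    # row groups whose event_time range cannot match the predicate.
    events.where("event_time >= '2019-01-01'").count()

    # Frequent full scans: caching in memory amortizes the scan cost,
    # at the price of losing Parquet's footer statistics.
    events.cache()
    events.count()  # first action materializes the cache
    events.groupBy("event_type").count().show()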
> Am 15.01.2019 um 19:20 schrieb Tomas Bartalos :
>
> Hello,
>
> I'm using spark-thrift server and I'm searching
Hi Tomas,
Have you considered using something like https://www.alluxio.org/ for your
cache? It seems like a possible solution for what you're trying to do.
-Todd
On Tue, Jan 15, 2019 at 11:24 PM 大啊 wrote:
> Hi, Tomas.
> Thanks for your question, it prompted some thoughts. But the best way to use cache
>
Hello,
I'm using spark-thrift server and I'm searching for the best-performing
solution to query a hot set of data. I'm processing records with a nested
structure, containing subtypes and arrays; one record takes up several KB.
I tried to make some improvements with CACHE TABLE:
cache table event_jan_01
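For reference, a hedged sketch of the CACHE TABLE ... AS SELECT form that
statement can take when issued programmatically (the SELECT clause, source
table, and predicate below are hypothetical examples):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # CACHE TABLE ... AS SELECT caches only the hot slice in Spark's
    # in-memory columnar store, so thrift-server queries hit the cache.
    spark.sql("""
        CACHE TABLE event_jan_01 AS
        SELECT * FROM events WHERE day = '2019-01-01'
    """)
    spark.sql("SELECT count(*) FROM event_jan_01").show()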