Performance Improvement with Hive/Thrift Server

Artemis User Mon, 12 Jul 2021 08:15:41 -0700

We are trying to switch from Postgres to Spark's built-in Hive with the Thrift server as the data sink for persisting the ML result data, in the hope that Hive would improve the ML pipeline's performance. However, it turned out that Hive took significantly longer to persist DataFrames (via Spark SQL's saveAsTable API) than Postgres did using JDBC. Has anyone experienced similar problems with Hive? Any recommendations for improving performance would be highly appreciated.
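For anyone wanting to reproduce the comparison, a minimal PySpark sketch of the two write paths, timed side by side, might look like the following. It assumes a SparkSession created with enableHiveSupport() and the Postgres JDBC driver on the classpath. The function name, table name, URL and connection properties are hypothetical placeholders, not our actual pipeline code.

```python
import time
from typing import Any, Dict


def time_both_writes(df: Any, table_name: str, jdbc_url: str,
                     jdbc_props: Dict[str, str]) -> Dict[str, float]:
    """Time the Hive (saveAsTable) and Postgres (JDBC) writes of one DataFrame.

    Assumes `df` is a pyspark.sql.DataFrame from a Hive-enabled SparkSession.
    All arguments other than `df` are hypothetical placeholders.
    """
    timings = {}

    # Path 1: persist as a Hive-managed table via DataFrameWriter.saveAsTable.
    start = time.perf_counter()
    df.write.mode("overwrite").saveAsTable(table_name)
    timings["hive_saveAsTable"] = time.perf_counter() - start

    # Path 2: persist the same data to Postgres through the JDBC data source.
    start = time.perf_counter()
    df.write.jdbc(url=jdbc_url, table=table_name, mode="overwrite",
                  properties=jdbc_props)
    timings["postgres_jdbc"] = time.perf_counter() - start

    return timings
```

To keep the upstream ML computation out of both timings, the DataFrame would need to be cached and materialized first (e.g. `df.cache().count()`); otherwise each write recomputes it. The JDBC properties would typically include `user`, `password` and `driver` (e.g. `org.postgresql.Driver`).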

We are using Spark in standalone mode. I would assume that running Spark on a real Hive database, or simply on Hadoop, would be more desirable. Has anyone done a performance comparison between running Spark with built-in Hive (with just the metastore) vs. Spark on a full-fledged Hive DB vs. Spark with built-in Hive on Hadoop? Thanks!


-- ND


