[ https://issues.apache.org/jira/browse/SPARK-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062070#comment-16062070 ]
Saif Addin commented on SPARK-21198:
------------------------------------

Thanks [~viirya]

My program lists the available tables on a webpage when the user clicks a dropdown of databases. Tables also show their schema and metadata. Each time someone clicks the dropdown, I re-request the table list to make sure it stays up to date. Since the database list and each table's schema take too long to request dynamically, I store them in a cache as people use them, but I would love for this process (schema, isCached, isTemporary) to take less time.

If you are open to other suggestions: since temp views always appear in the table listing, I have to do some manual logic to separate them out of the requested table list. Also, isCached comes from the SparkSession, not from the same place where the catalog information is requested.

Our number of tables is not huge (about 20 databases with at most 200 tables per database, and some databases holding only a handful of tables).

Best,
Saif

> SparkSession catalog is terribly slow
> -------------------------------------
>
> Key: SPARK-21198
> URL: https://issues.apache.org/jira/browse/SPARK-21198
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Saif Addin
>
> We have a considerably large Hive metastore and a Spark program that goes
> through Hive data availability.
> In Spark 1.x we were using sqlContext.tableNames, sqlContext.sql() and
> sqlContext.isCached() to go through the Hive metastore information.
> Once migrated to Spark 2.x, we switched over to SparkSession.catalog instead, but
> it turns out that both listDatabases() and listTables() take between 5 and 20
> minutes, depending on the database, to return results, using operations such as
> the following one:
>
> spark.catalog.listTables(db).filter(_.isTemporary).map(_.name).collect
>
> and this made the program unbearably slow to return a list of tables.
>
> I know we still have spark.sqlContext.tableNames as a workaround, but I am
> assuming this is going to be deprecated anytime soon?
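As a rough sketch of the two approaches discussed above (the per-database metadata snapshot the comment describes, and the cheaper name-only listing mentioned in the issue), something like the following could work; `spark`, `db`, `TableInfo`, `snapshotTables` and `tableNamesOnly` are illustrative names assumed here, not part of the ticket:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical holder for the cached per-table metadata (not from the ticket).
case class TableInfo(name: String, isCached: Boolean)

// One catalog-backed listing per refresh: collect listTables(db) once,
// drop temporary views manually, then look up isCached for each table.
def snapshotTables(spark: SparkSession, db: String): Seq[TableInfo] = {
  spark.catalog.listTables(db).collect().toSeq
    .filterNot(_.isTemporary)
    .map(t => TableInfo(t.name, spark.catalog.isCached(s"$db.${t.name}")))
}

// Name-only workaround mentioned in the issue: skips per-table metadata
// resolution, so it returns far faster than listTables.
def tableNamesOnly(spark: SparkSession, db: String): Array[String] =
  spark.sqlContext.tableNames(db)
{code}

Caching the result of snapshotTables per database and refreshing it only when the dropdown is opened is essentially the cache the comment describes; the initial metastore round trip is still the slow part.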