Re: SparkSQL 'describe table' tries to look at all records
Have you seen https://issues.apache.org/jira/browse/SPARK-6910? I opened https://issues.apache.org/jira/browse/SPARK-6984, which I think is related to this as well. There are a bunch of issues attached to it, but basically yes: Spark's interactions with a large metastore are bad, very bad.

On Sun, Jul 12, 2015 at 11:39 PM, Jerrick Hoang jerrickho...@gmail.com wrote:
Sorry all for not being clear. I'm using spark 1.4 and the table is a hive table, and the table is partitioned.

On Sun, Jul 12, 2015 at 6:36 PM, Yin Huai yh...@databricks.com wrote:
Jerrick,
Let me ask a few clarification questions. What is the version of Spark? Is the table a hive table? What is the format of the table? Is the table partitioned?
Thanks,
Yin

On Sun, Jul 12, 2015 at 6:01 PM, ayan guha guha.a...@gmail.com wrote:
Describe computes statistics, so it will try to query the table. The one you are looking for is df.printSchema()

On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all records at the table (which takes a really long time for big table) instead of just giving me the metadata of the table. Would appreciate if someone can give me some pointers, thanks!

--
Best Regards,
Ayan Guha
Re: SparkSQL 'describe table' tries to look at all records
Sorry all for not being clear. I'm using spark 1.4 and the table is a hive table, and the table is partitioned.

On Sun, Jul 12, 2015 at 6:36 PM, Yin Huai yh...@databricks.com wrote:
Jerrick,
Let me ask a few clarification questions. What is the version of Spark? Is the table a hive table? What is the format of the table? Is the table partitioned?
Thanks,
Yin

On Sun, Jul 12, 2015 at 6:01 PM, ayan guha guha.a...@gmail.com wrote:
Describe computes statistics, so it will try to query the table. The one you are looking for is df.printSchema()

On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all records at the table (which takes a really long time for big table) instead of just giving me the metadata of the table. Would appreciate if someone can give me some pointers, thanks!

--
Best Regards,
Ayan Guha
Re: SparkSQL 'describe table' tries to look at all records
Which Spark release do you use?
Cheers

On Sun, Jul 12, 2015 at 5:03 PM, Jerrick Hoang jerrickho...@gmail.com wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all records at the table (which takes a really long time for big table) instead of just giving me the metadata of the table. Would appreciate if someone can give me some pointers, thanks!
Re: SparkSQL 'describe table' tries to look at all records
Describe computes statistics, so it will try to query the table. The one you are looking for is df.printSchema()

On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all records at the table (which takes a really long time for big table) instead of just giving me the metadata of the table. Would appreciate if someone can give me some pointers, thanks!

--
Best Regards,
Ayan Guha
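
For readers following along, here is a minimal sketch of the schema-only route suggested in the reply above. It assumes `sql_context` is a PySpark SQLContext or HiveContext (as in Spark 1.4) and that the table name is resolvable; the helper name `column_types` is hypothetical and not part of Spark. Note that the reply appears to be describing the DataFrame method `df.describe()`, which does run a job to compute count, mean, stddev, min and max, rather than the SQL `DESCRIBE` statement from the original question. Reading `df.dtypes` or calling `df.printSchema()` only inspects the schema, although on a large partitioned Hive table resolving the table may still be slow because of metastore lookups, as discussed elsewhere in this thread.

```python
def column_types(sql_context, table_name):
    """Return [(column_name, type_string), ...] for a table.

    Assumption: `sql_context` behaves like a PySpark SQLContext/HiveContext,
    i.e. it has a `table(name)` method returning a DataFrame.
    """
    # table() builds a DataFrame lazily; no Spark job over the rows is
    # triggered here.
    df = sql_context.table(table_name)

    # dtypes is derived from the schema only, like df.printSchema().
    # By contrast, df.describe().show() would scan the data to compute
    # summary statistics.
    return list(df.dtypes)


# Hypothetical usage, assuming `sqlContext` exists and "my_table" is registered:
#   for name, type_name in column_types(sqlContext, "my_table"):
#       print(name, type_name)
```

The helper is duck-typed on purpose so it can be exercised with a stub object when no Spark installation is available.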
Re: SparkSQL 'describe table' tries to look at all records
Jerrick,
Let me ask a few clarification questions. What is the version of Spark? Is the table a hive table? What is the format of the table? Is the table partitioned?
Thanks,
Yin

On Sun, Jul 12, 2015 at 6:01 PM, ayan guha guha.a...@gmail.com wrote:
Describe computes statistics, so it will try to query the table. The one you are looking for is df.printSchema()

On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all records at the table (which takes a really long time for big table) instead of just giving me the metadata of the table. Would appreciate if someone can give me some pointers, thanks!

--
Best Regards,
Ayan Guha