Re: SparkSQL 'describe table' tries to look at all records

2015-07-13 Thread Yana Kadiyska
Have you seen https://issues.apache.org/jira/browse/SPARK-6910? I opened
https://issues.apache.org/jira/browse/SPARK-6984, which I think is related
to this as well. There are a bunch of issues attached to it, but basically
yes, Spark's interactions with the metastore are bad, and very bad if your
metastore is large.

On Sun, Jul 12, 2015 at 11:39 PM, Jerrick Hoang jerrickho...@gmail.com
wrote:

 Sorry all for not being clear. I'm using Spark 1.4, and the table is a
 partitioned Hive table.

 On Sun, Jul 12, 2015 at 6:36 PM, Yin Huai yh...@databricks.com wrote:

 Jerrick,

 Let me ask a few clarification questions. What is the version of Spark?
 Is the table a hive table? What is the format of the table? Is the table
 partitioned?

 Thanks,

 Yin

 On Sun, Jul 12, 2015 at 6:01 PM, ayan guha guha.a...@gmail.com wrote:

 Describe computes statistics, so it will try to query the table. The call
 you are looking for is df.printSchema().

 On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com
 wrote:

 Hi all,

 I'm new to Spark, and this question may be trivial or may already have been
 answered, but when I run 'describe table' from the SparkSQL CLI, it seems to
 try to look at all records in the table (which takes a really long time for
 a big table) instead of just giving me the table's metadata. I would
 appreciate it if someone could give me some pointers, thanks!




 --
 Best Regards,
 Ayan Guha






Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Sorry all for not being clear. I'm using Spark 1.4, and the table is a
partitioned Hive table.


Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Ted Yu
Which Spark release are you using?

Cheers


Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread ayan guha
Describe computes statistics, so it will try to query the table. The call
you are looking for is df.printSchema().
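
For anyone following along, here is a minimal sketch of the df.printSchema()
route suggested above. It assumes a HiveContext (Spark 1.x) or any other
object exposing .table(name); the helper name print_table_schema and the
table name in the usage comment are hypothetical. DataFrame.describe() is the
call that computes summary statistics; this sketch makes no claim about what
the CLI's DESCRIBE statement does internally. Looking up the table still goes
through the metastore, so any metastore slowness may still apply.

```python
def print_table_schema(sql_context, table_name):
    """Print a table's column names and types, then return the column names.

    sql_context is assumed to be a HiveContext (or similar) whose
    .table(name) method returns a DataFrame.
    """
    # Resolving the table builds a DataFrame from catalog metadata; it should
    # not launch a job over the table's rows.
    df = sql_context.table(table_name)
    # printSchema() writes the schema tree to stdout.
    df.printSchema()
    # df.columns is a plain list of column names.
    return df.columns


# Usage sketch (needs a Spark installation; "my_table" is a placeholder):
#   from pyspark import SparkContext
#   from pyspark.sql import HiveContext
#   sc = SparkContext(appName="schema-check")
#   print_table_schema(HiveContext(sc), "my_table")
```

Passing the context in as an argument keeps the helper independent of how the
context was created.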


-- 
Best Regards,
Ayan Guha


Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Yin Huai
Jerrick,

Let me ask a few clarification questions. What is the version of Spark? Is
the table a hive table? What is the format of the table? Is the table
partitioned?

Thanks,

Yin


SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Hi all,

I'm new to Spark, and this question may be trivial or may already have been
answered, but when I run 'describe table' from the SparkSQL CLI, it seems to
try to look at all records in the table (which takes a really long time for
a big table) instead of just giving me the table's metadata. I would
appreciate it if someone could give me some pointers, thanks!