Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Cheng Lian
I have never tried this myself, but maybe you can use an in-memory Derby 
database as the metastore: 
https://db.apache.org/derby/docs/10.7/devguide/cdevdvlpinmemdb.html


I'll investigate this when I'm free; I guess we could also use it for 
Spark SQL Hive support testing.
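Something like the following hive-site.xml fragment might work. This is untested; the in-memory connection URL is my guess based on the Derby docs above, and the property names are the standard metastore JDO settings:

```xml
<!-- Hypothetical hive-site.xml fragment: point the embedded metastore
     at an in-memory Derby database instead of a ./metastore_db directory. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:memory:metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>
```

Since the database lives entirely in memory, it would disappear when the process exits, which also takes care of the cleanup problem.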


On 10/27/14 4:38 PM, Jianshi Huang wrote:

There's an annoying small usability issue in HiveContext.

By default, it creates a local metastore, which prevents other processes 
that use HiveContext from being launched from the same directory.


How can I make the metastore local to each HiveContext? Is there an 
in-memory metastore configuration? Using temp folders under /tmp/ is one 
solution, but it's not elegant, and I still need to clean up the files...


I can add a hive-site.xml and use a shared metastore, but then all the 
contexts still operate in the same catalog space.


The (simple) SQLContext by default uses an in-memory catalog that is 
bound to each context. Since HiveContext is a subclass, it should default 
to the same semantics. Make sense?


Spark is very much functional and shared-nothing; these are wonderful 
properties. Let's not introduce something global as a dependency.



Cheers,
--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Ted Yu
Please see 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-EmbeddedMetastore

Cheers



Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Cheng Lian
Thanks Ted, this is exactly what Spark SQL's LocalHiveContext does. To 
make an embedded metastore local to a single HiveContext, we must 
allocate a different Derby database directory for each HiveContext, which 
is also what Jianshi is trying to avoid.
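To illustrate the idea (a rough sketch, not the actual LocalHiveContext code): each context gets its own Derby directory, for example a fresh temp directory, and its own JDBC URL built from it. The helper name and URL format below are my own, for illustration only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only (not LocalHiveContext's implementation): allocate a distinct
// embedded-Derby database directory per HiveContext so that two contexts
// never fight over the same ./metastore_db lock.
public class PerContextMetastore {
    static String perContextMetastoreUrl() throws IOException {
        // A unique temp directory per context; Derby creates its files here.
        Path dir = Files.createTempDirectory("metastore_db_").toAbsolutePath();
        return "jdbc:derby:;databaseName=" + dir + ";create=true";
    }

    public static void main(String[] args) throws IOException {
        String a = perContextMetastoreUrl();
        String b = perContextMetastoreUrl();
        // Each HiveContext would then be pointed at its own URL via the
        // javax.jdo.option.ConnectionURL property.
        System.out.println(!a.equals(b)); // prints "true": distinct directories
    }
}
```

The downside, of course, is exactly what Jianshi mentioned: the directories still have to be cleaned up afterwards, which is why an in-memory database would be nicer.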




Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Jianshi Huang
Thanks Ted and Cheng for the in-memory Derby solution. I'll check it out. :)

And to me, using an in-memory metastore by default makes sense; if a user
wants a shared metastore, it should be specified explicitly. An 'embedded'
local metastore in the working directory barely has a use case.

Jianshi



