Re: Ephemeral Hive metastore for HiveContext?
I have never tried this yet, but maybe you can use an in-memory Derby database as the metastore: https://db.apache.org/derby/docs/10.7/devguide/cdevdvlpinmemdb.html

I'll investigate this when I'm free; I guess we can use it for Spark SQL Hive support testing.

On 10/27/14 4:38 PM, Jianshi Huang wrote:
> There's an annoying small usability issue in HiveContext. By default, it
> creates a local metastore, which prevents other processes that use
> HiveContext from being launched from the same directory.
>
> How can I make the metastore local to each HiveContext? Is there an
> in-memory metastore configuration? Using /tmp/ temp folders is one
> solution, but it's not elegant, and I still need to clean up the files.
>
> I can add a hive-site.xml and use a shared metastore, but the contexts
> will still operate in the same catalog space. The plain SQLContext by
> default uses an in-memory catalog that is bound to each context. Since
> HiveContext is a subclass, it should keep the same default semantics.
> Makes sense?
>
> Spark is very much functional and shared-nothing, and these are wonderful
> properties. Let's not make something global a dependency.
>
> Cheers,
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github Blog: http://huangjs.github.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
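Nobody in this thread has verified the idea against Hive yet, but if Hive's JDO layer accepts Derby's in-memory subprotocol, the configuration would presumably be a hive-site.xml fragment along these lines (the property names are standard Hive/JDO settings; the in-memory connection URL itself is the untested assumption):

```xml
<!-- Hypothetical hive-site.xml fragment: point the metastore at an
     in-memory Derby database instead of a ./metastore_db directory. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- "memory:" keeps the database in the JVM heap; nothing is written to disk,
       and the metastore vanishes when the process exits. -->
  <value>jdbc:derby:memory:metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>
```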
Re: Ephemeral Hive metastore for HiveContext?
Please see https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-EmbeddedMetastore

Cheers

On Oct 27, 2014, at 6:20 AM, Cheng Lian <lian.cs@gmail.com> wrote:
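For contrast, the embedded-metastore setup described on that wiki page amounts to Derby-on-disk JDO properties, which is exactly what creates the per-working-directory metastore_db that started this thread (the values shown are the usual Hive defaults):

```xml
<!-- Default embedded metastore: Derby writes a metastore_db directory
     under the process's current working directory, so only one process
     at a time can use a HiveContext from that directory. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>
```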
Re: Ephemeral Hive metastore for HiveContext?
Thanks Ted, this is exactly what Spark SQL LocalHiveContext does. To make an embedded metastore local to a single HiveContext, we must allocate different Derby database directories for each HiveContext, and Jianshi is also trying to avoid that.

On 10/27/14 9:44 PM, Ted Yu wrote:
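If one did go the per-context route Cheng describes, the directory allocation could be sketched like this (plain JVM code; `EphemeralMetastore` and `newMetastoreUrl` are hypothetical names for illustration, not Spark API, and the cleanup is only best-effort):

```scala
import java.nio.file.Files

object EphemeralMetastore {
  // Allocate a fresh Derby database directory so that each HiveContext
  // gets its own embedded metastore instead of competing for ./metastore_db.
  def newMetastoreUrl(): String = {
    val dir = Files.createTempDirectory("metastore_db_").toFile
    dir.deleteOnExit() // best-effort; Derby's files still need a real sweep
    s"jdbc:derby:;databaseName=${dir.getAbsolutePath}/metastore_db;create=true"
  }
}
```

Before constructing each HiveContext, one would set `javax.jdo.option.ConnectionURL` to the returned URL (e.g. via the context's `setConf`). This avoids the shared working-directory lock, but, as noted above, it still leaves per-context directories to clean up, which is the inelegance Jianshi wants to avoid.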
Re: Ephemeral Hive metastore for HiveContext?
Thanks Ted and Cheng for the in-memory Derby solution. I'll check it out. :)

And to me, using in-memory by default makes sense: if the user wants a shared metastore, it should be specified explicitly. An 'embedded' local metastore in the working directory barely has a use case.

Jianshi

On Mon, Oct 27, 2014 at 9:57 PM, Cheng Lian <lian.cs@gmail.com> wrote:

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/