Re: Questions regarding Hive metadata schema
Hi Alan,

The objects are very closely associated with the Thrift API objects defined in src/contrib/hive/metastore/if/hive_metastore.thrift. It contains descriptions of what each field is, and it should answer most of your questions. The ORM for this is at s/c/h/metastore/src/java/model/package.jdo.

2) SD is the storage descriptor (look at the SDS table).
3) SERDES contains information for Hive serializers and deserializers.
5) Tables and Partitions have Storage Descriptors. Storage Descriptors contain physical storage info and how to read the data (serde info). The Storage Descriptor object actually contains the columns. This means that different partitions can have different column sets.
6) 1-1

Thanks,
Prasad

From: Alan Gates <[EMAIL PROTECTED]>
Reply-To:
Date: Tue, 7 Oct 2008 15:28:50 -0700
To:
Subject: Questions regarding Hive metadata schema

Hi,

I've been looking over the db schema that hive uses to store its metadata (package.jdo) and I had some questions:

1. What do the field names in the TYPES table mean? TYPE1, TYPE2, and TYPE_FIELDS are all unclear to me.
2. In the TBLS (tables) table, what is sd?
3. What does the SERDES table store?
4. What does the SORT_ORDER table store? It appears to describe the ordering within a storage descriptor, which in turn appears to be related to a partition. Do you envision having a table where different partitions have different orders?
5. The SDS (storage descriptor) table has a list of columns. Does this imply that columnar storage is supported?
6. What is the relationship between a storage descriptor and a partition? 1-1, 1-n?

Thanks.
Alan.
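
Prasad's answers to questions 5 and 6 can be pictured with a small Python sketch. This is only an illustration of the relationships described above (a table and each of its partitions own exactly one storage descriptor, and the descriptor carries the columns and the serde info). The class and field names are hypothetical and are not copied from hive_metastore.thrift or package.jdo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SerDeInfo:
    # Hypothetical field: the class used to read and write rows.
    serialization_lib: str


@dataclass
class Column:
    name: str
    type: str


@dataclass
class StorageDescriptor:
    # Physical storage info plus how to read the data (serde info).
    location: str
    serde: SerDeInfo
    # The columns live on the storage descriptor, not on the table.
    columns: List[Column] = field(default_factory=list)


@dataclass
class Partition:
    values: List[str]
    sd: StorageDescriptor  # 1-1: each partition has its own descriptor


@dataclass
class Table:
    name: str
    sd: StorageDescriptor  # the table has a descriptor as well
    partitions: List[Partition] = field(default_factory=list)


def column_names(sd: StorageDescriptor) -> List[str]:
    """Return the column names recorded on a storage descriptor."""
    return [c.name for c in sd.columns]


def make_sd(location, cols):
    """Build a descriptor from (name, type) pairs; the serde name is made up."""
    serde = SerDeInfo("example.HypotheticalSerDe")
    return StorageDescriptor(location, serde, [Column(n, t) for n, t in cols])


# Illustrative data only: two partitions of one table with different columns.
base_cols = [("user_id", "string"), ("url", "string")]
table = Table("page_views", make_sd("/warehouse/page_views", base_cols))
table.partitions.append(
    Partition(["2008-10-06"], make_sd("/warehouse/page_views/ds=2008-10-06", base_cols))
)
table.partitions.append(
    Partition(
        ["2008-10-07"],
        make_sd("/warehouse/page_views/ds=2008-10-07", base_cols + [("referrer", "string")]),
    )
)

for p in table.partitions:
    print(p.values, column_names(p.sd))
```

Because the columns hang off the storage descriptor rather than the table, the second partition can carry an extra column without touching the first, which is the point of answer 5.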
Re: Hive questions about the meta db
The property below is not needed; keep it at its default value. (Also, you can create hive-site.xml and leave hive-default.xml as it is.)

    hive.metastore.uris
    jdbc:derby://nyhadoop1:1527/metastore_db
    Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.

Set local to true.

    hive.metastore.local
    true
    controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM

If you are still getting an error, check the logs (/tmp/${USER}/hive.log). In the conf directory there is hive-log4j.properties, where you can control the logging level.

Prasad

From: Edward Capriolo <[EMAIL PROTECTED]>
Reply-To:
Date: Thu, 2 Oct 2008 12:33:20 -0700
To:
Subject: Re: Hive questions about the meta db

I am doing a lot of testing with Hive, and I will be sure to add this information to the wiki once I get it going. Thus far I have downloaded the same version of derby that hive uses. I have verified that the connection is up and running.

    ij version 10.4
    ij> connect 'jdbc:derby://nyhadoop1:1527/metastore_db;create=true';
    ij> show tables
    TABLE_SCHEM |TABLE_NAME|REMARKS
    SYS |SYSALIASES|
    SYS |SYSCHECKS |
    ...

    vi hive-default.conf
    ...
    hive.metastore.local
    false
    controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM

    javax.jdo.option.ConnectionURL
    jdbc:derby://nyhadoop1:1527/metastore_db;create=true
    JDBC connect string for a JDBC metastore

    javax.jdo.option.ConnectionDriverName
    org.apache.derby.jdbc.ClientDriver
    Driver class name for a JDBC metastore

    hive.metastore.uris
    jdbc:derby://nyhadoop1:1527/metastore_db
    Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.
    ...
    javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
    org.jpox.autoCreateSchema=false
    org.jpox.validateTables=false
    org.jpox.validateColumns=false
    org.jpox.validateConstraints=false
    org.jpox.storeManagerType=rdbms
    org.jpox.autoCreateSchema=true
    org.jpox.autoStartMechanismMode=checked
    org.jpox.transactionIsolation=read_committed
    javax.jdo.option.DetachAllOnCommit=true
    javax.jdo.option.NontransactionalRead=true
    javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
    javax.jdo.option.ConnectionURL=jdbc:derby://nyhadoop1:1527/metastore_db;create=true
    javax.jdo.option.ConnectionUserName=
    javax.jdo.option.ConnectionPassword=

    hive> show tables;
    08/10/02 15:17:12 INFO hive.metastore: Trying to connect to metastore with URI jdbc:derby://nyhadoop1:1527/metastore_db
    FAILED: Error in semantic analysis: java.lang.NullPointerException
    08/10/02 15:17:12 ERROR ql.Driver: FAILED: Error in semantic analysis: java.lang.NullPointerException

I must have a setting wrong. Any ideas?
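
The two changes Prasad suggests at the top of this message (do not point hive.metastore.uris at a JDBC URL, and set hive.metastore.local to true) can be turned into a quick sanity check. The Python sketch below is a rough heuristic based only on the advice in this thread and is not part of Hive. It assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children inside `<configuration>`, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_metastore_settings(props):
    """Return warnings for the two settings discussed in this thread.

    Heuristic only: it assumes the client is meant to open the metastore
    in its own JVM and reach the database directly over JDBC.
    """
    warnings = []
    if props.get("hive.metastore.local", "").lower() != "true":
        warnings.append("hive.metastore.local is not 'true'")
    uris = props.get("hive.metastore.uris", "")
    for uri in (u.strip() for u in uris.split(",")):
        if uri.startswith("jdbc:"):
            warnings.append("hive.metastore.uris holds a JDBC URL: " + uri)
    return warnings


# The relevant settings from Edward's message, rewritten as XML.
EDWARD_CONFIG = """<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>jdbc:derby://nyhadoop1:1527/metastore_db</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://nyhadoop1:1527/metastore_db;create=true</value>
  </property>
</configuration>"""

for warning in check_metastore_settings(load_properties(EDWARD_CONFIG)):
    print("WARNING:", warning)
```

The JDBC URL stays in javax.jdo.option.ConnectionURL, which the check leaves alone; only its appearance in hive.metastore.uris is flagged.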
Re: Hive questions about the meta db
Hi Edward,

By default, the embedded version of the Apache Derby database is used as the metadb. You can run multiple queries against the same metadb by providing a JDBC connection (where the metadata is located) to a mysql/derby or any other relational database in the options 'javax.jdo.option.ConnectionURL' and 'javax.jdo.option.ConnectionDriverName'. If you want to use derby, start a networked server using the instructions here http://db.apache.org/derby/papers/DerbyTut/ns_intro.html and provide the address of that server. For mysql you can do something like below:

    javax.jdo.option.ConnectionURL
    jdbc:mysql:///?createDatabaseIfNotExist=true

    javax.jdo.option.ConnectionDriverName
    com.mysql.jdbc.Driver

It is generally good to have data and metadata in the same place. But there are a couple of reasons that we couldn't use HDFS to store the metadata, the foremost being that updates are not allowed on HDFS. By putting the metadata in an SQL system, it is easy to query and to build other applications around it. But if you have ideas about how to put it on HDFS, please share them with us.

Let me know if you need more help.

Prasad
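
For anyone who would rather generate the override file than edit it by hand, the two mysql options above could be written out roughly as follows. This is a sketch under stated assumptions: the host db.example.com and the database name hive_metastore are placeholders invented for the example, build_hive_site is a hypothetical helper and not a Hive tool, and the output uses the usual Hadoop-style `<configuration>`/`<property>` layout.

```python
import xml.etree.ElementTree as ET


def build_hive_site(overrides):
    """Render a name -> value dict as Hadoop-style configuration XML."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder host and database name; substitute your own.
MYSQL_OVERRIDES = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://db.example.com/hive_metastore?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
}

hive_site_xml = build_hive_site(MYSQL_OVERRIDES)
print(hive_site_xml)
```

The printed text would go into a hive-site.xml in the conf directory, so that hive-default.xml can stay untouched as suggested earlier in the thread.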