Re: Questions regarding Hive metadata schema

2008-10-07 Thread Prasad Chakka
Hi Alan,

The objects are very closely associated with the Thrift API objects defined
in src/contrib/hive/metastore/if/hive_metastore.thrift . It contains
descriptions of what each field is and should answer most of your questions.
The ORM mapping for this is at src/contrib/hive/metastore/src/java/model/package.jdo.

2) SD is storage descriptor (look at SDS table)
3) SERDES contains information for Hive serializers and deserializers
5) Tables and Partitions have Storage Descriptors. A Storage Descriptor
contains physical storage info and how to read the data (serde info). The
Storage Descriptor object actually contains the columns, which means that
different partitions can have different column sets (see the sketch below)
6) 1-1
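
To make the relationships concrete, here is a simplified sketch of the
relevant structs in hive_metastore.thrift (field names and numbering here
are illustrative, not a verbatim quote; see the .thrift file itself for the
authoritative definitions):

struct FieldSchema {
  1: string name,                 // column name
  2: string type,                 // column type
  3: string comment
}

struct SerDeInfo {
  1: string name,
  2: string serializationLib,     // serializer/deserializer class
  3: map<string,string> parameters
}

struct StorageDescriptor {
  1: list<FieldSchema> cols,      // columns live here, not on Table
  2: string location,             // physical storage location
  3: string inputFormat,
  4: string outputFormat,
  5: SerDeInfo serdeInfo          // how to read/write the data
}

// Both Table and Partition embed their own StorageDescriptor (1-1),
// which is why different partitions can have different column sets.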

Thanks,
Prasad

From: Alan Gates [EMAIL PROTECTED]
Reply-To: core-user@hadoop.apache.org
Date: Tue, 7 Oct 2008 15:28:50 -0700
To: core-user@hadoop.apache.org
Subject: Questions regarding Hive metadata schema

Hi,

I've been looking over the db schema that Hive uses to store its
metadata (package.jdo) and I had some questions:

   1.  What do the field names in the TYPES table mean? TYPE1, TYPE2,
and TYPE_FIELDS are all unclear to me.
   2. In the TBLS (tables) table, what is sd?
   3. What does the SERDES table store?
   4. What does the SORT_ORDER table store? It appears to describe the
ordering within a storage descriptor, which in turn appears to be
related to a partition. Do you envision having a table where different
partitions have different orders?
   5. SDS (storage descriptor) table has a list of columns. Does this
imply that columnar storage is supported?
   6. What is the relationship between a storage descriptor and a
partition? 1-1, 1-n?

Thanks.

Alan.




Re: Hive questions about the meta db

2008-10-02 Thread Prasad Chakka

The property below is not needed; keep it at its default value. (Also, you
can create a hive-site.xml and leave hive-default.xml as it is.)

<property>
  <name>hive.metastore.uris</name>
  <value>jdbc:derby://nyhadoop1:1527/metastore_db</value>
  <description>Comma separated list of URIs of metastore servers. The
first server that can be connected to will be used.</description>
</property>

Set hive.metastore.local to true:

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>controls whether to connect to a remote metastore server
or open a new metastore server in the Hive Client JVM</description>
</property>
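
Putting it together, a minimal hive-site.xml for this setup would look
something like the sketch below (hive.metastore.local=true just means the
metastore code runs inside the Hive client JVM; the JDBC URL can still
point at your networked Derby):

<configuration>
  <!-- run the metastore in-process rather than connecting to a
       separate metastore server -->
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <!-- JDBC connection to the networked Derby instance -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://nyhadoop1:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
</configuration>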

If you are still getting errors, check the logs (/tmp/${USER}/hive.log). In
the conf directory there is a hive-log4j.properties file where you can
control the logging level.

Prasad



From: Edward Capriolo [EMAIL PROTECTED]
Reply-To: core-user@hadoop.apache.org
Date: Thu, 2 Oct 2008 12:33:20 -0700
To: core-user@hadoop.apache.org
Subject: Re: Hive questions about the meta db

I am doing a lot of testing with Hive; I will be sure to add this
information to the wiki once I get it going.

Thus far I downloaded the same version of Derby that Hive uses. I have
verified that the connection is up and running.

ij version 10.4
ij> connect 'jdbc:derby://nyhadoop1:1527/metastore_db;create=true';
ij> show tables;
TABLE_SCHEM |TABLE_NAME|REMARKS

SYS |SYSALIASES|
SYS |SYSCHECKS |
...

vi hive-default.xml
...
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to a remote metastore server
or open a new metastore server in the Hive Client JVM</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://nyhadoop1:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>jdbc:derby://nyhadoop1:1527/metastore_db</value>
  <description>Comma separated list of URIs of metastore servers. The
first server that can be connected to will be used.</description>
</property>
...

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://nyhadoop1:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=
javax.jdo.option.ConnectionPassword=

hive> show tables;
08/10/02 15:17:12 INFO hive.metastore: Trying to connect to metastore
with URI jdbc:derby://nyhadoop1:1527/metastore_db
FAILED: Error in semantic analysis: java.lang.NullPointerException
08/10/02 15:17:12 ERROR ql.Driver: FAILED: Error in semantic analysis:
java.lang.NullPointerException

I must have a setting wrong. Any ideas?




Re: Hive questions about the meta db

2008-10-01 Thread Prasad Chakka
Hi Edward,

By default, the embedded version of the Apache Derby database is used as the
metadb. You can run multiple clients against the same metadb by providing a
JDBC connection to wherever the metadata is located (a MySQL, Derby, or any
other relational database) via the options 'javax.jdo.option.ConnectionURL'
and 'javax.jdo.option.ConnectionDriverName'. If you want to use Derby, start
a network server using the instructions at
http://db.apache.org/derby/papers/DerbyTut/ns_intro.html and provide the
address of that server (see the example below).
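
For example, with Derby 10.4 the network server can be started with
something like the sketch below (the classpath and DERBY_HOME are
assumptions about your install; see the tutorial above for the details):

# Start the Derby network server; it listens on port 1527 by default.
# -h 0.0.0.0 makes it accept connections from other hosts, not just localhost.
java -cp $DERBY_HOME/lib/derbynet.jar:$DERBY_HOME/lib/derby.jar \
     org.apache.derby.drda.NetworkServerControl start -h 0.0.0.0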

For MySQL you can do something like the below:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://<mysql server hostname>/<database name>?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>


It is generally good to have data and metadata in the same place, but there
are a couple of reasons we couldn't use HDFS for the metadata, the foremost
being that updates are not allowed on HDFS. By putting the metadata in a SQL
system, it is easy to query and build other applications around it. But if
you have ideas about how to put it on HDFS, please share them with us.

Let me know if you need more help.

Prasad