*Please update the solution:
Instead of passing the dbLocation, the database name will be passed in the
REGISTER DDL.*
CarbonData table backup and recovery
Background
A customer has created a CarbonData table and has already loaded a very
large amount of data into it. They are now installing another cluster that
needs to use the same data, and they do not want to load it again because
loading takes a long time. They therefore want to back up the table data
directly and recover it in the other cluster. After recovery, the user can
use the data as a normal CarbonData table.
Requirement Description
CarbonData should support backing up a table's data and recovering it
without loading the data again.
To reuse a CarbonData table from another cluster, a DDL command should be
provided that creates the CarbonData table from the existing carbon table
schema.
Solution
Currently CarbonData has the following types of tables:
1.   Normal table
2.   Pre Aggregate table
CarbonData should provide a DDL command to create tables from existing
table data. The following DDL command could be used:

  REGISTER TABLES FOR $DBName;
 
           i.  The database path will be retrieved from the hive catalog and
               scanned to find all tables.
          ii.  The table schema will be read to get the column details.
         iii.  The table will be registered to the hive catalog with the
               details below:
CREATE TABLE $tbName USING carbondata OPTIONS (tableName "$dbName.$tbName",
dbName "$dbName",
tablePath "$tablePath",
path "$tablePath" )
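
The scan-and-register steps above can be sketched roughly as follows. This is
only an illustration and not the CarbonData implementation: the function names
are hypothetical, a local directory stands in for the database location that
would really come from the hive catalog, every immediate subdirectory is
assumed to be one table, and step ii (reading the schema) is left out.

```python
import os


def build_register_statement(db_name, tb_name, table_path):
    """Render the CREATE TABLE statement from step iii for one table."""
    return (
        f'CREATE TABLE {tb_name} USING carbondata OPTIONS ('
        f'tableName "{db_name}.{tb_name}", '
        f'dbName "{db_name}", '
        f'tablePath "{table_path}", '
        f'path "{table_path}")'
    )


def statements_for_database(db_name, db_path):
    """Scan db_path (step i) and build one statement per table directory.

    Assumption: every immediate subdirectory of db_path is one table.
    Plain files in db_path are ignored.
    """
    statements = []
    for entry in sorted(os.listdir(db_path)):
        table_path = os.path.join(db_path, entry)
        if os.path.isdir(table_path):
            statements.append(
                build_register_statement(db_name, entry, table_path)
            )
    return statements
```

Calling statements_for_database("db1", "/path/to/db1") would return one
statement string per table directory, in alphabetical order of table name.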

Precondition:
    i. The user has to create the database. Before executing this command,
       the old table schema and data should be copied into the database
       location.
   ii. If the table has aggregate tables, all of the aggregate tables
       should also be copied into the database location.
 
Validation:
   1. If the database does not exist, the registration will fail.
   2. A table will be registered only if a table with the same name is not
      already registered.
   3. If the table contains aggregate tables, all of the aggregate tables
      should be registered to the hive catalog. If any of the aggregate
      tables does not exist, the table creation operation should fail.
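
The three validation rules can be expressed as a small check. This is a
minimal sketch under stated assumptions, not the actual CarbonData code: the
function name and its parameters are hypothetical, catalog and file system
state are passed in as plain collections, and an already registered table is
assumed to be skipped rather than treated as a failure.

```python
def should_register(db_name, table_name, aggregate_tables,
                    existing_databases, registered_tables, tables_on_disk):
    """Apply the validation rules to one table.

    Returns True if the table should be registered and False if it should be
    skipped because the name is already registered (assumed behaviour).
    Raises ValueError when the registration should fail.
    """
    # Rule 1: the database must already exist.
    if db_name not in existing_databases:
        raise ValueError(f"database '{db_name}' does not exist")

    # Rule 2: register only if the name is not registered yet.
    if table_name in registered_tables:
        return False

    # Rule 3: every aggregate table must be present in the database location.
    missing = sorted(set(aggregate_tables) - set(tables_on_disk))
    if missing:
        raise ValueError(
            f"missing aggregate tables for '{table_name}': {', '.join(missing)}"
        )
    return True
```

Passing the state in as arguments keeps the check independent of any catalog
API, so the same rules can be exercised without a running cluster.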



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
