MySQL to Oracle schema conversion utility

Tim Bunce Mon, 06 Aug 2001 06:24:15 -0700

Here's a MySQL to Oracle schema conversion utility that I've written.
It's rather more functional than the m2o one posted recently.

Tim.

p.s. I could have implemented the SQL 'create table' parser using
Parse::YAPP and the yacc syntax in the mysql source code but I wanted
to use this as an excuse to play with Parse::RecDescent.

#!/home/value/apps/perl -w

=head1 NAME

  mysql2ora - Convert MySQL schema into Oracle schema (including triggers etc)

=head1 SYNOPSIS

  mysql2ora [-options] mysql_schema.sql > oracle_schema.sql

=head1 DESCRIPTION

Converts MySQL schema into Oracle schema, including emulation of some
MySQL features...

Handles the emulation of an auto_increment field by defining a
sequence and a trigger that uses the sequence to automatically set the
auto_increment field. (Has an option to specify the initial value of
the sequence.)

Handles the emulation of a timestamp field by defining a
and a trigger that automatically sets the timestamp field.

Automatically detects all Oracle reserved words for both table and
field names, and supports multiple methods to deal with them.

Automatically detects long names (>30) characters and truncates them
intelligently to ensure that they remain unique. (It applies the same
techniques to the names of automatically generated sequences and triggers.)

Automatically generates CHECK clauses for enumerated types (and sets
the length of the field to the length of the longest valid value).

For SET types it automatically sets the length of the field to the sum
of the length of the valid values allowing for a comma delimiter
between each. (It doesn't generate a CHECK clause for SET types. That'll
be a little task for someone.)

For partial-field indices it has some huristics to do reasonable things
in some cases. (But typically you'll need to reconsider those for Oracle anyway.)

=head2 OPTIONS

=over 4

=item B<--only>

Ignore any tables with names that don't match the specified regular expression.

=item B<--drop>

Add DROP statements into the generated SQL before each new schema
object is created.

=item B<--autoinc>

Specify the start value of all generated auto_increment sequences. Defaults to 1.

If you are migrating existing data then this should be greater than the
highest value in any auto_increment field in your database.

=item B<--unreserve>

Specify an sprintf() format string to use to I<unreserve> a reserved word.

The default is 'C<"%s">', in other words, don't change the name just
put double quotes around it to make it acceptable to Oracle.  If you
use this approach then you'll need to quote all references to the field
in your SQL statements. If you run your mysqld in ANSI mode then you
can use double quotes for both MySQL and Oracle SQL and thus use the
same SQL statements for both.

Another common alternative is 'C<%s_>', which just appends an
underscore character to the name.

=back

=head2 LIMITATIONS

The parser has been tested on the output of C<mysqldump -d> and not on
hand-written SQL.  So it's possible that some legal SQL will not parse
correctly.

Also, it has only been tested on the schemas that I have available to
me. It's quite probable that for some less common types it doesn't do
the right thing.

The parser error handling isn't good. I know that Parse::RecDescent can
generate good errors but I don't seem to be using it the right way to
get them.

=head2 TO DO (by someone)

Add optional support for the auto_increment trigger storing the last value
into a session global to simplify emulation of last_insert_id.

CHECK clause for SET types.

Handle embedded quotes etc in default values etc.

Query mysql db to set initial start value of auto_increment sequence

=head2 AUTHOR

Copyright Tim Bunce, 2001. Released under the same terms as Perl.

=cut

use strict;

use Carp;
use Parse::RecDescent;
use Data::Dumper;
use Getopt::Long;

use vars qw(%VARIABLE %opt);

%opt = (
        maxname => 30,  # oracle names must be <=30 chars
        d => 0,
        quiet => 0,
        autoinc => 1,
        unreserve => '"%s"',
);
GetOptions(\%opt,
        'd!',
        'quiet!',
        'drop!',
        'prefix!',
        'only=s',
        'unreserve=s',
        'autoinc=s',
        'maxname=i',
) or die;

# Enable warnings within the Parse::RecDescent module.
$::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
$::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
$::RD_HINT   = 1; # Give out hints to help fix problems.
$::RD_TRACE  = $opt{d}?90:undef; # Trace behaviour to help fix problems.

my $grammar = <<'_EOGRAMMAR_';

{       my @col_defn;
        my @key_defn;
        my $schema_obj_name = '(initial)';
}

VALUE   : /[-+]?\.?\d+(?:[eE]\d+)?/
        { $item[1] }
        | /'.*?'/       # XXX doesn't handle embedded quotes
        { $item[1] }
        | /NULL/
        { 'NULL' }

NAME    : "`" /\w+/ "`"
        { $item[2] }
        | /\w+/
        { $item[1] }

parens_value_list : '(' VALUE(s /,/) ')'
        { $item[2] }

parens_name_list : '(' NAME(s /,/) ')'
        { $item[2] }

name_with_opt_paren : NAME parens_value_list(s?)
        { $item[2][0] ? "$item[1]($item[2][0][0])" : $item[1] }

index_field_list : '(' name_with_opt_paren(s /,/) ')'
        { $item[2] }


# --- specific productions

table_options : /[^;]*/
        { $item[1] }

type_qualifier : /(BINARY|UNSIGNED|ZEROFILL)/i
        { uc $item[1] }

type_spec : /\w+/ parens_value_list(?) type_qualifier(s?)
        { { type_name=>$item[1], list=>$item[2][0], type_qual=>$item[3] } }

default_value : /DEFAULT/i VALUE
        { $item[2] }

auto_increment : /AUTO_INCREMENT/i 
        { 1 }

primary_key : /PRIMARY KEY/i
        { 1 }

primary_key_definition : /PRIMARY KEY/i <commit> parens_name_list
        {
            push @key_defn, {
                type => uc $item[1],
                name=>'PRIMARY KEY',
                fields=>$item[-1],
            }
        }

index_definition : /(KEY|INDEX)|(UNIQUE|FULLTEXT)(\s+(KEY|INDEX))?/i <commit> NAME(?) 
index_field_list
        {
            push @key_defn, {
                type    => uc $item[1],
                name    => $item[3][0],
                name_ur => main::unreserve($item[3][0]||'', $schema_obj_name),
                fields  => $item[-1],
            }
        }

column_definition : NAME <commit> type_spec /((NOT\s+)?NULL)?/i default_value(?) 
/((NOT\s+)?NULL)?/i auto_increment(?) primary_key(?)
        {
            push @col_defn, {
                name    => $item[1],
                name_ur => main::unreserve($item[1]||'', $schema_obj_name),
                type    => $item[3],
                nullable=> $item[4] || $item[6],
                default => $item[5][0],
                is_pk   => $item[8][0],
                is_auto_increment => $item[7][0],
            }
        }

create_definition : primary_key_definition | index_definition | column_definition

schema_obj_name : NAME
        {
          $schema_obj_name = $item[1];
          warn "\u$arg[0] $schema_obj_name...\n" unless $::opt{quiet};
          @col_defn = ();
          @key_defn = ();
          $item[1];
        }

create_table : /CREATE TABLE/i <commit> schema_obj_name['table'] '(' 
create_definition(s /,/) ')' table_options ';'
        { main::create_table($item[3], \@col_defn, \@key_defn, $item[-2]) }

comment : /#[^\n]*/
        {
            ($item[1] =~ m/^#\s*$/) ? print "/* # */\n" : print "/* $item[1]\t*/\n";
        }

statement: comment(s?) create_table
        | <error:near "[$text]">

eofile: /^\Z/

startrule: statement(s?) eofile

_EOGRAMMAR_


# Oracle reserved words from:
#    
http://technet.oracle.com/docs/products/oracle8i/doc_library/817_doc/server.817/a85397/ap_keywd.htm
my @ora_reserved = qw(
    ACCESS ADD ALL ALTER AND ANY AS ASC AUDIT BETWEEN BY
    CHAR CHECK CLUSTER COLUMN COMMENT COMPRESS CONNECT CREATE CURRENT
    DATE DECIMAL DEFAULT DELETE DESC DISTINCT DROP
    ELSE EXCLUSIVE EXISTS 
    FILE FLOAT FOR FROM
    GRANT GROUP HAVING
    IDENTIFIED IMMEDIATE IN INCREMENT INDEX INITIAL INSERT
    INTEGER INTERSECT INTO IS
    LEVEL LIKE LOCK LONG 
    MAXEXTENTS MINUS MLSLABEL MODE MODIFY 
    NOAUDIT NOCOMPRESS NOT NOWAIT NULL NUMBER 
    OF OFFLINE ON ONLINE OPTION OR ORDER
    PCTFREE PRIOR PRIVILEGES PUBLIC
    RAW RENAME RESOURCE REVOKE ROW ROWID ROWNUM ROWS
    SELECT SESSION SET SHARE SIZE SMALLINT START 
    SUCCESSFUL SYNONYM SYSDATE 
    TABLE THEN TO TRIGGER 
    UID UNION UNIQUE UPDATE USER
    VALIDATE VALUES VARCHAR VARCHAR2 VIEW
    WHENEVER WHERE WITH
);
my %ora_reserved = map { $_ => 1 } @ora_reserved;

my %global_names;
my %unreserve;
my %truncated;


my $parser = Parse::RecDescent->new($grammar);

my $example = q{
CREATE TABLE check (
  id int(7) unsigned zerofill NOT NULL default '0000000' auto_increment primary key,
  successful date NOT NULL default '0000-00-00',
  unsuccessful date default '0000-00-00',
  i1 int(11) default '0' not null,
  s1 set('a','b','c') default 'b',
  e1 enum('a','b','c') default 'c',
  name varchar(30) default NULL,
  foo_type enum('vk','ck') NOT NULL default 'vk',
  date timestamp,
  time_stamp2 timestamp,
  KEY (i1),
  UNIQUE (date, i1),
  KEY date_idx (date),
  KEY name_idx (name(10))
) TYPE=MyISAM PACK_KEYS=1;
};


my $mysql_schema = $example;
my $mysql_schema_name = 'example';
if (@ARGV) {
    $mysql_schema_name = $ARGV[0];
    warn "Reading mysql schema from $mysql_schema_name\n";
    local $/;
    open FH, "<$ARGV[0]" or die "Can't open $mysql_schema_name: $!\n";
    $mysql_schema = <FH>;
}

print "/* Oracle schema generated by $0 on ".localtime()."      */\n";
print "\n";
print "/* We assume that default NLS_DATE_FORMAT has been changed       */\n";
print "/* but we set it here anyway to be self-consistent.              */\n";
print "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';\n";
print "\n";

my $result = $parser->startrule($mysql_schema);

print Dumper($result) if $opt{d};

if (%truncated) {
    warn "Truncated ".keys(%truncated)." names:\n";
    warn "\t".join("\n\t", sort keys %truncated)."\n";
}
if (%unreserve) {
    warn "Encounted ".keys(%unreserve)." unsafe names in $mysql_schema_name (reserved 
or invalid names):\n";
    warn "\t".join("\n\t", sort keys %unreserve)."\n";
}

die "Parse failed.\n" unless defined $result;
exit 0;


# ------


sub unreserve {
    my ($name, $schema_obj_name) = @_;
    my ($suffix) = ($name =~ s/(\W.*)$//) ? $1 : '';
    return $_[0] if !$ora_reserved{uc $name}
        && $name =~ /^[a-z]/i; # also trap fields that don't beging with a letter
    $schema_obj_name ? ++$unreserve{"$schema_obj_name.$name"} : ++$unreserve{"$name 
(table name)"};
    my $unreserve = sprintf $opt{unreserve}, $name; # eg '%s_' or '"%s"' etc
    return $unreserve.$suffix;
}


sub mk_name {
    my ($basename, $type, $scope, $critical) = @_;
    my $basename_orig = $basename;
    my $max_name = $opt{maxname} - (length($type)+1);
    $basename = substr($basename, 0, $max_name) if length($basename) > $max_name;
    my $name = ($type) ? ($opt{prefix} ? "${type}_$basename" : "${basename}_$type") : 
$basename;
    if ($basename ne $basename_orig and $critical) {
        my $show_type = $type ? "+'$type'" : "";
        warn "Truncating '$basename_orig'$show_type to $opt{maxname} character limit 
to make '$name'\n";
        $truncated{$basename_orig} = $name;
    }
    $scope ||= \%global_names;
    return $name unless $scope->{$name}++;
    my $name_orig = $name;
    $name .= "02";
    substr($name, $opt{maxname}-3) = "00" if length($name) > $opt{maxname};
    ++$name while $scope->{$name};
    warn "The name '$name_orig' has been changed to '$name' to make it unique\n";
    return $name;
}


sub create_table {
    my ($tablename, $rows, $keys, $options) = @_;

    if ($opt{only} && $tablename !~ /$opt{only}/o) {
        warn "Skipped $tablename\n";
        return 1;
    }

    $tablename = mk_name($tablename,'',undef,1);
    my $tablename_ur = unreserve($tablename);
    print "create_table:", Dumper( [ ($tablename, $rows, $keys, $options) ] ) if 
$opt{d};

    my %field_name_scope;
    my @ora_col_defn;
    foreach (@$rows) {
        my $field_name = mk_name($_->{name}, '', \%field_name_scope, 1);
        my $field_name_ur = (unreserve($field_name,$tablename));
        my @defn = ($field_name_ur);
        push @defn, mysql2ora_type($field_name_ur, $_, $tablename); # also does 
default clause
        if ($_->{is_pk}) {
            push @defn, "CONSTRAINT ".mk_name($tablename,'pk')." PRIMARY KEY";
            warn "@defn can't be nullable under Oracle"
                if $_->{nullable} && $_->{nullable} !~ /not/i;
        }
        push @ora_col_defn, join " ", @defn;
    }

    my $idx_name_default = "idx00";
    my @ora_key_defn;
    foreach my $key (@$keys) {
        my $idx_fields = join ",", map { unreserve($_, $tablename) } @{ $key->{fields} 
};
        if ($idx_fields =~ m/\(\d+\)/) {
            my $orig = $idx_fields;
            if (@{ $key->{fields} } == 1) {
                # if this is the only field in the index then drop it entirely
                warn "Warning: Removed partial-field index on $idx_fields of 
$tablename (not supported by Oracle)\n";
                next;
            }
            if ($idx_fields =~ s/\(\d+\)*$//) { # make LAST field non-partial
                warn "Warning: Changed index on $tablename from '$orig' to 
'$idx_fields' (will be more expensive)\n";
            }
            # if more (or other) than the last field are partial then handle it 
manually
            warn "Warning: Partial-field index on $tablename of '$orig' will need 
manual fixup for Oracle\n"
                if $idx_fields =~ m/\(/;
        }
        my $type = $key->{type}; # KEY, PRIMARY KEY, UNIQUE, UNIQUE INDEX
        if ($type eq 'PRIMARY KEY') {
            push @ora_col_defn, "CONSTRAINT ".mk_name($tablename,'pk')." PRIMARY KEY 
($idx_fields)";
            next;
        }
        $type =~ s/\s*(INDEX|KEY)\s*//;
        my $idx_name = mk_name($tablename, $key->{name} || ++$idx_name_default);
        push @ora_key_defn, "CREATE $type INDEX $idx_name ON $tablename_ur 
($idx_fields);";
    }

    print "DROP TABLE $tablename_ur;\n" if $opt{drop};
    print "CREATE TABLE $tablename_ur (\n  ";
    print join ",\n  ", @ora_col_defn;
    print "\n);\n";
    print join "\n", @ora_key_defn;
    print "\n\n";

    foreach (@$rows) {
        next unless $_->{is_auto_increment};
        my $base_name = $tablename . "_". $_->{name};
        my $sq_name   = mk_name($base_name,'sq');
        my $trig_name = mk_name($base_name,'ai');
        print "DROP SEQUENCE $sq_name;\n" if $opt{drop};
        print <<"EOS";
CREATE SEQUENCE $sq_name START WITH $opt{autoinc};
CREATE OR REPLACE TRIGGER $trig_name
BEFORE INSERT ON $tablename_ur
FOR EACH ROW WHEN (new.$_->{name} IS NULL OR new.$_->{name} = 0)
BEGIN
   SELECT $sq_name.nextval INTO :new.$_->{name_ur} FROM dual;
END;
/
EOS
        print "\n";
    }

    foreach (@$rows) {
        next unless lc $_->{type}->{type_name} eq 'timestamp';
        my $base_name = $tablename . "_". $_->{name};
        my $trig_name = mk_name($base_name,'ts');
        print <<"EOS";
CREATE OR REPLACE TRIGGER $trig_name
BEFORE INSERT OR UPDATE ON $tablename_ur
FOR EACH ROW WHEN (new.$_->{name_ur} IS NULL)
BEGIN
  SELECT sysdate INTO :new.$_->{name_ur} FROM dual;
END;
/
EOS
        print "\n";
        last; # only first timestamp column is auto-updated
    }

    return 1;
}


sub mysql2ora_type {
    my ($field_name, $col_defn, $tablename) = @_;
    my $spec = $col_defn->{type} || die;
    my $qual = $spec->{type_qual};      # binary/unsigned/zerofill
    my $type = uc $spec->{type_name};
    my $list = $spec->{list} || [];
    my $commalist = join ",", @$list;
    my $default = ($col_defn->{default}) ? "DEFAULT $col_defn->{default}" : "";
    my $not_null  = ($col_defn->{nullable} =~ /not/i) ? "NOT NULL" : "";

    warn "Warning: $tablename.$col_defn->{name}: zerofill attribute ignored.\n"
        if $qual =~ /zerofill/i;

    if ($type eq 'ENUM') {
        my $len = 0;
        $len = ($len < length($_)) ? length($_) : $len for (@$list);
        my $check = "CHECK ($field_name IN ($commalist))";
        return "VARCHAR2($len) $default $not_null $check";
    }
    elsif ($type eq 'SET') {
        my $len = length $commalist;
        # XXX add a CHECK constraint maybe (more tricky, and slower, than enum :)
        return "VARCHAR2($len) /* set $commalist */ $default $not_null";
    }
    elsif ($type =~ /^INT/ or $type =~ /INT$/) {
        return "INTEGER $default $not_null";
    }
    elsif ($type =~ /^(REAL|DOUBLE|FLOAT)$/) {
        return "NUMBER $default $not_null";
    }
    elsif ($type =~ /^(DECIMAL|NUMERIC)$/) {
        return "$type($commalist) $default $not_null" if @$list;
        return "$type $default $not_null";
    }
    elsif ($type =~ /^(DATE|TIME|DATETIME|TIMESTAMP)$/) {
        if ($not_null) { $default =~ s/'0000-00-00/'0001-01-01/; }
        else {           $default =~ s/'0000-00-00.*?'/NULL/;    }
        return "DATE /* $type */ $default $not_null";
    }
    elsif ($type =~ /^TINY(BLOB|TEXT)$/) {
        return "VARCHAR2(255) /* $type */ $default $not_null";
    }
    elsif ($type =~ /BLOB$/) {
        return "BLOB /* $type */ $default $not_null";
    }
    elsif ($type =~ /TEXT$/) {
        return "CLOB /* $type */ $default $not_null";
    }
    elsif ($type =~ /CHAR$/) {  # CHAR and VARCHAR
        warn "Note: $tablename.$col_defn->{name} \L$type doesn't have BINARY qualifier 
for case-sensitivity.\n"
                unless $type =~ /binary/i;
        return "$type($commalist) $default $not_null";
    }
    warn "Unhandled MySQL type '$type' (on $col_defn->{name} column)\n";
    return "$type $default $not_null";
}

__END__

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

MySQL to Oracle schema conversion utility

Reply via email to