Someone asked me to do this, and I have a draft now.   While it is clearly lacking in 
some areas, it might be of help generally.  

Ian

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
mailto: [EMAIL PROTECTED]

The Microsoft SQL Server to PostgreSQL Migration HOWTO
Ian A. Harding <[EMAIL PROTECTED]>
v1.00, 23 July 2000

How to move a database from a popular proprietary database system to the world's most 
powerful open source database.
______________________________________________________________________

Table of Contents


1. Disclaimer

2. Introduction

3. Considerations

4. Tables

5. Data

6. Views

7. Summary


 ______________________________________________________________________

1. Disclaimer

The following document is offered in good faith as comprising only safe programming 
and procedures. No responsibility is accepted by the author for any loss or damage 
caused in any way to any person or equipment, as a direct or indirect consequence of 
following these instructions.


2. Introduction

The most recent version of this document can always be found at 
http://www.tpchd.org/????.html

Microsoft SQL Server is a very popular relational database management system (RDBMS) 
with highly restrictive licensing and a high cost of ownership if the database is of 
significant size, or is used by a significant number of clients.  It does, however, 
provide a very user-friendly interface, is easy to learn and use, and has low-cost 
entry-level configurations.  This has resulted in a very large installed user base.

PostgreSQL now challenges MS SQL Server in basic feature set, reliability and 
performance, has a much less restrictive license, and is open source.  As a matter of 
course, users are migrating to PostgreSQL from MS SQL Server as the cost of ownership 
becomes an issue, and as their knowledge of relational database systems increases.

This HOW-TO is intended for the MS SQL Server user who is now ready to migrate 
databases to PostgreSQL.

3. Considerations

RDBMS features are used differently and to different degrees by programmers.  Some 
applications rely heavily on so-called middleware, or on the client application, to 
handle business logic.  Others attempt to put as much logic as possible in the 
database.  Your migration will be far more difficult if your application is in the 
latter group.  While putting logic in the database server is a sound design choice, it 
requires programming in a vendor-specific Structured Query Language (SQL) extension 
such as Microsoft's Transact-SQL (T-SQL).  This is also the case with PostgreSQL.  
There is no easy way to migrate stored procedures, triggers, or rules.  On the bright 
side, PostgreSQL provides several procedural language options, all of which are more 
graceful than T-SQL.
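
To give a flavor of the rewrite involved, here is a minimal sketch: a trivial T-SQL 
procedure and a hand translation into PL/pgSQL.  The table name is hypothetical, and 
the plpgsql language must already be installed in your database (see createlang).

-- T-SQL original:
--   CREATE PROCEDURE getclientcount AS
--       SELECT COUNT(*) FROM client
--   GO

-- PL/pgSQL translation (a sketch, not the only way):
CREATE FUNCTION getclientcount() RETURNS INTEGER AS '
DECLARE
    n INTEGER;
BEGIN
    SELECT COUNT(*) INTO n FROM client;
    RETURN n;
END;
' LANGUAGE 'plpgsql';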

All RDBMS provide built-in functions.  However, like procedural extensions to SQL, 
they are not portable.  Fortunately, there is some overlap, and the simple syntax 
makes migration relatively easy.

Finally, the programmer's choice of SQL syntax can affect this process.  Most RDBMS 
are converging on the evolving SQL standards; that is, they are moving away from 
vendor-specific syntax such as '*=' for a left outer join.  This syntax is still 
supported in MS SQL Server as of version 7.0, but was never supported in PostgreSQL.
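
For example, a query using the vendor syntax has to be recast in the standard form 
before PostgreSQL will accept it (table and column names here are hypothetical):

-- MS SQL Server vendor syntax:
--   SELECT c.name, o.orderdate
--   FROM client c, orders o
--   WHERE c.clientid *= o.clientid

-- Standard syntax, accepted by both:
SELECT c.name, o.orderdate
FROM client c LEFT OUTER JOIN orders o ON c.clientid = o.clientid;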

This process will require either a mind-numbing amount of hand editing of script and 
data files, or the use of a scripting language to modify these files programmatically, 
followed by a somewhat less enormous amount of editing.  I am not smart enough to 
identify every possible option for the migration, or to accommodate them all in a 
script.  I have done this migration on a relatively complex database application in a 
reasonable amount of time.  That, rather than a technically flawless script, should be 
your goal.

I use Tool Command Language (TCL) for almost everything, so I use it here.  You can 
use whatever language you like.

4. Tables

Dump the table definitions with the MS SQL Server scripting tool.  From the 
Enterprise Manager, right click on your database and select 'All Tasks', then 
'Generate SQL Scripts' from the context menu.  Uncheck 'Script All Objects', and 
select 'All Tables'.  On the 'Formatting' tab, de-select 'Generate DROP...'.  On the 
'Options' tab, select 'Script indexes' and 'Script PRIMARY KEYS...'.  Select the 
'MS-DOS' file format, and make sure 'Create one file' is checked.  Click OK, give it a 
name, and put it somewhere you can find it.

A brief look at this file will show you what we are up against.  MS uses square 
brackets around all identifiers, to protect you from poor design choices such as using 
reserved keywords, so that crazy things like:

CREATE TABLE [dbo].[Select] ([Union] [int])

are possible.  PostgreSQL uses double quotes instead.  MS uses the object owner 
qualification for all objects, 'dbo' in this case; PostgreSQL has no such 
qualification in object names.
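
The PostgreSQL rendering of that same ill-advised table drops the owner qualification 
and swaps the brackets for double quotes:

CREATE TABLE "select" ("union" INT);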

Another thing to note is that MS SQL Server identifiers are case-preserving, but in 
practice most installations are case insensitive.  PostgreSQL folds SQL keywords and 
unquoted identifiers to lower case.  That is not the same as being case insensitive: 
you can create tables using the double-quote protection mentioned above, such that 
they can only be accessed using the same double-quoting method.  I find it is best to 
abandon case in object identifiers when migrating to PostgreSQL.  It is also safest to 
avoid any identifiers that require quoting, to head off problems down the road.
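
A quick illustration of the trap, with a hypothetical table:

CREATE TABLE "Client" (id INT);
SELECT * FROM client;     -- fails; the unquoted name folds to 'client'
SELECT * FROM "Client";   -- works, but now you must quote it forever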

It is worth noting that for data comparisons, PostgreSQL is case sensitive, and there 
is no option to change this behavior.  You will have to force data to upper or lower 
case on both sides of text comparisons where case is not important to the operation 
and there is a chance of it differing.  This conversion might be a good time to force 
data used in joins and comparisons to all upper or all lower case.  You will also need 
to look at the application for code that compares user-entered information and relies 
on MS SQL Server's typical case insensitivity.
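
The usual fix looks something like this (hypothetical names again):

-- MS SQL Server, typically case insensitive:
--   SELECT * FROM client WHERE lastname = 'SMITH'

-- PostgreSQL, when case must not matter:
SELECT * FROM client WHERE lower(lastname) = lower('SMITH');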

Another issue that is not immediately evident is that MS SQL Server supports ALTER 
TABLE statements containing comma-separated lists of alterations; PostgreSQL currently 
does not.  This is important because the MS SQL Server scripting program creates all 
constraints in ALTER TABLE statements.  If any table has more than one constraint, you 
will have to surgically separate them into their own ALTER TABLE statements, or move 
them into the CREATE TABLE statement.
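
For example, a scripted statement like the first one below has to become two separate 
statements (constraint and column names are hypothetical):

-- As generated by MS SQL Server:
--   ALTER TABLE orders ADD
--       CONSTRAINT fk_orders_client FOREIGN KEY (clientid) REFERENCES client (clientid),
--       CONSTRAINT ck_orders_qty CHECK (qty > 0)

-- As PostgreSQL wants it:
ALTER TABLE orders ADD CONSTRAINT fk_orders_client
    FOREIGN KEY (clientid) REFERENCES client (clientid);
ALTER TABLE orders ADD CONSTRAINT ck_orders_qty CHECK (qty > 0);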

Indexes are a bright spot, mostly.  The CLUSTER keyword in PostgreSQL is not the same 
as the CLUSTERED keyword in an MS SQL Server index creation.  PostgreSQL will let you 
'cluster' a table, that is, physically rearrange its tuples in the order of one index.  
This sounds good, except that the clustering is not maintained for updates and 
inserts, and regenerating it will break all your other indexes.
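
In practice, this means you can simply drop the CLUSTERED keyword when converting an 
index (index and table names are hypothetical):

-- MS SQL Server:
--   CREATE CLUSTERED INDEX ix_client_name ON client (lastname)

-- PostgreSQL:
CREATE INDEX ix_client_name ON client (lastname);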

Having said all that, here is a partial list of things to correct; a small 
before-and-after example follows the list:

  1.  Force everything to lower case.

  2.  Remove all square brackets.

  3.  Remove all object owner prefixes (e.g. "dbo.")

  4.  Remove all references to filegroups (e.g. "ON PRIMARY")

  5.  Remove all unsupported optional keywords (e.g. "WITH NOCHECK", "CLUSTERED")

  6.  Update all unsupported data types (e.g. "DATETIME" becomes "TIMESTAMP").  Also, 
      this is a good time to get away from MONEY.  It is supported in PostgreSQL, but 
      is on its way out.  Use NUMERIC(19,4).

  7.  Replace the T-SQL batch terminator "GO" with the PostgreSQL statement 
      terminator ";"
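
Here is what those rules look like applied to a small, hypothetical table:

-- Before, as scripted by MS SQL Server:
--   CREATE TABLE [dbo].[Client] (
--       [ClientId] [int] NOT NULL ,
--       [Balance] [money] NULL ,
--       [Modified] [datetime] NULL
--   ) ON [PRIMARY]
--   GO

-- After, ready for PostgreSQL:
CREATE TABLE client (
    clientid INT NOT NULL,
    balance NUMERIC(19,4) NULL,
    modified TIMESTAMP NULL
);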

Put this file somewhere safe, and now let's get the data. 

5. Data

Data is data.  It is brought over in text form and cast into its proper form by the 
database according to the data types you used in creating your tables.  If you have 
binary data, I am the wrong guy to ask.

There are a couple of gotchas here too, of course.  Since we use the COPY command, and 
it interprets a newline as the end of a tuple, you need to clean out all those newlines 
lurking in your text fields in MS SQL Server.  This is easy enough to do.  Also, the 
data dump from MS SQL Server will use the standard cr/lf line terminator, which needs 
to be changed to lf or it will cause havoc in comparisons of strings, among other 
problems.  I did this the easy way: downloading the dumps to my machine running my 
favorite Unix-like operating system via ftp, whose ASCII transfer mode does this 
translation for you.

The first step in dumping the data out of MS SQL Server is to put the names of all 
your tables into a text file on the Win32 machine.  You can cheat and issue:

"select name from sysobjects where type = 'U'" 

in Query Analyzer (ISQL-W) to get the list, then save the results to a file.  Then, 
write a handy little script to call bcp, the Bulk Copy Program.  Mine looks like this:

set file [open "C:\\inetpub\\ftproot\\tablelist.txt" r]
while {[gets $file table] >= 0} {
    # skip any blank lines in the table list
    if {$table == ""} continue
    # dump each table in character mode (-c), keeping nulls (-k),
    # with ~ as the row terminator (-r) for the cleanup step later
    exec bcp <database>..$table out $table -c -k -S192.168.100.1 -Usa -Ppassword -r ~
}
close $file

This will dump all the listed tables into files of the same name in the current 
directory.  The -c flag means to use plain character format.  The -k flag tells bcp to 
"keep nulls".  This is important later when we import the data.  The -r is the "row 
terminator".  To make cleaning up the carriage returns easier, I use this to signal 
the end of a row.  I put this script in the C:\InetPub\ftproot directory, so I can go 
right to the next step.

From the Unix-like machine, start ftp and first retrieve the table list file you 
created earlier into a work directory.  Then, from that same work directory, use the 
list to fetch the data files (the fget command, found in some ftp clients such as 
NetBSD's, retrieves every file named in a local list file):

ftp> lcd /home/homer/workdir
Local directory now /home/homer/workdir
ftp> fget tablelist.txt

This should download all of the data files to the work directory, magically converting 
line terminators to Unix compatible format.  If you can't use FTP, there are other 
ways to get files from here to there.  Just be advised that you may need a little sed 
script to fix the cr/lf problem.

Now, let's fix the embedded line feed issue.

#!/usr/pkg/bin/tclsh
# read the list of table names dumped earlier
set file [open tablelist.txt r]
set flist [read -nonewline $file]
close $file
set flist [split $flist \n]
foreach f $flist {
    if {$f == ""} continue
    # slurp in one table's data file
    set file [open $f r]
    set data [read -nonewline $file]
    close $file
    # strip nulls, escape embedded linefeeds as a literal \n for COPY,
    # then turn the ~ row terminators into real linefeeds
    regsub -all {\000} $data {} data
    regsub -all {\n} $data {\\n} data
    regsub -all {~} $data \n data
    # write the cleaned data back out
    set file [open $f w]
    puts -nonewline $file $data
    close $file
}

The regsub lines are where the work gets done.  They replace all nulls (\000) with an 
empty string, then all linefeeds with a literal "\n" which will tell COPY what to do 
when we import the file, then my line terminators get replaced with a linefeed, which 
is what COPY is expecting.  There are cleaner and easier ways to do this, but you get 
the point.

Now, go back to the sql file you edited to create your database objects.  I assume it 
is on the Unix-like box at this point.  It should have a series of CREATE TABLE 
statements, followed by ALTER TABLE, CREATE INDEX, and other statements.  What we need 
to do now is tell it to load the data after the tables are created, but before 
anything else.

For each CREATE TABLE statement, follow it with a COPY statement, something like:

COPY tablename FROM '/home/homer/workdir/tablename' with null as '';
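
For a hypothetical client table, the relevant stretch of the edited script would then 
read:

CREATE TABLE client (
    clientid INT NOT NULL,
    lastname VARCHAR(40) NULL
);

COPY client FROM '/home/homer/workdir/client' WITH NULL AS '';

ALTER TABLE client ADD CONSTRAINT ck_client CHECK (clientid > 0);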

Once you have this done, execute it against your PostgreSQL database, something like

$ psql newdb < modifiedscript.sql &> outfile

should work.  The output file is good to have when looking for problems.  It gets 
messy, so

$ cat outfile | grep ERROR 

can give you an idea of how things went.  I guarantee you will have some 
troubleshooting to do.

6. Views

Views are pretty easy, as long as you didn't use too many functions in them.  A 
favorite of mine is isnull().  Like most functions, it has a PostgreSQL counterpart, 
coalesce().  A surprising number of functions will work just fine.  For example, 
round() is exactly the same.  datepart() becomes date_part(), and the arguments are 
the same, although PostgreSQL may be more particular about format strings.  For 
example, MS SQL Server will accept datepart(yyyy, mydatefield) as well as 
datepart(year, mydatefield); PostgreSQL wants to see date_part('year', mydatefield) 
(note the single quotes).
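
Putting that together, a typical view translation looks like this (names are 
hypothetical):

-- MS SQL Server:
--   CREATE VIEW clientsummary AS
--   SELECT isnull(lastname, 'unknown') AS lastname,
--          datepart(year, created) AS createdyear
--   FROM client

-- PostgreSQL:
CREATE VIEW clientsummary AS
SELECT coalesce(lastname, 'unknown') AS lastname,
       date_part('year', created) AS createdyear
FROM client;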

Generating sql for views is pretty much the same as for tables.  From the Enterprise 
Manager, right click on your database and select 'All Tasks', then 'Generate SQL 
Scripts' from the context menu.  Uncheck 'Script All Objects', and select 'All Views'. 
 On the 'Formatting' tab, de-select 'Generate DROP...'.  On the 'Options' tab, select 
the 'MS-DOS' file format, and make sure 'Create one file' is checked.  Click OK, give 
it a name, and put it somewhere you can find it.

Run this file through the same script you created to clean the sql for your tables, 
and see if it will work on PostgreSQL.  If not, you will have to do some fixing of 
functions.

7. Summary

Converting a database from MS SQL Server is not always easy.  It is, however, always 
worth it.  You will find PostgreSQL to be an extremely powerful and flexible product, 
with the best tech support in the world: the actual developers and users of the 
product.  If you ever spent days trying to get xp_sendmail to work on SQL Server 
version 7.0, or wondered what was in those enormous "Service Packs", then you will 
appreciate this.
