Re: [osmosis-dev] Slow to import into a postgres api schema

Brett Henderson Sun, 21 Feb 2010 03:43:59 -0800

Hi Kai,

Nice work on this.  Some comments below.

On Sun, Feb 21, 2010 at 9:31 PM, Kai Krueger <kakrue...@gmail.com> wrote:

> Hello,
>
> importing large quantities of data (e.g. a country extract or the whole
> planet) into the postgres api schema is quite slow. The import of a
> planet on the dev server has been running nearly a week.
>
> I have tried to see if performance can be improved and the two main
> things I noticed were that the postgres DatabaseContext doesn't support
> disabling the indices, and doesn't use the Copy command  that is
> supposed to be faster for bulk imports.
>

Most of the original code was targeted at MySQL, and in the port across some
of the performance features were lost.  Among them was the ability to
disable indexes because PostgreSQL doesn't appear to provide a nice way to
disable and re-enable without doing a full drop and create.

I haven't had a chance to look at direct COPY support.  It sounds like the
right way to go.  The --write-pgsql-dump command (used for a difference
schema) is capable of producing COPY compatible dump files so could provide
some re-usable code.

>
> As a proof of concept, I added statements into disableIndexes to
> manually drop each index and then recreate them in enableIndexes.
> Together with using the Copy command (supported in the postgres 8.4 JDBC
> driver), my initial experiments show a speedup of 3 - 4 times on the
> initial population of the tables (i.e. without populating the current
> tables, but I suspect that this step can be similarly sped up). These
> numbers were obtained using small country extracts (e.g. 1 - 20 Mb in
> bz2 size), but I would guess that they hold up with the full planet
> imports too.
>
> The main benefit comes from disabling the indecies, and the copy command
> seems less important.
>

>
> The patch I have is quite ugly (and untested for correctness), as it
> breaks the levels of abstraction and has to hard code all the available
> indecies. So my question is, what would be the best way to do this in a
> clean way? Looking at the speedups obtained and the time involved in
> imports, it seems like it might be worth it.
>

If it's truly 3-4 times faster then it's worth a lot of effort.  I don't
have a lot of time to get involved in this myself though, so if you have
some time to write a maintainable patch, then I'd be very grateful.  My only
ask is that you stick around to get it working and provide support until it
is proven stable.

A few comments:

   - I'm not too concerned about MySQL support any more, so a lot of the old
   code still dealing with MySQL could be dropped in my opinion.  There are a
   few users still using the MySQL database, but the MySQL schema is not
   receiving further updates, and it is supported by old versions of Osmosis
   anyway.
   - Keeping aligned with the indexes is the biggest issue because they do
   change from time to time.  Currently Osmosis is not very tied to the schema
   version, but this could change that.  At a minimum I'd like to see the index
   names and index creation SQL commands separated into a properties file
   within Osmosis and not buried in code to make it simpler to keep up to
   date.  For extra points, it would be good to be able to override this
   property file on the command line and provide your own at runtime in case
   the schema changes or you wish to experiment with different types of index
   modifications.
   - I'd prefer to make this a new task within Osmosis.  The current one is
   called --write-apidb I think.  You could create a new one called something
   like --write-apidb-fast.  If you can get it stable and it works well, then
   we can point --write-apidb at your task and delete the current one.

As for the patch to add progress information to XmlReader, can you just use
the --log-progress task instead?  I'd strongly prefer to keep each task
doing one thing only.  If you modified XmlReader to log progress, then you'd
have to modify all other similar classes for consistency.

Brett

_______________________________________________
osmosis-dev mailing list
osmosis-...@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev

Re: [osmosis-dev] Slow to import into a postgres api schema

Reply via email to