[Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?

2011-06-27 Thread Gavin McCullagh
Hi,

I've been experimenting with a migration from MySQL to Postgres.

One problem I've come across is that there are a handful of duplicate entries
in the Filename table:

mysql> select count(*) as filecount, Filename.name from Filename  GROUP BY 
Filename.Name ORDER BY filecount DESC LIMIT 30;
+-----------+-----------------------------------+
| filecount | name                              |
+-----------+-----------------------------------+
|         2 | 0                                 |
|         2 | 1                                 |
|         2 | 2                                 |
|         2 | tt172.16.17.36-www_gmail_com.html |

mysql> select * from Filename WHERE name IN 
('0','1','2','tt172.16.17.36-www_gmail_com.html');
+------------+-----------------------------------+
| FilenameId | Name                              |
+------------+-----------------------------------+
|       9247 | 0                                 |
|     101380 | 0                                 |
|       9248 | 1                                 |
|     101381 | 1                                 |
|       9249 | 2                                 |
|     101382 | 2                                 |
|     575752 | tt172.16.17.36-www_gmail_com.html |
|     625369 | tt172.16.17.36-www_gmail_com.html |
+------------+-----------------------------------+

and there doesn't appear to be any constraint preventing this.

mysql> describe Filename;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| FilenameId | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Name       | blob             | NO   | MUL | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> show create table Filename;
| Filename | CREATE TABLE `Filename` (
  `FilenameId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Name` blob NOT NULL,
  PRIMARY KEY (`FilenameId`),
  KEY `Name` (`Name`(255))
) ENGINE=MyISAM AUTO_INCREMENT=4277244 DEFAULT CHARSET=latin1 |

However, the PostgreSQL table does have this constraint and complains when
I try to insert the data:

bacula=# COPY Filename FROM '/var/tmp/bacula-backup/Filename.txt';
ERROR:  duplicate key value violates unique constraint "filename_name_idx"
CONTEXT:  COPY filename, line 101380: "101380   0"

I could of course prune four duplicate lines from the data before
inserting, but I'm afraid of the possible effect on a future restore.
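For what it's worth, the pruning can be expressed as a pair of queries rather
than hand-edits to the dump: repoint File rows at one surviving FilenameId per
Name, then drop the now-orphaned Filename rows. A minimal sketch of the idea
(illustrated with sqlite3 on made-up miniature tables, not the real Bacula
schema; the IDs are the ones from the output above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Miniature stand-ins for the Bacula tables (illustrative only).
cur.execute("CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE TABLE File (FileId INTEGER PRIMARY KEY, FilenameId INTEGER)")

# A duplicated Name, as in the post: two FilenameIds both for '0'.
cur.executemany("INSERT INTO Filename VALUES (?, ?)",
                [(9247, "0"), (101380, "0"), (9248, "1")])
cur.executemany("INSERT INTO File VALUES (?, ?)",
                [(1, 9247), (2, 101380), (3, 9248)])

# Repoint every File row at the lowest FilenameId sharing the same Name...
cur.execute("""
    UPDATE File SET FilenameId = (
        SELECT MIN(f2.FilenameId) FROM Filename f1
        JOIN Filename f2 ON f1.Name = f2.Name
        WHERE f1.FilenameId = File.FilenameId)
""")
# ...then delete the Filename rows nothing references any more.
cur.execute("""
    DELETE FROM Filename WHERE FilenameId NOT IN (
        SELECT MIN(FilenameId) FROM Filename GROUP BY Name)
""")

print(cur.execute("SELECT COUNT(*) FROM Filename").fetchone()[0])   # 2
print(cur.execute("SELECT DISTINCT FilenameId FROM File ORDER BY 1").fetchall())
```

The same two statements should translate directly to MySQL, run against the
catalog (or the dump loaded into a scratch database) before the COPY.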

It appears there are duplicate entries in the File table each time there
is a duplicate in the Filename table:

mysql> SELECT * FROM File WHERE FilenameId IN (select FilenameId from Filename 
WHERE name IN ('0','1','2','tt172.16.17.36-www_gmail_com.html')) ;
+-----------+-----------+-------+--------+------------+--------+-----------------------------------------------------------+------------------------+
| FileId    | FileIndex | JobId | PathId | FilenameId | MarkId | LStat                                                     | MD5                    |
+-----------+-----------+-------+--------+------------+--------+-----------------------------------------------------------+------------------------+
| 245079464 |    207178 | 13471 |   3870 |       9249 |      0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 245079465 |    207178 | 13471 |   3870 |     101382 |      0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364675 |    173643 | 11277 |   3870 |       9249 |      0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364676 |    173643 | 11277 |   3870 |     101382 |      0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 259000887 |    290784 | 14315 | 745797 |       9248 |      0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M           | UumkJoY8ZKJd6/HfnwvCDg |
| 259000888 |    290784 | 14315 | 745797 |     101381 |      0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M           | UumkJoY8ZKJd6/HfnwvCDg |
| 258833500 |    123399 | 14315 | 807939 |       9248 |      0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M           | G4vlZvkmC8DFxl8y0RsnfA |
| 258833501 |    123399 | 14315 | 807939 |     101381 |      0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M           | G4vlZvkmC8DFxl8y0RsnfA |
| 244900411 |     28129 | 13471 |   1752 |       9249 |      0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 244900412 |     28129 | 13471 |   1752 |     101382 |      0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 224916087 |     41066 | 

Re: [Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?

2011-06-29 Thread Martin Simmons
> On Mon, 27 Jun 2011 14:03:24 +0100, Gavin McCullagh said:
> 
> I've been experimenting with a migration from MySQL to Postgres.
> 
> One problem I've come across is that there are a handful of duplicate files
> in the Filename table 
> ...
> I could of course prune four duplicate lines from the data before
> inserting, but I'm afraid of the possible effect on a future restore.
> 
> It appears there are duplicate entries in the File database for each time 
> there
> is a duplicate in the Filename table:

These duplicates in the File table are probably generated by the batch insert
code.  Since each pair has the same FileIndex, it should be safe to elide
them.
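A minimal sketch of that elision (sqlite3 on a hypothetical miniature File
table, not the live catalog): once the duplicate Filename rows have been
merged, each pair differs only in FileId, so keeping the lowest FileId per
(JobId, FileIndex, PathId, FilenameId) removes exactly one row of each pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE File (FileId INTEGER PRIMARY KEY,
               FileIndex INTEGER, JobId INTEGER, PathId INTEGER,
               FilenameId INTEGER)""")

# Two of the duplicate pairs from the post, with the Filename merge
# already applied (9249 and 101382 collapsed to 9249).
cur.executemany("INSERT INTO File VALUES (?, ?, ?, ?, ?)", [
    (245079464, 207178, 13471, 3870, 9249),
    (245079465, 207178, 13471, 3870, 9249),
    (215364675, 173643, 11277, 3870, 9249),
    (215364676, 173643, 11277, 3870, 9249),
])

# Keep the lowest FileId of each identical group, delete the rest.
cur.execute("""
    DELETE FROM File WHERE FileId NOT IN (
        SELECT MIN(FileId) FROM File
        GROUP BY JobId, FileIndex, PathId, FilenameId)
""")
# One row per original pair survives.
print(cur.execute("SELECT FileId FROM File ORDER BY FileId").fetchall())
```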


> so perhaps this is safe enough.  Does anyone know how these duplicates may
> have arisen and what the best way to proceed is?

It is probably safe to remove the duplicates.

It isn't clear how the original duplicates in the Filename table arose, but
possibly you had table corruption at some point?  Was this database created by
an old version of Bacula?

__Martin

--
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?

2011-06-29 Thread Gavin McCullagh
Hi,

On Wed, 29 Jun 2011, Martin Simmons wrote:

> These duplicates in the File table are probably generated by the batch insert
> code.  Since each pair has the same FileIndex, it should be safe to elide
> them.

Fair enough, thanks.

> > so perhaps this is safe enough.  Does anyone know how these duplicates may
> > have arisen and what the best way to proceed is?
> 
> It is probably safe to remove the duplicates.
> 
> It isn't clear how the original duplicates in the Filename table arose, but
> possibly you had table corruption at some point?  Was this database created by
> an old version of Bacula?

The database was originally created on Ubuntu Hardy, which was Bacula v2.x as
I recall.  We then upgraded to v3 (Ubuntu packages built from the Debian
package git archive, as we needed Volume Shadow Copy support under Win64)
and have since moved to v5 using the Lucid packages.

So, yes, made by old versions.  Am I missing a constraint I should have?

Gavin




Re: [Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?

2011-06-30 Thread Martin Simmons
> On Wed, 29 Jun 2011 22:40:16 +0100, Gavin McCullagh said:
> 
> The database was originally made in Ubuntu Hardy which was v2.x as I recall
> now.  We then upgraded to v3 (ubuntu packages built using the debian
> package git archive as we needed volume shadow copy support under Win64)
> and since then moved to v5 using the Lucid packages.
> 
> So, yes, made by old versions.  Am I missing a constraint I should have?

No, the current MySQL table definitions don't have that constraint either.  I
think Bacula has very few constraints for performance reasons.

I was wondering about old versions because that increases the chance that a
bug in Bacula caused it long ago.  Bacula's dbcheck program has an option to
eliminate the duplicates, which suggests that it was a problem at some time in
the past.

__Martin
