Re: [Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?
> On Wed, 29 Jun 2011 22:40:16 +0100, Gavin McCullagh said:
>
> The database was originally made in Ubuntu Hardy, which was v2.x as I recall
> now.  We then upgraded to v3 (Ubuntu packages built using the Debian
> package git archive, as we needed Volume Shadow Copy support under Win64)
> and since then moved to v5 using the Lucid packages.
>
> So, yes, made by old versions.  Am I missing a constraint I should have?

No, the current MySQL table definitions don't have that constraint either.
I think Bacula has very few constraints, for performance reasons.

I was wondering about old versions because that increases the chance that a
bug in Bacula caused it long ago.  Bacula's dbcheck program has an option to
eliminate the duplicates, which suggests that it was a problem at some time
in the past.

__Martin

--
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
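[Editorial note: for reference, the constraint discussed here is the one the PostgreSQL schema already enforces; the index name below is taken from the COPY error quoted later in this thread, and the MySQL statement is an illustrative, untested sketch only. Because Name is a blob, MySQL can only enforce uniqueness on an index prefix, so two distinct names sharing their first 255 bytes would also be rejected, which stock Bacula allows.]

```sql
-- Illustrative only: the unique index the PostgreSQL side already has
-- (name taken from the COPY error message quoted in this thread).
CREATE UNIQUE INDEX filename_name_idx ON Filename (Name);

-- A rough MySQL analogue, usable only after the duplicates are removed.
-- Caveat: the 255-byte prefix makes this stricter than true uniqueness
-- on the full blob value.
ALTER TABLE Filename ADD UNIQUE INDEX filename_name_idx (Name(255));
```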
Re: [Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?
Hi,

On Wed, 29 Jun 2011, Martin Simmons wrote:

> These duplicates in the File table are probably generated by the batch
> insert code.  Since each pair has the same FileIndex, it should be safe to
> elide them.

Fair enough, thanks.

> > so perhaps this is safe enough.  Does anyone know how these duplicates may
> > have arisen and what the best way to proceed is?
>
> It is probably safe to remove the duplicates.
>
> It isn't clear how the original duplicates in the Filename table arose,
> though, but possibly you had table corruption at some point?  Was this
> database created by an old version of Bacula?

The database was originally made in Ubuntu Hardy, which was v2.x as I recall
now.  We then upgraded to v3 (Ubuntu packages built using the Debian package
git archive, as we needed Volume Shadow Copy support under Win64) and since
then moved to v5 using the Lucid packages.

So, yes, made by old versions.  Am I missing a constraint I should have?

Gavin
Re: [Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?
> On Mon, 27 Jun 2011 14:03:24 +0100, Gavin McCullagh said:
>
> I've been experimenting with a migration from MySQL to Postgres.
>
> One problem I've come across is that there are a handful of duplicate files
> in the Filename table
> ...
> I could of course prune four duplicate lines from the data before
> inserting, but I'm afraid of the possible effect on a future restore.
>
> It appears there are duplicate entries in the File table for each time
> there is a duplicate in the Filename table:

These duplicates in the File table are probably generated by the batch
insert code.  Since each pair has the same FileIndex, it should be safe to
elide them.

> so perhaps this is safe enough.  Does anyone know how these duplicates may
> have arisen and what the best way to proceed is?

It is probably safe to remove the duplicates.

It isn't clear how the original duplicates in the Filename table arose,
though, but possibly you had table corruption at some point?  Was this
database created by an old version of Bacula?

__Martin
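[Editorial note: the elision suggested above could be sketched as the multi-table DELETE below. This is illustrative only and assumes, as the output quoted later in the thread shows, that each duplicated pair shares JobId and FileIndex and differs only in FileId and FilenameId; back up the catalog first, and note that the self-join can be expensive on a large File table.]

```sql
-- Illustrative sketch only: for each (JobId, FileIndex) pair that
-- appears twice, delete the row with the higher FileId and keep the
-- other.  Assumes duplicates differ only in FileId/FilenameId.
DELETE f1 FROM File f1
  JOIN File f2
    ON  f1.JobId     = f2.JobId
    AND f1.FileIndex = f2.FileIndex
    AND f1.FileId    > f2.FileId;
```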
[Bacula-users] Migrating from mysql to postgresql: Duplicate filename entries?
Hi,

I've been experimenting with a migration from MySQL to Postgres.

One problem I've come across is that there are a handful of duplicate files
in the Filename table:

mysql> select count(*) as filecount, Filename.name from Filename
    ->     GROUP BY Filename.Name ORDER BY filecount DESC LIMIT 30;
+-----------+-----------------------------------+
| filecount | name                              |
+-----------+-----------------------------------+
|         2 | 0                                 |
|         2 | 1                                 |
|         2 | 2                                 |
|         2 | tt172.16.17.36-www_gmail_com.html |

mysql> select * from Filename WHERE name IN ('0','1','2','tt172.16.17.36-www_gmail_com.html');
+------------+-----------------------------------+
| FilenameId | Name                              |
+------------+-----------------------------------+
|       9247 | 0                                 |
|     101380 | 0                                 |
|       9248 | 1                                 |
|     101381 | 1                                 |
|       9249 | 2                                 |
|     101382 | 2                                 |
|     575752 | tt172.16.17.36-www_gmail_com.html |
|     625369 | tt172.16.17.36-www_gmail_com.html |
+------------+-----------------------------------+

and there doesn't appear to be any constraint preventing this.

mysql> describe Filename;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| FilenameId | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Name       | blob             | NO   | MUL | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> show create table Filename;
CREATE TABLE `Filename` (
  `FilenameId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Name` blob NOT NULL,
  PRIMARY KEY (`FilenameId`),
  KEY `Name` (`Name`(255))
) ENGINE=MyISAM AUTO_INCREMENT=4277244 DEFAULT CHARSET=latin1

However, the postgresql table does have this constraint and complains when I
try to insert the data.

bacula=# COPY Filename FROM '/var/tmp/bacula-backup/Filename.txt';
ERROR:  duplicate key value violates unique constraint "filename_name_idx"
CONTEXT:  COPY filename, line 101380: "101380 0"

I could of course prune four duplicate lines from the data before
inserting, but I'm afraid of the possible effect on a future restore.
It appears there are duplicate entries in the File table for each time there
is a duplicate in the Filename table:

mysql> SELECT * FROM File WHERE FilenameId IN
    ->     (select FilenameId from Filename WHERE name IN ('0','1','2','tt172.16.17.36-www_gmail_com.html'));
+-----------+-----------+-------+--------+------------+--------+-----------------------------------------------------------+------------------------+
| FileId    | FileIndex | JobId | PathId | FilenameId | MarkId | LStat                                                     | MD5                    |
+-----------+-----------+-------+--------+------------+--------+-----------------------------------------------------------+------------------------+
| 245079464 |    207178 | 13471 |   3870 |       9249 |      0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 245079465 |    207178 | 13471 |   3870 |     101382 |      0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364675 |    173643 | 11277 |   3870 |       9249 |      0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364676 |    173643 | 11277 |   3870 |     101382 |      0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 259000887 |    290784 | 14315 | 745797 |       9248 |      0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M           | UumkJoY8ZKJd6/HfnwvCDg |
| 259000888 |    290784 | 14315 | 745797 |     101381 |      0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M           | UumkJoY8ZKJd6/HfnwvCDg |
| 258833500 |    123399 | 14315 | 807939 |       9248 |      0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M           | G4vlZvkmC8DFxl8y0RsnfA |
| 258833501 |    123399 | 14315 | 807939 |     101381 |      0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M           | G4vlZvkmC8DFxl8y0RsnfA |
| 244900411 |     28129 | 13471 |   1752 |       9249 |      0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 244900412 |     28129 | 13471 |   1752 |     101382 |      0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 224916087 |     41066 |
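[Editorial note: one way to prune the duplicates on the MySQL side before dumping, rather than editing the dump file by hand, could be sketched as below. This is illustrative and untested: it assumes each duplicated name appears exactly twice, as in the output above, and should only be run against a backed-up catalog. The lower FilenameId is treated as canonical.]

```sql
-- Illustrative sketch only: repoint File rows at the lowest FilenameId
-- for each duplicated name.  Assumes at most two copies per name.
UPDATE File f
  JOIN Filename d ON f.FilenameId = d.FilenameId
  JOIN Filename k ON k.Name = d.Name AND k.FilenameId < d.FilenameId
  SET f.FilenameId = k.FilenameId;

-- Then drop the now-unreferenced duplicate Filename rows.
DELETE d FROM Filename d
  JOIN Filename k
    ON k.Name = d.Name AND k.FilenameId < d.FilenameId;
```

After this, the File table still contains the paired rows noted elsewhere in the thread (now with identical FilenameId), which would need a separate pass if they are to be removed too.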