I ran the DB check last week via the script below. If you add the two indexes
at once, the table only has to be rebuilt once, which takes about 50% of the
time.
#!/bin/sh
echo 'alter table File add index idxFI (FilenameId), add index idxPI (PathId);' | /usr/bin/mysql -u bacula --password=xxxxxxxxxx bacula
/sbin/dbcheck -c /etc/bacula/bacula-dir.conf -b -f
echo 'alter table File drop index idxFI, drop index idxPI;' | /usr/bin/mysql -u bacula --password=xxxxxxxxxxx bacula
Does anybody know how the temporary batch table is created and whether it has
any indexes on it to start? I don't know how to view the batch table, as it's
a temp table and is only available to the session that created it.
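One idea that might work (an untested sketch; the log path and option are
assumptions, and on MySQL 5.0 the general query log has to be enabled in
my.cnf and the server restarted, while 5.1+ can toggle it at runtime) is to
turn on the general query log and grep it for the statements Bacula issues
while a job is despooling attributes:

#!/bin/sh
# Sketch only: assumes /etc/my.cnf has "log = /var/log/mysql/general.log"
# under [mysqld] and mysqld has been restarted with it.
grep -i 'create temporary table batch' /var/log/mysql/general.log
grep -i 'create index' /var/log/mysql/general.log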
I have no issues recompiling a new version of bacula if some debug or
timing code could be added.
That said, the backups are working very well with batch mode disabled.
Thanks -Jason
On Mon, 2009-06-22 at 21:05 +0200, Bruno Friedmann wrote:
> Hi Jason, I've been looking over your conf and table status, comparing them
> to some configs here.
> But nothing obviously "wrong" stands out.
> I'm just using utf8_general_ci as the default collation for the db & tables.
>
>
> I have just two questions: did you already run dbcheck?
> (Be careful: if not, it could take days, especially against a db your size.)
>
> I have added these indexes (taken from the wiki)
> CREATE INDEX idxPathId ON File (PathId);
> CREATE INDEX idxFilenameId ON File (FilenameId);
> CREATE INDEX idxJPF ON File (JobId, PathId, FilenameId);
> CREATE INDEX idxFJ ON File (FileId,JobId);
> CREATE INDEX idxPJ ON File (PathId,JobId);
> to make dbcheck run quickly.
> But with the db size you have, building the indexes could take a long time ...
>
> Besides that, I can't explain why you are having trouble.
> Perhaps it could be useful (when your backup server is not backing up :-)
> to rerun the batch-insert query with EXPLAIN to see how MySQL would execute
> it (the trouble is capturing the batch table, of course).
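> A rough, untested sketch of what I mean, using a throw-away stand-in table
> (the real batch table is temporary, so its layout here is only an
> assumption, and the row count is arbitrary; we only want the plan):
>
> echo 'CREATE TABLE batch_test (Name blob);
> INSERT INTO batch_test SELECT Name FROM Filename LIMIT 100000;
> EXPLAIN SELECT a.Name FROM (SELECT DISTINCT Name FROM batch_test) AS a
>   WHERE NOT EXISTS (SELECT Name FROM Filename AS f WHERE f.Name = a.Name);
> DROP TABLE batch_test;' | /usr/bin/mysql -u bacula --password=xxxxxxxxxx bacula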
>
> A call to MySQL administrators who have an Enterprise DB contract:
> if someone has one, they could ask the question directly to the MySQL
> engineers.
>
>
>
>
> Jason A. Kates wrote:
> > Bruno,
> > Thanks for taking a look at the batch MySQL issue. We started with InnoDB
> > and moved to MyISAM. MyISAM seemed marginally faster. I have included
> > the output of show table status.
> >
> > Without batch inserts, my troubled backup is no longer troubled; it's
> > working at a decent rate given the number of small files it has. If it
> > were the same size with big files, the transfer rates would be faster, etc.
> >
> > 22-Jun 06:58 backup-server JobId 70468: Sending spooled attrs to the
> > Director. Despooling 2,897,722,241 bytes ...
> > 22-Jun 07:33 backup-server JobId 70468: Bacula qwest-netbackup-dir 3.0.2
> > (18Jun09): 22-Jun-2009 07:33:15
> > Build OS: x86_64-unknown-linux-gnu redhat Enterprise
> > release
> > JobId: 70468
> > Job: backup-client1.2009-06-22_01.03.05_56
> > Backup Level: Full
> > Client: "backup-client1" 3.0.1 (30Apr09)
> > x86_64-unknown-linux-gnu,redhat,Enterprise release
> > FileSet: "LINUX_ALL_LOCAL_FS" 2009-05-17 11:11:12
> > Pool: "Daily" (From Job resource)
> > Catalog: "MyCatalog" (From Client resource)
> > Storage: "Autochanger" (From Job resource)
> > Scheduled time: 22-Jun-2009 01:03:05
> > Start time: 22-Jun-2009 01:03:20
> > End time: 22-Jun-2009 07:33:15
> > Elapsed time: 6 hours 29 mins 55 secs
> > Priority: 20
> > FD Files Written: 9,448,749
> > SD Files Written: 9,448,749
> > FD Bytes Written: 242,193,953,548 (242.1 GB)
> > SD Bytes Written: 243,431,447,949 (243.4 GB)
> > Rate: 10352.4 KB/s
> > Software Compression: None
> > VSS: no
> > Encryption: no
> > Accurate: yes
> > Volume name(s): CA3833L4|CA3831L4
> > Volume Session Id: 213
> > Volume Session Time: 1245547751
> > Last Volume Bytes: 871,293,198,336 (871.2 GB)
> > Non-fatal FD errors: 1
> > SD Errors: 0
> > FD termination status: OK
> > SD termination status: OK
> > Termination: Backup OK -- with warnings
> >
> > -Jason
> >
> > On Mon, 2009-06-22 at 09:02 +0200, Bruno Friedmann wrote:
> >> hi Jason, hardware should be more than sufficient.
> >>
> >> Could you post your my.cnf ?
> >>
> >> and tell what type of db is used (MyISAM / InnoDB)
> >>
> >> ps : I'm going to work, so it could take time before I answer ..
> >>
> >> Jason A. Kates wrote:
> >>> In my /etc/my.cnf I have this set:
> >>> tmpdir = /tmp
> >>>
> >>> /tmp is a memory file system to avoid I/O contention.
> >>>
> >>> When looking, you can see that temp tables are created in /tmp,
> >>> so /var/tmp should be out of the loop.
> >>>
> >>> The db resides in /export/mysql which is part of a 6 disk 15K rpm
> >>> hardware raid 10 disk set.
> >>>
> >>> df -Th
> >>> Filesystem Type Size Used Avail Use% Mounted on
> >>> /dev/cciss/c0d0p1
> >>> ext3 996M 489M 456M 52% /
> >>> /dev/mapper/Volume00-vartest
> >>> ext3 2.0G 368M 1.5G 20% /var
> >>> /dev/mapper/Volume00-export
> >>> ext3 186G 47G 130G 27% /export
> >>> /dev/mapper/Volume00-usr
> >>> ext3 3.9G 1.8G 2.0G 47% /usr
> >>> none tmpfs 8.0G 8.0K 8.0G 1% /tmp
> >>> /dev/cciss/c1d0p1
> >>> ext3 943G 316G 579G 36% /var/spool/bacula
> >>>
> >>>
> >>> We are running Redhat ES 5 with Kernel 2.6.18-128.el5.
> >>> If you would like to test a mount option I would be glad to give it a
> >>> test.
> >>>
> >>> -Jason
> >>>
> >>> On Mon, 2009-06-22 at 07:38 +0200, Bruno Friedmann wrote:
> >>>> Hi Jason, can you tell us whether the place MySQL works in, typically
> >>>> /var/lib/mysql,
> >>>> is on the same partition as /tmp or /var/tmp?
> >>>>
> >>>> I've found (in another project) that with MySQL MyISAM tables you can
> >>>> kill disk I/O
> >>>> if the two are on the same disk. It's worse when you insert a lot of
> >>>> data from the same disk.
> >>>>
> >>>> In a perfect world, after carefully tuning /etc/my.cnf (the "huge"
> >>>> template isn't sufficient in many big cases),
> >>>> you should have one full I/O path for MySQL temp files, one full I/O
> >>>> path for indexes, one full I/O path for tables,
> >>>> AND one full I/O path for import data (a completely separate /tmp, for example).
> >>>>
> >>>> In many installations /tmp and /var are on the same device; add in an
> >>>> LVM or RAID 1 configuration
> >>>> and you get some I/O trouble.
> >>>>
> >>>> I won't explain all the MySQL fine tuning here; there is a lot of it,
> >>>> and each setting should be reviewed & tested.
> >>>>
> >>>> Next, there's also ext3 tuning for those who use it, as many distros
> >>>> are very conservative.
> >>>> (Newer kernels > 2.6.30 reduce the impact.)
> >>>> Mount options like noatime, nodiratime, data=ordered, etc. ... could
> >>>> change your life :-)
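> >>>> A quick way to try the atime-related options without editing fstab (an
> >>>> untested sketch; /export is taken from Jason's df output, and the data=
> >>>> mode generally can't be changed on a remount):
> >>>>
> >>>> mount -o remount,noatime,nodiratime /export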
> >>>>
> >>>>
> >>>> The first time Bacula came with batch-insert enabled, I tested it and
> >>>> got really poor performance.
> >>>> So I disabled it (LVM on top of soft RAID 1); my mistake not to mention
> >>>> it on the user/devel lists.
> >>>>
> >>>> At some point here we have decided to migrate all possible applications
> >>>> to PostgreSQL.
> >>>>
> >>>> For debugging and making some sort of test case, it would be nice to
> >>>> have a mock dump database
> >>>> containing lots of dirs & files,
> >>>> so we can test large queries ....
> >>>>
> >>>> Perhaps someone with very large tables could do it, using md5() on the
> >>>> Filename and directory entries?
> >>>>
> >>>>
> >>>>
> >>>> Jason A. Kates wrote:
> >>>>> Based on following the thread, I recompiled Bacula with the following
> >>>>> options:
> >>>>> ./configure --enable-smartalloc --with-mysql
> >>>>> --with-tcp-wrappers=/usr/lib/libwrap.so --enable-static-fd
> >>>>> --enable-static-dir --enable-static-sd --disable-batch-insert
> >>>>> And my 18+ hour backup ran in 9 hours 22 mins 27 secs last night. It
> >>>>> seems to be back to the pre 3.0 time.
> >>>>>
> >>>>> As for my tables, I have an Admin job that optimizes my Bacula tables
> >>>>> within MySQL.
> >>>>> Job {
> >>>>> Name = optimize_table_JobMedia
> >>>>> JobDefs = "DefaultJob"
> >>>>> Type = Admin
> >>>>> Client = bacula-local-fd
> >>>>> RunBeforeJob = "/etc/bacula/optimize_table_JobMedia.sh"
> >>>>> Priority = 40
> >>>>> Cancel Queued Duplicates=yes
> >>>>> Maximum Concurrent Jobs = 1
> >>>>> Run Before Job = "/etc/bacula/check-duplicate-queued-job.sh '%n' '%i'"
> >>>>> }
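> >>>>> (For reference, a sketch of what such an optimize script presumably
> >>>>> boils down to; the password placeholder is illustrative only:)
> >>>>>
> >>>>> #!/bin/sh
> >>>>> echo 'OPTIMIZE TABLE JobMedia;' | /usr/bin/mysql -u bacula --password=xxxxxxxxxx bacula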
> >>>>>
> >>>>> Please let me know what else I can do to provide useful data for the
> >>>>> improvement of Bacula (the world's best backup software!).
> >>>>> Thanks -Jason
> >>>>>
> >>>>> On Sun, 2009-06-21 at 08:11 +0200, Marc Cousin wrote:
> >>>>>> The first thing to know is which of the queries takes the time (there
> >>>>>> are the multiple inserts into the batch table, then the inserts of
> >>>>>> missing filenames/paths, and the final insert). Or is it slow with all
> >>>>>> of the queries? Which storage engine?
> >>>>>>
> >>>>>> There are two possibilities :
> >>>>>>
> >>>>>> - either you have a bad plan for one of the queries (but I don't see
> >>>>>> how,
> >>>>>> mysql having a very simple optimizer).
> >>>>>> - or you have a tuning problem with mysql (there I can't really help
> >>>>>> you : not
> >>>>>> enough memory ? fragmented indexes ? I don't really know how mysql
> >>>>>> works)
> >>>>>>
> >>>>>> Kern, I have not been following the 3.0 development closely: do the
> >>>>>> xattrs go into the lstat field?
> >>>>>>
> >>>>>> On Sunday 21 June 2009 01:52:34 Jason A. Kates wrote:
> >>>>>>> I have a backup that runs for 3 hours then spends another 15 hours in
> >>>>>>> status "Dir inserting Attributes".
> >>>>>>>
> >>>>>>> The backup has 9,442,578 files and 239,718,461,832 bytes. It was much
> >>>>>>> faster with Bacula before, and I thought that the time change was
> >>>>>>> due to the additional data from xattr support that didn't exist before
> >>>>>>> 3.0.
> >>>>>>> The job is a spooling job with spooled attributes too.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> We are using mysql-server-5.0.45-7.el5. I would be glad to collect
> >>>>>>> and share data. Please let me know what you would like me to set or
> >>>>>>> do to get the appropriate data.
> >>>>>>>
> >>>>>>> Thanks -Jason
> >>>>>>>
> >>>>>>> On Sat, 2009-06-20 at 17:48 +0200, Marc Cousin wrote:
> >>>>>>>> There may be something. But first we have to know why MySQL behaves
> >>>>>>>> that slowly for Tom Sommer. The problem being that I don't have a big
> >>>>>>>> MySQL database to test on ... So as long as we don't have the query
> >>>>>>>> plan or an explanation ...
> >>>>>>>>
> >>>>>>>> On Saturday 20 June 2009 14:36:44 Kern Sibbald wrote:
> >>>>>>>>> Hello Marc,
> >>>>>>>>>
> >>>>>>>>> Thanks for looking at this and for your responses. If I understand
> >>>>>>>>> you
> >>>>>>>>> correctly, there is nothing to be done, which does not surprise me
> >>>>>>>>> :-)
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>>
> >>>>>>>>> Kern
> >>>>>>>>>
> >>>>>>>>> On Saturday 20 June 2009 13:16:44 Marc Cousin wrote:
> >>>>>>>>>>> Subject: Re: [Bacula-users] Performance with MySQL queries since
> >>>>>>>>>>> 3.0.0 (Dir inserting attributes hang)
> >>>>>>>>>>> Date: Saturday 20 June 2009
> >>>>>>>>>>> From: Kern Sibbald <[email protected]>
> >>>>>>>>>>> To: Tom Sommer <[email protected]>
> >>>>>>>>>>>
> >>>>>>>>>>> On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
> >>>>>>>>>>>> Tom Sommer wrote:
> >>>>>>>>>>>>> Mike Holden wrote:
> >>>>>>>>>>>>>> Jari Fredriksson wrote:
> >>>>>>>>>>>>>>>> INSERT INTO Filename( Name )
> >>>>>>>>>>>>>>>> SELECT a.Name
> >>>>>>>>>>>>>>>> FROM (
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>>>>>>>> FROM batch
> >>>>>>>>>>>>>>>> ) AS a
> >>>>>>>>>>>>>>>> WHERE NOT
> >>>>>>>>>>>>>>>> EXISTS (
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> SELECT Name
> >>>>>>>>>>>>>>>> FROM Filename AS f
> >>>>>>>>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>> That looks silly.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would write it shorter as
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>>>>>>> FROM batch AS a
> >>>>>>>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>>>>>>> (
> >>>>>>>>>>>>>>> SELECT Name
> >>>>>>>>>>>>>>> FROM Filename AS f
> >>>>>>>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>> You may also want to consider using a JOIN rather than a
> >>>>>>>>>>>>>> subquery with a NOT EXISTS, something like (untested and
> >>>>>>>>>>>>>> unvalidated!):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> INSERT INTO filename(name)
> >>>>>>>>>>>>>> SELECT DISTINCT name
> >>>>>>>>>>>>>> FROM batch AS a
> >>>>>>>>>>>>>> LEFT JOIN filename AS f USING (name)
> >>>>>>>>>>>>>> WHERE f.name IS NULL
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I know from experience as an Oracle DBA (my day job) that this
> >>>>>>>>>>>>>> can often produce far more efficient results.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Of course, all options need testing for both speed of
> >>>>>>>>>>>>>> execution and resource usage, bearing in mind that data varies
> >>>>>>>>>>>>>> from one installation to another, and one size may not fit
> >>>>>>>>>>>>>> all!
> >>>>>>>>>>>>> Good suggestions, sounds like there might be an overall
> >>>>>>>>>>>>> performance problem with the current query for batch-inserts
> >>>>>>>>>>>>> with lots of data. I'm a bit unsure if I dare test these
> >>>>>>>>>>>>> queries on my current installation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll CC Kern on the thread, perhaps he has some insights.
> >>>>>>>>>>>> So this morning I had to kill the above query because it's been
> >>>>>>>>>>>> running for 24+ hours, preventing the new daily jobs from
> >>>>>>>>>>>> running. I think I'm going to try and disable batch-inserts, the
> >>>>>>>>>>>> current situation is simply not good enough, it's become a major
> >>>>>>>>>>>> headache to run backups suddenly. When I hit the end of this
> >>>>>>>>>>>> month and ALL servers have to run FULL backups, I'm gonna be in a
> >>>>>>>>>>>> world of trouble I think - I just don't understand what has
> >>>>>>>>>>>> changed, because it's all been running great up until now.
> >>>>>>>>>>>>
> >>>>>>>>>>>> // Tom
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> We certainly can use help optimizing SQL since we are not DBAs,
> >>>>>>>>>>> and
> >>>>>>>>>>> we will look into the SQL optimization suggestions given above,
> >>>>>>>>>>> keeping in mind that there are often rather radical differences in
> >>>>>>>>>>> timing of particular SQL queries depending on the database engine
> >>>>>>>>>>> used.
> >>>>>>>>>>>
> >>>>>>>>>>> To the best of my knowledge nothing has changed in terms of the
> >>>>>>>>>>> batch insert queries since they were implemented, and it is
> >>>>>>>>>>> *very* unlikely (though I haven't checked the code) that something
> >>>>>>>>>>> changed from 2.4.4 to 3.0.x.
> >>>>>>>>>>>
> >>>>>>>>>>> More likely, your workload or MySQL has changed in some way --
> >>>>>>>>>>> e.g.
> >>>>>>>>>>> more Jobs, more machines backed up, Director machine with less
> >>>>>>>>>>> memory or other jobs that use memory, a new version of MySQL, ...
> >>>>>>>>>>>
> >>>>>>>>>>> - I would suggest that you ensure that your database has all the
> >>>>>>>>>>> recommended indexes (see the make_mysql_tables file), and that you
> >>>>>>>>>>> are running with the large memory /etc/my.cnf file.
> >>>>>>>>>>>
> >>>>>>>>>>> - Another thing to do is to compact your database. One way to do
> >>>>>>>>>>> it is to write it out to an ASCII file and then re-insert it (see
> >>>>>>>>>>> the sketch after this list).
> >>>>>>>>>>>
> >>>>>>>>>>> - If you are running certain programs that create and delete lots
> >>>>>>>>>>> of temporary files with different names, your Filename table may
> >>>>>>>>>>> need cleaning.
> >>>>>>>>>>>
> >>>>>>>>>>> - I would strongly recommend not starting *lots* of Full backups
> >>>>>>>>>>> at
> >>>>>>>>>>> the same time or on the same day. By lots, I mean more than say
> >>>>>>>>>>> 10
> >>>>>>>>>>> or 20 (depends on the size of your system). It is generally far
> >>>>>>>>>>> better to stage 1/4 of the backup every week for a full backup so
> >>>>>>>>>>> that the peak workload is spread out over the month.
> >>>>>>>>>>>
> >>>>>>>>>>> - If the bottleneck is in MySQL, you might consider moving it to
> >>>>>>>>>>> another machine that has more memory and faster disks.
> >>>>>>>>>>>
> >>>>>>>>>>> - If you really have a huge number of backups (say 50 or more)
> >>>>>>>>>>> that
> >>>>>>>>>>> all run at the same time, it might be advisable to consider using
> >>>>>>>>>>> PostgreSQL, but in that case, you will probably need an onsite DBA
> >>>>>>>>>>> to properly tune and maintain it.
> >>>>>>>>>>>
> >>>>>>>>>>> - Regardless of what hardware you have, there are certain
> >>>>>>>>>>> limitations on how many simultaneous jobs you can run (this kicks
> >>>>>>>>>>> in on many systems around 50). Once a certain number is exceeded,
> >>>>>>>>>>> the total throughput can rather radically decrease so careful
> >>>>>>>>>>> monitoring is necessary. Bweb can help a lot in these situations.
> >>>>>>>>>>>
> >>>>>>>>>>> - If you are really "in a world of trouble" and it is a
> >>>>>>>>>>> performance
> >>>>>>>>>>> issue, there is not much we (the Bacula project) can do for you
> >>>>>>>>>>> other than the above tips. However, ($$) Bacula Systems has tools
> >>>>>>>>>>> that can help more precisely identify bottlenecks and help balance
> >>>>>>>>>>> loads.
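> >>>>>>>>>>> (A minimal sketch of the "compact the database" suggestion above;
> >>>>>>>>>>> the dump path and password placeholder are illustrative, and the
> >>>>>>>>>>> Director should be stopped while this runs. The reload works
> >>>>>>>>>>> because mysqldump writes DROP/CREATE TABLE statements by default.)
> >>>>>>>>>>>
> >>>>>>>>>>> #!/bin/sh
> >>>>>>>>>>> # Write the catalog out to an ASCII file, then re-insert it.
> >>>>>>>>>>> mysqldump -u bacula --password=xxxxxxxxxx bacula > /export/bacula-catalog.sql
> >>>>>>>>>>> mysql -u bacula --password=xxxxxxxxxx bacula < /export/bacula-catalog.sql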
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> Kern
> >>>>>>>>>>>
> >>>>>>>>>>> -------------------------------------------------------
> >>>>>>>>>> Sorry Kern, I wasn't monitoring this thread (mysql in the title
> >>>>>>>>>> :) )
> >>>>>>>>>>
> >>>>>>>>>> I wrote these queries, so maybe I should explain (not why I'm
> >>>>>>>>>> right,
> >>>>>>>>>> I'm not sure of that, but why we've done it this way).
> >>>>>>>>>>
> >>>>>>>>>> A few forewords:
> >>>>>>>>>> - batch insert was created at the beginning for PostgreSQL. I don't
> >>>>>>>>>> have any experience with MySQL, except knowing that at the moment we
> >>>>>>>>>> programmed batch insert it didn't really do anything but nested loops...
> >>>>>>>>>> - The main reason for batch inserts with PostgreSQL was to use COPY
> >>>>>>>>>> instead of INSERT, which is much faster (we also had good results
> >>>>>>>>>> with stored procedures, but batch insert was even faster)
> >>>>>>>>>> - our current File table is nearly 200GB (1 billion rows), in a
> >>>>>>>>>> 250GB database, and we insert around 50 million records a day
> >>>>>>>>>> (rough estimate)
> >>>>>>>>>>
> >>>>>>>>>> * First question I've seen in the thread:
> >>>>>>>>>>
> >>>>>>>>>> Why
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO Filename( Name )
> >>>>>>>>>> SELECT a.Name
> >>>>>>>>>> FROM (
> >>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>> FROM batch
> >>>>>>>>>> ) AS a
> >>>>>>>>>> WHERE NOT EXISTS (
> >>>>>>>>>> SELECT Name
> >>>>>>>>>> FROM Filename AS f
> >>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>> )
> >>>>>>>>>>
> >>>>>>>>>> instead of the much simpler
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>> (
> >>>>>>>>>> SELECT Name
> >>>>>>>>>> FROM Filename AS f
> >>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>> )
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> => Because the first one removes the duplicate rows from the batch
> >>>>>>>>>> table
> >>>>>>>>>> before checking whether they exist in the Filename table, while the
> >>>>>>>>>> second one checks after. Performance was better in benchmarks with
> >>>>>>>>>> the first
> >>>>>>>>>> one (with PostgreSQL, with our data).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> * Second question :
> >>>>>>>>>> Why not :
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO filename(name)
> >>>>>>>>>> SELECT DISTINCT name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> LEFT JOIN filename AS f USING (name)
> >>>>>>>>>> WHERE f.name IS NULL
> >>>>>>>>>>
> >>>>>>>>>> In fact, if the engine chooses the right execution path, the
> >>>>>>>>>> execution
> >>>>>>>>>> plan will be roughly the same (nested loop) as
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>> (
> >>>>>>>>>> SELECT Name
> >>>>>>>>>> FROM Filename AS f
> >>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>> )
> >>>>>>>>>>
> >>>>>>>>>> Except if batch is close in size to Filename (for instance, on my
> >>>>>>>>>> Bacula database, that would be a 50-million-file backup). Anyway,
> >>>>>>>>>> if I
> >>>>>>>>>> recall correctly about MySQL (and it hasn't changed recently),
> >>>>>>>>>> MySQL
> >>>>>>>>>> will do a nested loop for both, as it doesn't know how to merge or
> >>>>>>>>>> hash join ...
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> But I agree that this second syntax theoretically gives more
> >>>>>>>>>> latitude
> >>>>>>>>>> to the engine. Except that recent Oracle and PostgreSQL versions
> >>>>>>>>>> will
> >>>>>>>>>> anyway be able to convert between the exists/not exists and the
> >>>>>>>>>> join
> >>>>>>>>>> (or IN) in this case, as they are semantically equivalent.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Now, I don't know how MySQL would react to all these queries. An
> >>>>>>>>>> explain plan would be great for understanding... but I would have
> >>>>>>>>>> thought
> >>>>>>>>>> that the NOT EXISTS query would have been great for its optimizer,
> >>>>>>>>>> as
> >>>>>>>>>> it tells MySQL exactly what to do :)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> A last note, for PostgreSQL users on this mailing list:
> >>>>>>>>>> You can try to get better performance with batch inserts by
> >>>>>>>>>> lowering
> >>>>>>>>>> seq_page_cost and random_page_cost, so that the engine favours a
> >>>>>>>>>> nested
> >>>>>>>>>> loop instead of a hash join for the INSERT INTO File query: most of
> >>>>>>>>>> the time, most of Filename and Path will be in the database or
> >>>>>>>>>> filesystem cache, and that plan will be much faster (and save you
> >>>>>>>>>> big
> >>>>>>>>>> sorts).
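> >>>>>>>>>> For example (an untested sketch; the values are only a starting
> >>>>>>>>>> point and should be benchmarked, per-database seq_page_cost needs
> >>>>>>>>>> PostgreSQL 8.2+, and the settings only take effect for new
> >>>>>>>>>> Director sessions):
> >>>>>>>>>>
> >>>>>>>>>> echo 'ALTER DATABASE bacula SET seq_page_cost = 0.5;
> >>>>>>>>>> ALTER DATABASE bacula SET random_page_cost = 1.5;' | psql -U bacula bacula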
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Please tell me if you need more information.
> >>>>>>>>>>
> >>
>
> >
>
>
> --
>
> Bruno Friedmann
>
>
>
--
----------------------------------------------------------------------------
Jason A. Kates ([email protected])
Fax: 208-975-1514
Phone: 212-400-1670 x2
============================================================================