Hi,

Esbach, Brandon wrote:
Hi all,

I'm guessing this is a fairly common question, but so far I've drawn a
blank with every workaround I've tried.
Here's the rough scenario:
We have data streaming in constantly from various offsite locations.
This comprises several details: a session, a unit of that session,
and a measurement of that unit.
Data is captured and transferred on a timed schedule, so data is often
repeated across the transfer packets.  This repetition is unavoidable,
as the capture software dumps this data for each measurement and each
unit of the session it's working on.

Due to the volume, a bulk insert is done using a multi-row "values()"
list in a single INSERT statement.

Unfortunately, there are often repeats of a session, unit, or
measurement (as there could be a new measurement or unit in the capture
that needs to be associated with the session).
The problem I've been experiencing is fairly straightforward (I hope):
I have two, sometimes three, columns in any given record that must
always be unique.  These comprise an ID (the key column) and one
(depending on the table, sometimes two) GUIDs, which should be unique
at all times across the entire table.

I've tried setting the additional columns as a primary key (which in
turn marks them as "unique" when viewed under "Schema Indices" in the
MySQL Administrator tool); however, this does not give an error (or
fail silently with "insert ignore") when I insert a duplicate.  MySQL
seems quite happy to add the duplicate record.

At the moment, I'm running a process at startup which simply gathers
all GUIDs from the database and compares them as it runs through the
data (then adds new ones as it goes).  This is hardly reliable, and it
also means starting the service takes several hours to gather the
existing GUIDs at current data levels... it's almost frightening to
think what will happen as the data expands.

It sounds like you need a separate primary key and unique index:

create table t (
  id   int not null,
  guid char(32) not null,
  primary key (id),
  unique key (guid)
);
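
With that schema, MySQL refuses a repeated GUID instead of quietly
accepting it.  Something like this (untested sketch; the id and guid
values here are made up):

-- both rows carry the same guid, so the statement fails with
-- ERROR 1062 (23000): Duplicate entry '...' for key 'guid'
insert into t (id, guid) values
  (1, 'aaaa0000aaaa0000aaaa0000aaaa0000'),
  (2, 'aaaa0000aaaa0000aaaa0000aaaa0000');

-- with IGNORE the offending row is skipped and a warning is
-- raised instead of an error
insert ignore into t (id, guid) values
  (3, 'bbbb1111bbbb1111bbbb1111bbbb1111'),
  (3, 'cccc2222cccc2222cccc2222cccc2222');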

Then you can use REPLACE or IGNORE with LOAD DATA INFILE.  I can't
tell whether you are actually using LOAD DATA INFILE or whether your
"bulk load" is a big INSERT statement.  If you're using an INSERT with
multiple VALUES() sections, you can also use ON DUPLICATE KEY UPDATE.
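
For instance (again untested; the file path, field format, and values
are just placeholders):

-- skip any incoming row whose id or guid already exists
load data infile '/tmp/batch.csv'
ignore into table t
fields terminated by ','
(id, guid);

-- or, with a multi-row INSERT, the no-op update below simply
-- suppresses the duplicate-key error for repeated rows
insert into t (id, guid) values
  (4, 'dddd3333dddd3333dddd3333dddd3333'),
  (5, 'eeee4444eeee4444eeee4444eeee4444')
on duplicate key update id = values(id);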

I agree the current strategy won't hold up well over time.

Baron

