Re: Removing Data Duplicacy

2012-02-23 Thread Johan De Meersman
- Original Message -

 From: Adarsh Sharma adarsh.sha...@orkash.com

 Thanks Johan, but I mentioned before that adding an auto_increment
 column doesn't solve the issue & it causes errors in the
 multi-threaded application.

If it causes errors, you have other problems than this. 

 Multiple clients call this insert procedure simultaneously, so it
 fails the transactions if two or more clients read the same ID
 value.

Maybe I'm missing something, but both auto_increment and last_insert_id() are 
threadsafe, so the normal procedure for handling this is an auto_increment on 
table2.c_id, then 

* insert into table2 
* select last_insert_id() into page_id 
* insert into table1 using page_id 

This will guarantee both unique numbering *and* corresponding IDs without the 
need for locks or transactions. Transactions may be a good idea for other 
reasons, but that's another discussion. 
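
A minimal sketch of that sequence, assuming table2.c_id is the auto_increment column and trimming the column lists to two illustrative columns (this is a procedure-body fragment I'm adding for clarity, not code from the thread):

insert into table2 (link_level, category) values (url_level, page_category);
set page_id = last_insert_id();   -- the c_id just generated for table2
insert into table1 (c_id, link_level) values (page_id, url_level);

Inside a stored procedure, page_id would be a declared bigint variable, as in the procedure quoted later in this thread.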

If you *really* will not add an auto_increment to table2, then create pk_table 
with an auto_increment and use that to grab the page_id. Forget this whole 
locking business; you do not need it with auto_increment and last_insert_id. 

 I need to insert rows simultaneously with the same C_ID into two
 tables (table 1 & table 2).

There's no such thing as simultaneous. 

 Please let me know if the below solution is genuine or not. I updated
 my procedure & created a table that contains only the ID.

 UPDATE pk_table set c_id=LAST_INSERT_ID(c_id+1);

Nope, won't work, for the very simple reason that you're not using an 
auto_increment, so there simply *is* no last_insert_id(). Go read the 
documentation before you keep arguing, please. 

You've had roughly the same answer from several people who've been doing this 
stuff for ages. If you keep ignoring that advice, I'm simply going to ignore 
this whole thread until you come up with a damn good technical reason why an 
auto_increment isn't an option. We're trying to help you find a solution, but I 
do have better things to do than hit my head against a wall repeatedly. 

-- 
Bier met grenadyn 
Is als mosterd by den wyn 
Sy die't drinkt, is eene kwezel 
Hy die't drinkt, is ras een ezel 


Re: Removing Data Duplicacy

2012-02-22 Thread Johan De Meersman
- Original Message -
 From: Adarsh Sharma adarsh.sha...@orkash.com
 
 Today I noticed some duplicacy in my c_id column which is not

Yes, that's what you get if you don't use auto_increments.

 I need multiple clients to select c_id from 2 tables & insert data,
 adding 1 to the previous c_id, in an isolated manner.
 I think I need to use LOCK statements in my procedure.

Locks will utterly kill your performance. It's already a lot lower than it 
could be because you're doing selects prior to every insert.

 Can anyone suggest how to solve the issue?

Kick the guy who writes the business requirements in the balls, hard. Then 
explain that some things are not to be messed with, and he can't get what he 
wants this time. Kick him again, then add an auto_increment. Make sure to use 
sharp-tipped shoes.

Failing that (damn his iron codpiece) you can look at cheating and faking the 
Oracle concept of sequences using a separate table with just an auto_increment 
field.
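
A hedged sketch of such a fake sequence (table and column names are illustrative, not from the thread):

create table fake_seq (id bigint unsigned not null auto_increment primary key) engine=InnoDB;

-- one round trip per new ID:
insert into fake_seq values (null);
select last_insert_id();                           -- the freshly generated value
delete from fake_seq where id < last_insert_id();  -- optional housekeeping

Because auto_increment generation is concurrency-safe and last_insert_id() is per-connection, no explicit locking is needed around this.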

Drupal (the older versions like 5, at least) also does not use auto_increments 
on MySQL, iirc; you could have a look at their code to see how they solved the 
problem. I suspect they also emulate sequences somehow.

If you *really* feel like reinventing the wheel, write a stored procedure to 
process your updates using something like the hot potato or bankers' algorithm.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel




Re: Removing Data Duplicacy

2012-02-22 Thread Arthur Fuller
I agree with the testicular remedy, but in the case of the iron codpiece, I
can think of another approach which may work for you. It still uses Select,
but reads a one-row table, so it shouldn't hurt performance much. The table
serves no other purpose than storing the next available PK; call the table
NextPK, say. The algorithm might go like this:

1. Lock the table NextPK.
2. Select its value.
3. Update the column with current value + 1.
4. Unlock the table.
5. Do your inserts.

The lock will be very brief, perhaps brief enough to satisfy your
requirement.
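
A rough SQL rendering of those five steps, assuming NextPK holds a single column next_id (names are illustrative):

LOCK TABLES NextPK WRITE;
SELECT next_id INTO @page_id FROM NextPK;
UPDATE NextPK SET next_id = next_id + 1;
UNLOCK TABLES;
-- step 5: do your inserts using @page_id

The lock window covers only the read-and-increment; the inserts themselves run after UNLOCK TABLES, which is what keeps the lock brief.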

-- 
Arthur
Cell: 647.710.1314

Only two businesses refer to their clientele as users: drug dealing and
software development.

-- Arthur Fuller


Re: Removing Data Duplicacy

2012-02-22 Thread Johnny Withers
You can also handle this with transactions:

CREATE TABLE `seq` (
  `seq_num` int(10) unsigned NOT NULL DEFAULT '1000'
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


#Initialize sequence numbers
INSERT INTO seq(seq_num) VALUES(1000);

#Get next sequence number
START TRANSACTION;

UPDATE seq SET seq_num=LAST_INSERT_ID(seq_num+1);
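#The new number is available in this session via SELECT LAST_INSERT_ID()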

#Do other inserts into other tables with this sequence number

COMMIT;

#ROLLBACK if something fails

Other sessions running this UPDATE will block until this transaction is either
COMMIT'd or ROLLBACK'd, so the same sequence # can't be handed out twice. As
soon as the COMMIT or ROLLBACK occurs, the other transactions continue.

-JW

On Wed, Feb 22, 2012 at 7:57 AM, Arthur Fuller fuller.art...@gmail.com wrote:

 I agree with the testicular remedy, but in the case of the iron codpiece, I
 can think of another approach which may work for you. It still uses Select,
 but reads a one-row table, so it shouldn't hurt performance much. The table
 serves no other purpose than storing the next available PK; call the table
 NextPK, say. The algorithm might go like this:

 1. Lock the table NextPK.
 2. Select its value.
 3. Update the column with current value + 1.
 4. Unlock the table.
 5. Do your inserts.

 The lock will be very brief, perhaps brief enough to satisfy your
 requirement.

 --
 Arthur
 Cell: 647.710.1314

 Only two businesses refer to their clientele as users: drug dealing and
 software development.

 -- Arthur Fuller




-- 
-
Johnny Withers
601.209.4985
joh...@pixelated.net


Re: Removing Data Duplicacy

2012-02-22 Thread Johnny Withers
I'm not sure; it seems to me the proper way to do it would be to insert into
table1, get the insert ID, then insert into table2 using that ID. This is
pretty standard stuff.

Not sure why, in this case, he cannot do that.

-JW


On Wed, Feb 22, 2012 at 8:54 AM, Rhino rhi...@sympatico.ca wrote:

 I missed the first message in this thread, but is there some compelling
 reason why you simply don't use a unique index on the primary key of the
 table to prevent duplicates in the first place?

 --
 Rhino


 On 2012-02-22 09:40, Johnny Withers wrote:

 You can also handle this with transactions:

 CREATE TABLE `seq` (
   `seq_num` int(10) unsigned NOT NULL DEFAULT '1000'
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1;


 #Initialize sequence numbers
 INSERT INTO seq(seq_num) VALUES(1000);

 #Get next sequence number
 START TRANSACTION;

 UPDATE seq SET seq_num=LAST_INSERT_ID(seq_num+1);

 #Do other inserts into other tables with this sequence number

 COMMIT;

 #ROLLBACK if something fails

 Other sessions running this UPDATE will block until this transaction is
 either COMMIT'd or ROLLBACK'd, so the same sequence # can't be handed out
 twice. As soon as the COMMIT or ROLLBACK occurs, the other transactions
 continue.

 -JW

 On Wed, Feb 22, 2012 at 7:57 AM, Arthur Fuller fuller.art...@gmail.com wrote:

 I agree with the testicular remedy, but in the case of the iron codpiece,
 I can think of another approach which may work for you. It still uses
 Select, but reads a one-row table, so it shouldn't hurt performance much.
 The table serves no other purpose than storing the next available PK;
 call the table NextPK, say. The algorithm might go like this:

 1. Lock the table NextPK.
 2. Select its value.
 3. Update the column with current value + 1.
 4. Unlock the table.
 5. Do your inserts.

 The lock will be very brief, perhaps brief enough to satisfy your
 requirement.

 --
 Arthur
 Cell: 647.710.1314

 Only two businesses refer to their clientele as users: drug dealing and
 software development.

 -- Arthur Fuller






-- 
-
Johnny Withers
601.209.4985
joh...@pixelated.net


Re: Removing Data Duplicacy

2012-02-22 Thread Johan De Meersman
- Original Message -
 From: Johnny Withers joh...@pixelated.net
 
 I'm not sure; it seems to me the proper way to do it would be to insert
 into table1, get the insert ID, then insert into table2 using that ID.
 This is pretty standard stuff.
 
 Not sure why, in this case, he cannot do that.

last_insert_id() only works on tables with an auto_increment column, which we 
were told is not there and cannot be added. I still favour the testicular 
remediation path.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel




Re: Removing Data Duplicacy

2012-02-22 Thread Adarsh Sharma
Thanks Johan, but I mentioned before that adding an auto_increment column 
doesn't solve the issue & it causes errors in the multi-threaded 
application.
Multiple clients call this insert procedure simultaneously, so it fails 
the transactions if two or more clients read the same ID value.
I need to insert rows simultaneously with the same C_ID into two tables 
(table 1 & table 2).


Please let me know if the below solution is genuine or not. I updated my 
procedure & created a table that contains only the ID.


Solution 1: Create a PK table & lock it when calling the insert procedure.



create table pk_table (c_id bigint unsigned NOT NULL) ENGINE=InnoDB;
insert into pk_table values (219846925);  -- last ID inserted in table1 & table2



Modified insert procedure:


CREATE DEFINER=`root`@`localhost` PROCEDURE `insert_table`(
    url_level tinyint, table2_size int, master_id bigint,
    table2_type varchar(50), page_m_date varchar(70), anchortext varchar(150),
    url_heading varchar(250), page_url varchar(500), source mediumtext,
    content_data mediumtext, page_category varchar(150),
    table2_language varchar(10), url_title varchar(250),
    referred_url varchar(500), keyword varchar(500))


begin
declare page_id bigint;
start transaction;
lock table pk_table write;
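-- note: MySQL does not allow LOCK TABLES inside a stored procedure, and
-- LOCK TABLES would implicitly commit the START TRANSACTION above anyway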

select ifnull(max(c_id)+1,1) into page_id from pk_table;

insert ignore into table1
(c_id,link_level,content_size,url_id,content_type,page_modified_date,anchor_text,heading,crawled_page_url,source_code,title,referring_url)
values
(page_id,url_level,table2_size,master_id,table2_type,page_m_date,anchortext,ifnull(url_heading,anchortext),page_url,source,url_title,referred_url);

insert ignore into table2
(c_id,link_level,category,content_language,url_id,heading,crawled_page_url,content,publishing_date,keywords)
values
(page_id,url_level,page_category,table2_language,master_id,ifnull(url_heading,anchortext),page_url,content_data,page_m_date,keyword);

UPDATE pk_table set c_id=LAST_INSERT_ID(c_id+1);

commit;
unlock tables;

end

Best Regards.


Arthur Fuller wrote:
I agree with the testicular remedy, but in the case of the iron 
codpiece, I can think of another approach which may work for you. It 
still uses Select, but reads a one-row table, so it shouldn't hurt 
performance much. The table serves no other purpose than storing the 
next available PK; call the table NextPK, say. The algorithm might go 
like this:


1. Lock the table NextPK.
2. Select its value.
3. Update the column with current value + 1.
4. Unlock the table.
5. Do your inserts.

The lock will be very brief, perhaps brief enough to satisfy your 
requirement.


--
Arthur
Cell: 647.710.1314

Only two businesses refer to their clientele as users: drug dealing 
and software development.


-- Arthur Fuller