Re: load data into 2 tables and set id

2004-06-25 Thread J S
Hi Shawn,
I wondered if you might be able to help me with an SQL query.
I want to list all the internet sites I've surfed in my database.
Here's a query that matches the url with a urlid:
SELECT concat(usc.scheme,'://',us.server,up.path)
FROM url_visit uv
INNER JOIN url_servers us
   ON us.id=uv.url_server_ID
INNER JOIN url_paths up
   ON up.id=uv.url_path_ID
INNER JOIN url_queries uq
   ON uq.id=uv.url_query_ID
INNER JOIN url_schemes usc
   ON usc.id=uv.url_scheme_ID
WHERE uv.urlid=10023;
And that works really quickly.
However I want to run this for each urlid matching my uid in table 
internet_usage.

mysql> desc internet_usage;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| uid   | varchar(10)  | YES  | MUL | NULL    |       |
| time  | datetime     | YES  |     | NULL    |       |
| ip    | bigint(20)   | YES  |     | 0       |       |
| urlid | mediumint(9) |      |     | 0       |       |
| size  | int(11)      | YES  |     | 0       |       |
+-------+--------------+------+-----+---------+-------+
So maybe it's something like:
SELECT iu.time, INET_NTOA(iu.ip), concat(usc.scheme,'://',us.server,up.path)
FROM internet_usage iu
INNER JOIN url_visit uv
   ON uv.urlid=iu.urlid
INNER JOIN url_servers us
   ON us.id=uv.url_server_ID
INNER JOIN url_paths up
   ON up.id=uv.url_path_ID
INNER JOIN url_queries uq
   ON uq.id=uv.url_query_ID
INNER JOIN url_schemes usc
   ON usc.id=uv.url_scheme_ID
WHERE iu.uid='u752352';
But I haven't tried this because my tables are very big and I didn't want to 
do the wrong join!

Thanks for any help you can offer.
js.
_
It's fast, it's easy and it's free. Get MSN Messenger today! 
http://www.msn.co.uk/messenger

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: load data into 2 tables and set id

2004-06-25 Thread J S
Figured it out! Took a gamble and ran the command below:
SELECT iu.time, INET_NTOA(iu.ip), concat(usc.scheme,'://',us.server,up.path)
FROM url_visit uv
INNER JOIN internet_usage iu
   ON iu.urlid=uv.urlid
INNER JOIN url_servers us
   ON us.id=uv.url_server_ID
INNER JOIN url_paths up
   ON up.id=uv.url_path_ID
INNER JOIN url_queries uq
   ON uq.id=uv.url_query_ID
INNER JOIN url_schemes usc
   ON usc.id=uv.url_scheme_ID
WHERE iu.uid='u752359';
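(Editor's aside: INET_NTOA above just unpacks the bigint ip column into dotted-quad form. If you ever need to sanity-check those values outside MySQL, a minimal Python equivalent, with a made-up sample address, looks like this:)

```python
def inet_ntoa(n: int) -> str:
    # Unpack a 32-bit integer into four dotted-quad octets,
    # mirroring what MySQL's INET_NTOA does for the ip column.
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(inet_ntoa(3232235777))  # 192.168.1.1
```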
-- previous responses clipped for space -


Re: load data into 2 tables and set id

2004-06-23 Thread J S
Shawn,
I uncovered a problem this morning. I wonder if you (or anyone else) can 
help me out again?

mysql> select * from url_visit where url_scheme_ID=3 limit 10;
+---------+---------------+---------------+-------------+-------+----------+
| urlid   | url_scheme_ID | url_server_ID | url_path_ID | query | category |
+---------+---------------+---------------+-------------+-------+----------+
|   23392 |             3 |          1070 |           1 | NULL  | none     |
| 1346269 |             3 |          1070 |           1 | NULL  | none     |
+---------+---------------+---------------+-------------+-------+----------+
2 rows in set (0.00 sec)
This is the insert statement I'm using:
INSERT IGNORE url_visit (url_scheme_ID, url_server_ID, url_path_ID, query, 
category)
SELECT DISTINCT uc.ID, us.ID, 
up.ID,bt.query,bt.category
FROM bulk_table bt
INNER JOIN url_servers us ON us.server=bt.server
INNER JOIN url_paths up ON up.path=bt.path
INNER JOIN url_schemes uc ON uc.scheme=bt.scheme

Do I need brackets after the DISTINCT? e.g.
SELECT DISTINCT (uc.ID, us.ID, up.ID,bt.query,bt.category)
Thanks,
js.
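(Editor's note: no brackets are needed; DISTINCT applies to the entire select list, not to a single column. A quick check, with SQLite standing in for MySQL:)

```python
import sqlite3

# DISTINCT de-duplicates whole result rows, so "SELECT DISTINCT a, b"
# already does what the bracketed form is guessing at.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, b INT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (1, 1), (1, 2)])

rows = con.execute("SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()
print(rows)  # [(1, 1), (1, 2)]
```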

Great catch! I believe you found the problem. Sorry about that!   ;-D
So... just curious... from the original 60GB of text data, how much space
did the final data require (including indices)?
Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

On 06/22/2004 07:55 AM, J S wrote (Subject: Re: load data into 2 tables and set id):



I think I fixed it!
INSERT IGNORE url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT DISTINCT us.ID, up.ID, if(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
   ON us.server = bt.server
INNER JOIN url_paths up
   on up.path=bt.path
After this I ran the sql to insert into internet_usage and this time it
finished after 10m32.
I'll do some more testing and let you know how it goes.
Thanks,
js.

Did you mean there to be duplicates in the url_visits? Do I need to use
IGNORE in the following SQL?

INSERT url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT us.ID, up.ID, if(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
   ON us.server = bt.server
INNER JOIN url_paths up
   on up.path=bt.path

js.

mysql> select * from url_visit where urlid=1631;
+-------+---------------+-------------+-------+----------+
| urlid | url_server_ID | url_path_ID | query | category |
+-------+---------------+-------------+-------+----------+
|  1631 |         21720 |      630695 | cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift | Online Sales |
+-------+---------------+-------------+-------+----------+
1 row in set (0.01 sec)

mysql> select * from url_visit where urlid=1753;
+-------+---------------+-------------+-------+----------+
| urlid | url_server_ID | url_path_ID | query | category |
+-------+---------------+-------------+-------+----------+
|  1753 |         21720 |      630695 | cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift | Online Sales |
+-------+---------------+-------------+-------+----------+
1 row in set (0.00 sec)


-- previous responses clipped for space -

Re: load data into 2 tables and set id

2004-06-23 Thread SGreen

J S,

Check to see if url_scheme_ID is part of a unique constraint/key or the
primary key. If it isn't, you will get dupes even if you use INSERT IGNORE,
unless you pre-screen your INSERTs some other way to avoid duplication.

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine





   

On 06/23/2004 04:13 AM, J S wrote:

-- previous responses clipped for space -

Re: load data into 2 tables and set id

2004-06-23 Thread J S
Hi Shawn,
Here's the url_schemes table (it has the same layout as url_paths and
url_servers). Does this mean url_scheme_ID is part of a unique constraint/key?

mysql> desc url_schemes;
+--------+-------------+------+-----+---------+----------------+
| Field  | Type        | Null | Key | Default | Extra          |
+--------+-------------+------+-----+---------+----------------+
| ID     | int(11)     |      | MUL | NULL    | auto_increment |
| scheme | varchar(20) |      | PRI |         |                |
+--------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> desc url_visit;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| urlid         | mediumint(9) |      | PRI | NULL    | auto_increment |
| url_scheme_ID | int(11)      |      | MUL | 0       |                |
| url_server_ID | int(11)      |      |     | 0       |                |
| url_path_ID   | int(11)      |      |     | 0       |                |
| query         | text         | YES  |     | NULL    |                |
| category      | varchar(50)  | YES  |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

mysql> select * from url_schemes;
+----+--------+
| ID | scheme |
+----+--------+
|  1 | http   |
|  2 | tcp    |
|  3 | -      |
|  4 | ftp    |
|  5 | https  |
+----+--------+
5 rows in set (0.00 sec)

J S,
Check to see if url_scheme_ID is part of a unique constraint/key or the
primary key. If it isn't, you will get dupes even if you use INSERT IGNORE,
unless you pre-screen your INSERTs some other way to avoid duplication.
Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


-- previous responses clipped for space -

Re: load data into 2 tables and set id

2004-06-23 Thread Michael Stassen
No, url_scheme_ID has key type MUL, which means that multiple occurrences
of a given value are allowed within the field.

To prevent duplicate entries in url_visit, decide which combination of 
columns should have no duplicates, then add a unique index on that combination.

Michael
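(Editor's sketch of Michael's point, using SQLite as a stand-in for MySQL: with only a plain index, repeated combinations are accepted; with a UNIQUE index on the chosen columns, INSERT ... IGNORE, spelled INSERT OR IGNORE in SQLite, silently skips them:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE url_visit (
    urlid INTEGER PRIMARY KEY AUTOINCREMENT,
    url_scheme_ID INT, url_server_ID INT, url_path_ID INT)""")
# The UNIQUE index is what makes INSERT ... IGNORE reject repeats.
con.execute("""CREATE UNIQUE INDEX uniq_visit
    ON url_visit (url_scheme_ID, url_server_ID, url_path_ID)""")

for _ in range(2):
    con.execute("""INSERT OR IGNORE INTO url_visit
        (url_scheme_ID, url_server_ID, url_path_ID) VALUES (3, 1070, 1)""")

count = con.execute("SELECT COUNT(*) FROM url_visit").fetchone()[0]
print(count)  # 1 -- the duplicate insert was ignored
```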
J S wrote:
-- previous responses clipped for space -

Re: load data into 2 tables and set id

2004-06-23 Thread SGreen

I can't tell from a DESCRIBE output. Please post the results of  SHOW
CREATE TABLE url_visit;

Thanks!
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


   

On 06/23/2004 09:57 AM, J S wrote:







-- previous responses clipped for space -







Re: load data into 2 tables and set id

2004-06-23 Thread J S
Sorry! I'm not that experienced with databases as you probably realised!
url_visit | CREATE TABLE `url_visit` (
 `urlid` mediumint(9) NOT NULL auto_increment,
 `url_scheme_ID` int(11) NOT NULL default '0',
 `url_server_ID` int(11) NOT NULL default '0',
 `url_path_ID` int(11) NOT NULL default '0',
 `query` text,
 `category` varchar(50) default NULL,
 PRIMARY KEY  (`urlid`),
 KEY `url_scheme_ID` (`url_scheme_ID`,`url_server_ID`,`url_path_ID`)
) TYPE=MyISAM

I can't tell from a DESCRIBE output. Please post the results of  SHOW
CREATE TABLE url_visit;
Thanks!
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

On 06/23/2004 09:57 AM, J S wrote:




-- previous responses clipped for space -



Re: load data into 2 tables and set id

2004-06-23 Thread SGreen

Do this to prevent duplication on those three columns in the future:

ALTER TABLE url_visit DROP KEY `url_scheme_ID`, add UNIQUE KEY
(url_scheme_ID, url_server_ID, url_path_id);

The way I have composed that key (the column order), it will force you to
include url_scheme_ID if you want to use the index to find url_server_ID
and url_path_ID. Maybe a better idea is to organize the new key so that the
columns are listed in their order of prevalence in your queries: if you
search by url_server_ID most often, list it first, and if url_scheme_ID is
not something you need as often, put it last. That changes the statement to
look like:

ALTER TABLE url_visit DROP KEY `url_scheme_ID`, add UNIQUE KEY
(url_server_ID, url_path_ID, url_scheme_ID);

This way you can search on {url_server_ID}, {url_server_ID, url_path_ID},
or {url_server_ID, url_path_ID, url_scheme_ID} and MySQL will still use the
index. Because it's designated as a UNIQUE key, there will always be at most
1 record with any combination of those three values.
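(Editor's sketch of that leftmost-prefix rule. SQLite follows the same rule as MySQL for composite indexes, so it can stand in here: an equality test on the leading column can seek into the index, while the trailing column alone cannot:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE url_visit "
            "(url_server_ID INT, url_path_ID INT, url_scheme_ID INT)")
con.execute("CREATE UNIQUE INDEX k ON url_visit "
            "(url_server_ID, url_path_ID, url_scheme_ID)")

# EXPLAIN QUERY PLAN reports SEARCH (index seek) vs SCAN (full pass).
plan_prefix = str(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM url_visit WHERE url_server_ID=1").fetchall())
plan_suffix = str(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM url_visit WHERE url_scheme_ID=1").fetchall())

uses_index = "SEARCH" in plan_prefix  # leading column: index seek
full_scan = "SCAN" in plan_suffix     # trailing column alone: scan
print(uses_index, full_scan)
```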

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



   

On 06/23/2004 10:38 AM, J S wrote:





Sorry! I'm not that experienced with databases as you probably realised!

url_visit | CREATE TABLE `url_visit` (
  `urlid` mediumint(9) NOT NULL auto_increment,
  `url_scheme_ID` int(11) NOT NULL default '0',
  `url_server_ID` int(11) NOT NULL default '0',
  `url_path_ID` int(11) NOT NULL default '0',
  `query` text,
  `category` varchar(50) default NULL,
  PRIMARY KEY  (`urlid`),
  KEY `url_scheme_ID` (`url_scheme_ID`,`url_server_ID`,`url_path_ID`)
) TYPE=MyISAM



-- previous responses clipped for space -

Re: load data into 2 tables and set id

2004-06-22 Thread J S
Shawn,
Thanks for your reply below. I found it extremely useful. I have followed 
your instructions and got good results up to the url_visits table.

I have a perl script to parse the values out of the log. The log has 
3,770,246 lines and is gzipped. I then applied your SQL statements with the 
following times:

1. Parse script: 10m31
2. Load infile: 2m39
3. Insert url_servers and url_paths: 5m38
4. Insert url_visits 9m50
However when I ran the sql to insert the internet usage table, it just ran 
until all the log space was filled up:

-rw-rw----   1 mysql  mysql   379694680 Jun 22 10:05 bulk_table.MYD
-rw-rw----   1 mysql  mysql        1024 Jun 22 10:05 bulk_table.MYI
-rw-rw----   1 mysql  mysql        8760 Jun 22 09:59 bulk_table.frm
-rw-rw----   1 mysql  mysql  2114977792 Jun 22 11:11 internet_usage.MYD
-rw-rw----   1 mysql  mysql        1024 Jun 22 10:34 internet_usage.MYI
-rw-rw----   1 mysql  mysql        8646 Jun 22 09:59 internet_usage.frm
-rw-rw----   1 mysql  mysql    33437600 Jun 22 10:23 url_paths.MYD
-rw-rw----   1 mysql  mysql    27696128 Jun 22 10:23 url_paths.MYI
-rw-rw----   1 mysql  mysql        8574 Jun 22 09:59 url_paths.frm
-rw-rw----   1 mysql  mysql      646160 Jun 22 10:19 url_servers.MYD
-rw-rw----   1 mysql  mysql      796672 Jun 22 10:19 url_servers.MYI
-rw-rw----   1 mysql  mysql        8578 Jun 22 09:59 url_servers.frm
-rw-rw----   1 mysql  mysql   119076844 Jun 22 10:32 url_visit.MYD
-rw-rw----   1 mysql  mysql    73026560 Jun 22 10:33 url_visit.MYI
-rw-rw----   1 mysql  mysql        8694 Jun 22 09:59 url_visit.frm

I'm sure something's not right because the internet_usage table is bigger
than the bulk table. I changed your SQL a bit (on the last line with the
query). Could this be the problem?

INSERT internet_usage (uid,time,ip,urlid,size)
SELECT bt.user, bt.time, bt.ip, uv.urlid, bt.size
FROM bulk_table bt
INNER JOIN url_servers us
   ON us.server=bt.server
INNER JOIN url_paths up
   ON up.path=bt.path
INNER JOIN url_visit uv
   ON uv.url_server_ID=us.ID
   AND uv.url_path_ID=up.ID
   AND uv.query=bt.query;
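(Editor's note on the last join condition: a plain uv.query=bt.query is never true when both sides are NULL, so rows with no querystring drop out of the join; MySQL's NULL-safe comparison operator is <=>. The effect, sketched in SQLite, where the NULL-safe operator is spelled IS:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (q TEXT)")
con.execute("CREATE TABLE b (q TEXT)")
con.execute("INSERT INTO a VALUES (NULL)")
con.execute("INSERT INTO b VALUES (NULL)")

# NULL = NULL is unknown, so the equality join matches nothing;
# the NULL-safe form (IS here, <=> in MySQL) does match.
eq = con.execute("SELECT COUNT(*) FROM a JOIN b ON a.q = b.q").fetchone()[0]
nullsafe = con.execute("SELECT COUNT(*) FROM a JOIN b ON a.q IS b.q").fetchone()[0]
print(eq, nullsafe)  # 0 1
```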
Thanks again,
js.
 welcome to a basic overview of bulk importing and normalizing as
you go 
[ author's note: if you are seeing this thread for the first time and
certain items seem to be
introduced out of context, please review all previous posts in this thread.
There has been
a lot of information already exchanged on this topic that I clipped
out of this response. Thanks! -- SG]
In an earlier post you said
The data values I have for each record are:

user_id   date_time   size   url   category
for example:

u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
business

To me that implies that you have a standard text log where each field is
separated by a space and rows are delimited by a CRLF pair. To get that log
into MySQL you will need to do something _like_ (I say like as you will
most likely need to tweak it to accommodate your actual data)
CREATE TABLE bulk_table (
  user varchar(10)
  , eventdate date
  , eventtime time
  , size int
  , url text
  , category varchar(50)
);
LOAD DATA INFILE 'a proxy log file' INTO TABLE bulk_table
FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\r\n';
Now we can start merging the logs into your data tables.
To quote RFC 2396:
   The URI syntax does not require that the scheme-specific-part have
   any general structure or set of semantics which is common among all
   URI.  However, a subset of URI do share a common syntax for
   representing hierarchical relationships within the namespace.  This
   generic URI syntax consists of a sequence of four main components:
      <scheme>://<authority><path>?<query>
   each of which, except scheme, may be absent from a particular URI.
That translates into 4 logical pieces we can split a URL into:
1) the scheme -- HTTP, FTP, GOPHER, etc...
2) the authority -- that is the server (www.yahoo.com) and any port numbers
or login information
3) the path -- /somefolder/somefile.whatever
4) the query -- everything after the ?
Not part of the generic URI is that bit known as a fragment (as identified
by the #) it is ALWAYS at the end of the _entire URL_ (including the query)
when it's used. I have always lumped those and queries into the same field.
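(Editor's aside: that four-part split maps directly onto Python's urllib.parse.urlsplit, which is a handy cross-check for the LOCATE() arithmetic used in the SQL; the example URL is made up:)

```python
from urllib.parse import urlsplit

# Split a URL into the four RFC 2396 components (plus the fragment).
parts = urlsplit("http://www.mysql.com/index.html?a=1#frag")
print(parts.scheme)    # http
print(parts.netloc)    # www.mysql.com  (the RFC's "authority")
print(parts.path)      # /index.html
print(parts.query)     # a=1
print(parts.fragment)  # frag
```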
So you could create tables for each of those parts and get VERY normalized
or you can partly normalize like this:
ALTER TABLE bulk_table add server_split int not null default 0
  , add path_split int not null default 0
  , add server varchar(255)
  , add path varchar(255);
UPDATE bulk_table
SET server_split = LOCATE('/', URL, 8)-1;
UPDATE bulk_table
SET path_split = if(LOCATE('?', URL, server_split) > 0, LOCATE('?', URL,
server_split), LOCATE('#', URL, server_split)-1);
UPDATE bulk_table
SET server = LEFT(URL, server_split);
#those 4 new columns helped us to parse out the 3 major parts of the url
#I added them to the table so 

Re: load data into 2 tables and set id

2004-06-22 Thread J S
Did you mean there to be duplicates in the url_visits? Do I need to use 
IGNORE in the following SQL?

INSERT url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT us.ID, up.ID, if(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
  ON us.server = bt.server
INNER JOIN url_paths up
  on up.path=bt.path
js.
mysql> select * from url_visit where urlid=1631;
+-------+---------------+-------------+-------+----------+
| urlid | url_server_ID | url_path_ID | query | category |
+-------+---------------+-------------+-------+----------+
|  1631 |         21720 |      630695 | cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift | Online Sales |
+-------+---------------+-------------+-------+----------+
1 row in set (0.01 sec)

mysql> select * from url_visit where urlid=1753;
+-------+---------------+-------------+-------+----------+
| urlid | url_server_ID | url_path_ID | query | category |
+-------+---------------+-------------+-------+----------+
|  1753 |         21720 |      630695 | cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift | Online Sales |
+-------+---------------+-------------+-------+----------+
1 row in set (0.00 sec)

-- previous responses clipped for space -

Re: load data into 2 tables and set id

2004-06-22 Thread J S
I think I fixed it!
INSERT IGNORE url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT DISTINCT us.ID, up.ID, IF(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
  ON us.server = bt.server
INNER JOIN url_paths up
  ON up.path = bt.path
After this I ran the sql to insert into internet_usage and this time it 
finished after 10m32.

I'll do some more testing and let you know how it goes.
Thanks,
js.
Did you mean there to be duplicates in the url_visits? Do I need to use 
IGNORE in the following SQL?

INSERT url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT us.ID, up.ID, IF(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
  ON us.server = bt.server
INNER JOIN url_paths up
  ON up.path = bt.path
js.
mysql> select * from url_visit where urlid=1631;
+-------+---------------+-------------+-------+--------------+
| urlid | url_server_ID | url_path_ID | query | category     |
+-------+---------------+-------------+-------+--------------+
|  1631 |         21720 |      630695 | cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift | Online Sales |
+-------+---------------+-------------+-------+--------------+
1 row in set (0.01 sec)

mysql> select * from url_visit where urlid=1753;
+-------+---------------+-------------+-------+--------------+
| urlid | url_server_ID | url_path_ID | query | category     |
+-------+---------------+-------------+-------+--------------+
|  1753 |         21720 |      630695 | cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift | Online Sales |
+-------+---------------+-------------+-------+--------------+
1 row in set (0.00 sec)

Shawn,
Thanks for your reply below. I found it extremely useful. I have followed 
your instructions and got good results up to the url_visits table.

I have a perl script to parse the values out of the log. The log has 
3,770,246 lines and is gzipped. I then applied your SQL statements with 
the following times:

1. Parse script: 10m31
2. Load infile: 2m39
3. Insert url_servers and url_paths: 5m38
4. Insert url_visits: 9m50
However when I ran the sql to insert the internet usage table, it just ran 
until all the log space was filled up:

-rw-rw----   1 mysql    mysql     379694680 Jun 22 10:05 bulk_table.MYD
-rw-rw----   1 mysql    mysql          1024 Jun 22 10:05 bulk_table.MYI
-rw-rw----   1 mysql    mysql          8760 Jun 22 09:59 bulk_table.frm
-rw-rw----   1 mysql    mysql    2114977792 Jun 22 11:11 internet_usage.MYD
-rw-rw----   1 mysql    mysql          1024 Jun 22 10:34 internet_usage.MYI
-rw-rw----   1 mysql    mysql          8646 Jun 22 09:59 internet_usage.frm
-rw-rw----   1 mysql    mysql      33437600 Jun 22 10:23 url_paths.MYD
-rw-rw----   1 mysql    mysql      27696128 Jun 22 10:23 url_paths.MYI
-rw-rw----   1 mysql    mysql          8574 Jun 22 09:59 url_paths.frm
-rw-rw----   1 mysql    mysql        646160 Jun 22 10:19 url_servers.MYD
-rw-rw----   1 mysql    mysql        796672 Jun 22 10:19 url_servers.MYI
-rw-rw----   1 mysql    mysql          8578 Jun 22 09:59 url_servers.frm
-rw-rw----   1 mysql    mysql     119076844 Jun 22 10:32 url_visit.MYD
-rw-rw----   1 mysql    mysql      73026560 Jun 22 10:33 url_visit.MYI
-rw-rw----   1 mysql    mysql          8694 Jun 22 09:59 url_visit.frm

I'm sure something's not right because the internet_usage table is bigger 
than the bulk table. I changed your SQL a bit (on the last line with the 
query). Could this be the problem?

INSERT internet_usage (uid, time, ip, urlid, size)
SELECT bt.user, bt.time, bt.ip, uv.urlid, bt.size
FROM bulk_table bt
INNER JOIN url_servers us
   ON us.server = bt.server
INNER JOIN url_paths up
   ON up.path = bt.path
INNER JOIN url_visit uv
   ON uv.url_server_ID = us.ID
   AND uv.url_path_ID = up.ID
   AND uv.query = bt.query;
Thanks again,
js.

Re: load data into 2 tables and set id

2004-06-22 Thread SGreen

Great catch! I believe you found the problem. Sorry about that!   ;-D

So... just curious... from the original 60GB of text data, how much space
did the final data require (including indices)?

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



   

  J S

  [EMAIL PROTECTED]To:   [EMAIL PROTECTED]
 
  com cc:   [EMAIL PROTECTED] 

   Fax to: 

  06/22/2004 07:55 Subject:  Re: load data into 2 tables 
and set id
  AM   

   

   





I think I fixed it!

INSERT IGNORE url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT DISTINCT us.ID, up.ID, if(bt.path_split 0, SUBSTRING(bt.url,path),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
   ON us.server = bt.server
INNER JOIN url_paths up
   on up.path=bt.path

After this I ran the sql to insert into internet_usage and this time it
finished after 10m32.

I'll do some more testing and let you know how it goes.

Thanks,

js.

Did you mean there to be duplicates in the url_visits? Do I need to use
IGNORE in the following SQL?

INSERT url_visit (url_server_ID, url_path_ID, querystring,
category)
SELECT us.ID, up.ID, if(bt.path_split 0, SUBSTRING(bt.url,path),
NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
   ON us.server = bt.server
INNER JOIN url_paths up
   on up.path=bt.path

js.

mysql select * from url_visit where urlid=1631;
+---+---+-++--+

| urlid | url_server_ID | url_path_ID | query



 | category |
+---+---+-++--+

|  1631 | 21720 |  630695 |
cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift

| Online Sales |
+---+---+-++--+

1 row in set (0.01 sec)

mysql select * from url_visit where urlid=1753;
+---+---+-++--+

| urlid | url_server_ID | url_path_ID | query



 | category |
+---+---+-++--+

|  1753 | 21720 |  630695 |
cid=mrkbakisrc=catcmrkdswi|p1|rshop/catcmrkseai|p1|rshop/catcmrkhori|p1|rshop/catcmrki|p1|rshop/catcripa|p1|rgift

| Online Sales |
+---+---+-++--+

1 row in set (0.00 sec)


Shawn,

Thanks for your reply below. I found it extremely useful. I have followed

your instructions and got good results up to the url_visits table.

I have a perl script to parse the values out of the log. The log has
3,770,246 lines and is gzipped. I then applied your SQL statements with
the following times:

1. Parse script: 10m31
2. Load infile: 2m39
3. Insert url_servers and url_paths: 5m38
4. Insert url_visits 9m50

However when I ran the sql to insert the internet usage table, it just
ran
until all the log space was filled up:

-rw-rw   1 mysqlmysql379694680 Jun 22 10:05 bulk_table.MYD


-rw-rw   1 mysqlmysql   1024 Jun 22 10:05 bulk_table.MYI
-rw-rw   1 mysqlmysql   8760 Jun 22 09:59 bulk_table.frm
-rw-rw   1 mysqlmysql2114977792 Jun 22 11:11
internet_usage.MYD 
-rw-rw   1 mysqlmysql   1024 Jun 22 10:34 internet_usage.MYI
-rw-rw   1 mysqlmysql   8646 Jun 22 09:59 internet_usage.frm
-rw-rw   1 mysqlmysql33437600 Jun 22 10:23 url_paths.MYD
-rw-rw   1 mysqlmysql

Re: load data into 2 tables and set id

2004-06-22 Thread J S
Shawn,
Here are the results:
$ ls -l /sawmill/rawlog/SG_CSGL02_main_47060821.log.gz
-rw-r-----   1 bluecoat staff    138510199 Jun 14 10:04 /sawmill/rawlog/SG_CSGL02_main_47060821.log.gz

$ time ./logfile.pl /sawmill/rawlog/SG_CSGL02_main_47060821.log.gz
Tue Jun 22 16:53:40 2004: PARSING 
/sawmill/rawlog/SG_CSGL02_main_47060821.log.gz...
Tue Jun 22 17:03:37 2004: BULK_TABLE_INSERT...
Tue Jun 22 17:05:41 2004: INSERT_URL_SCHEMES...
Tue Jun 22 17:06:08 2004: INSERT_URL_SERVERS...
Tue Jun 22 17:08:06 2004: INSERT_URL_PATHS...
Tue Jun 22 17:11:20 2004: INSERT_URL_VISITS...
Tue Jun 22 17:28:16 2004: INSERT_INTERNET_USAGE...
Tue Jun 22 17:42:01 2004: Finished

real    47m16.68s
user    10m44.47s
sys     0m17.95s
$
# ls -l
total 1206168
-rw-rw----   1 mysql    mysql     379692348 Jun 22 17:05 bulk_table.MYD
-rw-rw----   1 mysql    mysql          1024 Jun 22 17:05 bulk_table.MYI
-rw-rw----   1 mysql    mysql          8760 Jun 22 16:51 bulk_table.frm
-rw-rw----   1 mysql    mysql     115443608 Jun 22 17:40 internet_usage.MYD
-rw-rw----   1 mysql    mysql      19328000 Jun 22 17:42 internet_usage.MYI
-rw-rw----   1 mysql    mysql          8646 Jun 22 16:51 internet_usage.frm
-rw-rw----   1 mysql    mysql      33437600 Jun 22 17:11 url_paths.MYD
-rw-rw----   1 mysql    mysql      27696128 Jun 22 17:11 url_paths.MYI
-rw-rw----   1 mysql    mysql          8574 Jun 22 16:51 url_paths.frm
-rw-rw----   1 mysql    mysql            80 Jun 22 17:06 url_schemes.MYD
-rw-rw----   1 mysql    mysql          3072 Jun 22 17:06 url_schemes.MYI
-rw-rw----   1 mysql    mysql          8578 Jun 22 16:51 url_schemes.frm
-rw-rw----   1 mysql    mysql        646160 Jun 22 17:08 url_servers.MYD
-rw-rw----   1 mysql    mysql        796672 Jun 22 17:08 url_servers.MYI
-rw-rw----   1 mysql    mysql          8578 Jun 22 16:51 url_servers.frm
-rw-rw----   1 mysql    mysql      24082472 Jun 22 17:28 url_visit.MYD
-rw-rw----   1 mysql    mysql      16331776 Jun 22 17:28 url_visit.MYI
-rw-rw----   1 mysql    mysql          8736 Jun 22 16:51 url_visit.frm
# du -sk .
603088  .
I had a problem trying to get the hits results. It works through the mysql 
command line, but if I try to run mysql < hits.sql > hits.txt it can't find 
the temp table tmpStats.

js.

Re: load data into 2 tables and set id

2004-06-18 Thread J S


Mos forgot to populate the url_id column in your user table. I would use
his same process but re-arrange it like this:
1) create table BIG_TABLE
2) load data infile
3) create table URL_TABLE (
  url_id bigint not null auto_increment,
  url varchar(255) not null primary key,
  category 
  )
4) INSERT IGNORE URL_TABLE (url, category)
  SELECT url, category
  FROM BIG_TABLE
5) create table USER_TABLE (
  user_id varchar?(...) primary key,
  date_time datetime,
  url_id bigint,
  size int
  )
6) insert USER_TABLE
  SELECT bt.userID, bt.datetime, u.url_id, bt.size
  FROM BIG_TABLE bt
  INNER JOIN URL_TABLE u
  ON u.url = bt.url
doing it this way lets the SQL engine handle the conversion of URLs to
their IDs in the USER_TABLE...see? No scripting required at all!
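The URL-to-id lookup that the final INSERT ... SELECT above performs can also be sketched outside SQL; here is a minimal Python illustration of the same normalize-then-join idea (the sample rows are hypothetical, and an incrementing counter stands in for MySQL's auto_increment):

```python
# Hypothetical sample rows in BIG_TABLE order:
# (user_id, date_time, size, url, category)
big_table = [
    ("u752359", "2004-04-02 12:33:04", 3403, "http://www.mysql.com/index.html", "business"),
    ("u752360", "2004-04-02 12:34:10", 1200, "http://www.mysql.com/index.html", "business"),
]

# URL_TABLE: each distinct url gets one id (INSERT IGNORE keeps the first
# copy); the incrementing counter plays the role of auto_increment.
url_table = {}
for _, _, _, url, category in big_table:
    if url not in url_table:
        url_table[url] = (len(url_table) + 1, category)

# USER_TABLE: join BIG_TABLE back to URL_TABLE on url to fill in url_id.
user_table = [(uid, dt, url_table[url][0], size)
              for uid, dt, size, url, _ in big_table]
print(user_table)
```

Both sample rows share the same url, so they end up pointing at the same url_id, exactly what the INNER JOIN produces.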
Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Shawn,
Many thanks for your reply. It sounds like this might be quicker than my 
perl script which parses a 1GB log file and fills the database in 40 mins at 
the mo. (The actual parsing only takes 1.5 mins).
There's one snag I forgot about: in the url_table I have another column 
called hits, which is the number of hits for each url. I'm terrible at SQL 
and wondered if you might be able to suggest a way of doing this with the 
method above?

Thanks,
JS.
_
It's fast, it's easy and it's free. Get MSN Messenger today! 
http://www.msn.co.uk/messenger

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: load data into 2 tables and set id

2004-06-18 Thread SGreen

No problem!!

Please post the structures of your big_table and your url_table
(whatever you called them)  and I will help you to rewrite step 4 to count
how many times a URL appears in the big_table.
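Counting how many times a URL appears in the big_table amounts to a GROUP BY url with COUNT(*); as a hedged illustration of what that tally produces, a tiny Python sketch over hypothetical urls:

```python
from collections import Counter

# The "hits" column is just a per-url occurrence count over the
# bulk-imported rows (SELECT url, COUNT(*) ... GROUP BY url in SQL terms).
big_table_urls = [
    "http://www.mysql.com/index.html",
    "http://www.mysql.com/index.html",
    "http://www.bbc.co.uk/",
]
hits = Counter(big_table_urls)
print(hits.most_common())
```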

Is this bulk import a process you need to do repeatedly? If so, we need to
worry about updating the count column on the next batch import and not just
re-creating it from the new data. That will change the query significantly.

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



   
 



Re: load data into 2 tables and set id

2004-06-18 Thread J S
Shawn,
Thanks for helping on this. I really appreciate it.
No problem!!
Please post the structures of your big_table and your url_table
(whatever you called them)  and I will help you to rewrite step 4 to count
how many times a URL appears in the big_table.
mysql> desc internet_usage;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| uid    | varchar(10) | YES  | MUL | NULL    |       |
| time   | datetime    | YES  |     | NULL    |       |
| ip     | varchar(20) | YES  |     | NULL    |       |
| action | varchar(20) | YES  |     | NULL    |       |
| urlid  | int(11)     | YES  |     | NULL    |       |
| size   | int(11)     | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
6 rows in set (0.03 sec)
mysql> desc url_table;
+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| urlid    | mediumint(9) |      | PRI | NULL    | auto_increment |
| url      | text         | YES  | MUL | NULL    |                |
| hits     | mediumint(9) | YES  |     | NULL    |                |
| category | varchar(50)  | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)

mysql>
Is this bulk import a process you need to do repeatedly? If so, we need to
worry about updating the count column on the next batch import and not just
re-creating it from the new data. That will change the query significantly.
Yes it will be a repeated process. Actually I have a backlog of 6 months 
data to load!

Here's an example of what the data looks like:
mysql> select * from internet_usage limit 5;
+---------+---------------------+----------------+--------------+-------+------+
| uid     | time                | ip             | action       | urlid | size |
+---------+---------------------+----------------+--------------+-------+------+
| n58396  | 2004-06-07 21:12:16 | 21.38.25.204   | TCP_TUNNELED |  5999 | 5297 |
| u344584 | 2004-06-07 21:07:12 | 21.33.136.74   | TCP_HIT      |  4494 |  438 |
| -       | 2004-06-07 21:07:02 | 21.38.92.76    | TCP_NC_MISS  |  2134 |  771 |
| u524797 | 2004-06-07 21:03:27 | 21.32.25.41    | TCP_NC_MISS  |   260 |  582 |
| -       | 2004-06-07 21:09:13 | 21.201.130.240 | TCP_HIT      |  3112 |  542 |
+---------+---------------------+----------------+--------------+-------+------+

mysql> select * from url_table limit 5;
+-------+-----------------------------------------------------------------------+------+---------------+
| urlid | url                                                                   | hits | category      |
+-------+-----------------------------------------------------------------------+------+---------------+
|     1 | http://www.bbc.co.uk/horoscopes/chinesehoroscopes/images/hp-snake.gif | NULL | Entertainment |
|     2 | http://www.call18866.co.uk/images/logo.jpg                            | NULL | none          |
|     3 | http://m2.doubleclick.net/866421/0409_santoku_250.gif                 | NULL | none          |
|     4 | http://lysto1-dc02.ww.ad.ba.com/                                      | NULL | Travel        |
|     5 | http://www.aboutscotland.com/edin/newstreetph/sitview.jpg             | NULL | Travel        |
+-------+-----------------------------------------------------------------------+------+---------------+
5 rows in set (0.00 sec)

One other problem I'm having here is making the rows in internet_usage 
unique. At the moment I have lots of duplicates, and I was trying to create 
a temporary table but unfortunately got an error 27 (I think this refers to 
a 2GB limit).

mysql> CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM internet_usage;
ERROR 1030: Got error 27 from table handler

Is there another way of doing this?

Re: load data into 2 tables and set id

2004-06-18 Thread SGreen

JS,

I need one more piece of information to help make that query work for you,
I need the structure of the table that you use (or would use) to bulk
import those logs into.

If you are running out of room you may consider further normalizing your
data (which saves space, but creates more data maintenance steps). I, too,
have had to deal with millions of rows of internet usage logs, so I
understand your pain.

You can store your IP addresses in an INT and get them back in dotted
notation with the MySQL functions INET_ATON() and INET_NTOA().  That will
save you an average of 10 bytes PER ROW (it adds up when you are into the
millions of rows).
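For illustration only, the packing that INET_ATON()/INET_NTOA() perform can be mirrored in a few lines of Python (the sample address is taken from the log data shown earlier in the thread):

```python
# Mirror of MySQL's INET_ATON()/INET_NTOA(): a dotted quad packs
# into one 32-bit unsigned integer (4 bytes instead of up to 15 chars).

def inet_aton(dotted: str) -> int:
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def inet_ntoa(n: int) -> str:
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = inet_aton("21.38.25.204")
print(n, inet_ntoa(n))  # the conversion round-trips
```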

Create a Proxy_Action table (id tinyint, action varchar(20) ) and populate
it with all of the actions your proxy/firewall can log.  Then replace the
column action varchar(20) with ProxyAction_id tinyint.  (I assume there
are less than 256 action messages available from your proxy?) That's 1
byte vs. 8 at the low end of your sample data (counting the null at the end
of the string).  First add the new column to the table, populate it with
the ID values from your new Action table, then drop the old column.

URLs consist (in a basic sense) of the server portion (to the left of the
first single /), the path portion (right of the first / and left of a
? or #), and either a fragment (after the #) or a query string (after the ?).
I would at least split out the server portion into its own table. For each
page request (assume 1 page and 9 pictures, all from the same server) that
would be 10 rows of log data that all contain the same chunk of similar
information. Reducing that heavily repeated portion of your data to an ID
number will greatly help reduce the size of your database.

About the non-uniqueness of your internet_usage table. Even if the same
user visits the same URL multiple times (is that what you mean by repeated
records?) the times should all be slightly different. If they are not
different, it is still possible that the same person requested the same
page twice or more during the same second (the auto-login feature of MSIE
comes to mind as one culprit). Or you could have multiple users all on the
same userID hitting the same page from different machines. I guess I
would have to see more data to understand your problem better.

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



   
 

Re: load data into 2 tables and set id

2004-06-18 Thread J S
Shawn,
Thanks for the email below - I will go through it in more detail later 
tonight (going home time now!), although I can already see good points there.

JS,
I need one more piece of information to help make that query work for you,
I need the structure of the table that you use (or would use) to bulk
import those logs into.
I'm completely flexible over this as I'm still at the testing stage. You 
might be able to suggest the most efficient structure? Isn't the problem 
getting the urlid before I insert into internet_usage, so that I can find 
the associated url?


Re: load data into 2 tables and set id

2004-06-18 Thread SGreen

 welcome to a basic overview of bulk importing and normalizing as
you go 

[ author's note: if you are seeing this thread for the first time and
certain items seem to be
introduced out of context, please review all previous posts in this thread.
There has been
a lot of information already exchanged on this topic that I clipped
out of this response. Thanks! -- SG]

In an earlier post you said
The data values I have for each record are:

user_id   date_time   size   url   category

for example:

u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
business


To me that implies that you have a standard text log where each field is
separated by a space and rows are delimited by a CRLF pair. To get that log
into MySQL you will need to do something _like_ (I say like as you will
most likely need to tweak it to accommodate your actual data)

CREATE TABLE bulk_table (
  user varchar(10)
  , eventdate date
  , eventtime time
  , size int
  , url text
  , category varchar(50)
);

LOAD DATA INFILE '<a proxy log file>' INTO TABLE bulk_table
FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\r\n';
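Each log line has to match that six-column, space-delimited layout before LOAD DATA can pick it apart; as a cross-check, a small Python sketch that splits one line the same way (the field order is an assumption based on the sample record quoted earlier in this post):

```python
# Split one whitespace-delimited proxy-log line into the six
# bulk_table columns: user, eventdate, eventtime, size, url, category.
# Field order is assumed from the sample record
# "u752359 2004-04-02 12:33:04 3403 http://www.mysql.com/index.html business".

def parse_line(line: str) -> dict:
    user, eventdate, eventtime, size, url, category = line.split()
    return {"user": user, "eventdate": eventdate, "eventtime": eventtime,
            "size": int(size), "url": url, "category": category}

row = parse_line("u752359 2004-04-02 12:33:04 3403 "
                 "http://www.mysql.com/index.html business")
print(row)
```

A real parser would also have to cope with odd lines and extra fields, which is presumably what the Perl script in this thread handles.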

Now we can start merging the logs into your data tables.

To quote RFC 2396:
   The URI syntax does not require that the scheme-specific-part have
   any general structure or set of semantics which is common among all
   URI.  However, a subset of URI do share a common syntax for
   representing hierarchical relationships within the namespace.  This
   generic URI syntax consists of a sequence of four main components:

  <scheme>://<authority><path>?<query>

   each of which, except scheme, may be absent from a particular URI.

That translates into 4 logical pieces we can split a URL into:
1) the scheme -- HTTP, FTP, GOPHER, etc...
2) the authority -- that is the server (www.yahoo.com) and any port numbers
or login information
3) the path -- /somefolder/somefile.whatever
4) the query -- everything after the ?

Not part of the generic URI is that bit known as a fragment (as identified
by the #) it is ALWAYS at the end of the _entire URL_ (including the query)
when it's used. I have always lumped those and queries into the same field.

So you could create tables for each of those parts and get VERY normalized
or you can partly normalize like this:

ALTER TABLE bulk_table ADD server_split int not null default 0
  , ADD path_split int not null default 0
  , ADD server varchar(255)
  , ADD path varchar(255);

UPDATE bulk_table
SET server_split = LOCATE('/', URL, 8) - 1;

UPDATE bulk_table
SET path_split = IF(LOCATE('?', URL, server_split) > 0, LOCATE('?', URL,
server_split), LOCATE('#', URL, server_split) - 1);

UPDATE bulk_table
SET server = LEFT(URL, server_split);

#those 4 new columns helped us to parse out the 3 major parts of the url
#I added them to the table so that we would not have to keep recalculating
those values later on
# if it turns out that adding the columns takes a LONG time, we can create
this table with those columns
# already created and just not import to them (change the LOAD DATA INFILE
statement slightly)

CREATE TABLE IF NOT EXISTS url_servers (
ID int not null auto_increment,
server varchar(255) primary key,
Key (ID)
)

CREATE TABLE IF NOT EXISTS url_paths (
ID int not null auto_increment,
path varchar(255) primary key,
Key (ID)
)

INSERT IGNORE INTO url_servers(server)
SELECT DISTINCT server
FROM bulk_table

INSERT IGNORE INTO url_paths (path)
SELECT DISTINCT path
FROM bulk_table

# at this point we have all of our new Servers and our Paths uniquely
numbered
# but we are going to need a slightly different URL table to track visits.

CREATE TABLE url_visit (
urlid mediumint not null auto_increment primary key,
url_server_ID int not null default 0,
url_path_ID int not null default 0,
querystring text default null,
category varchar(50)default null,
KEY(url_server_ID, url_path_ID)
)

## that last key is to speed up our joins to our _servers and _paths
tables...

#  we finally have enough information to insert to the visit table

INSERT url_visit (url_server_ID, url_path_ID, querystring, category)
SELECT us.ID, up.ID, IF(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split), NULL),
bt.category
FROM bulk_table bt
INNER JOIN url_servers us
  ON us.server = bt.server
INNER JOIN url_paths up
  ON up.path = bt.path

## (see where the new pre-computed columns come in handy?) :-D

# And now we have enough information to load the internet_usage table,
# though there isn't enough data in your sample source data to fill in all
# of the columns.

INSERT internet_usage (uid, `time`, urlid, size)
SELECT bt.user, ADDTIME(bt.date, bt.time), uv.urlid, bt.size
FROM bulk_table bt
INNER JOIN url_servers us
  ON us.server = bt.server
INNER JOIN url_paths up
  ON up.path = bt.path
INNER JOIN url_visit uv
  ON uv.url_server_ID = us.ID
  AND uv.url_path_ID = up.ID
  # <=> is NULL-safe equality; a plain = would never match rows whose querystring is NULL
  AND uv.querystring <=> IF(bt.path_split > 0, SUBSTRING(bt.url, bt.path_split), NULL);
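The whole pipeline above (bulk load, split the URL, dedupe into lookup tables, join back for the visit rows) can be exercised in miniature. The sketch below is illustrative only: it uses Python's built-in sqlite3 rather than MySQL (so INSERT IGNORE becomes INSERT OR IGNORE), and the URL splitting is done in Python instead of the UPDATE statements, but the dedupe-then-join flow is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE url_servers (id INTEGER PRIMARY KEY AUTOINCREMENT, server TEXT UNIQUE);
CREATE TABLE url_paths   (id INTEGER PRIMARY KEY AUTOINCREMENT, path   TEXT UNIQUE);
CREATE TABLE url_visit   (urlid INTEGER PRIMARY KEY AUTOINCREMENT,
                          url_server_id INT, url_path_id INT,
                          querystring TEXT, category TEXT);
""")

def split_url(url):
    # the server part ends at the first '/' after 'http://' (hence the 8
    # in LOCATE('/', URL, 8) above)
    slash = url.find("/", 7)
    server, rest = url[:slash], url[slash:]
    q = rest.find("?")
    return (server, rest[:q], rest[q:]) if q >= 0 else (server, rest, None)

log = [("http://www.mysql.com/index.html?x=1", "business"),
       ("http://www.mysql.com/doc", "business")]

for url, category in log:
    server, path, qs = split_url(url)
    # dedupe into the lookup tables, then join back to fetch the ids
    cur.execute("INSERT OR IGNORE INTO url_servers(server) VALUES (?)", (server,))
    cur.execute("INSERT OR IGNORE INTO url_paths(path) VALUES (?)", (path,))
    cur.execute("""INSERT INTO url_visit(url_server_id, url_path_id, querystring, category)
                   SELECT s.id, p.id, ?, ?
                   FROM url_servers s, url_paths p
                   WHERE s.server = ? AND p.path = ?""",
                (qs, category, server, path))

print(cur.execute("SELECT COUNT(*) FROM url_servers").fetchone()[0])  # 1 distinct server
```

Both sample rows share one server, so url_servers ends up with a single row while url_visit records both hits.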


It may not be perfect but it's how 

Re: load data into 2 tables and set id

2004-06-10 Thread J S
Hi,
I need some help please! I have 60GB of proxy logs to parse and load into 
a mysql database. I've written a parsing script but I'm stuck now on how 
to load the data in.

I have a database called PROXY_LOG with 2 tables:
USER_TABLE
user_id   date_time   url_id   size
and
URL_TABLE
url_id   url   category
The data values I have for each record are:
user_id   date_time   size   url   category
for example:
u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
business
The problem is I'm not sure how to load the data into the 2 tables and set 
the url_id. Could anyone point me in the right direction please?

Many thanks,
js.
js,
Is this a one time operation? If so, you could create a temporary 
table with all the fields:

BIG_TABLE
user_id   date_time   url_id   size  url  category
and use Load Data InFile to load the data into this one table.
Then create the two smaller tables using:
create table USER_TABLE select  user_id, date_time, url_id, size from 
BIG_TABLE;
create table URL_TABLE select url_id, url, category from BIG_TABLE;
drop table BIG_TABLE;

So there you have it in 3 lines of code. :-)
Mike
P.S. If you could break the text file into two files to begin with, then 
two Load Data InFile statements would handle things nicely.

Thanks Mos, that sounds ideal but the url_id value has to come from the 
database.

js.
_
Want to block unwanted pop-ups? Download the free MSN Toolbar now!  
http://toolbar.msn.co.uk/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: load data into 2 tables and set id

2004-06-10 Thread SGreen

Mos forgot to populate the url_id column in your user table. I would use
his same process but re-arrange it like this:

1) create table BIG_TABLE
2) load data infile
3) create table URL_TABLE (
  url_id bigint not null auto_increment,
  url varchar(255) not null primary key,
  category ...,
  key (url_id)
  )
4) INSERT IGNORE URL_TABLE (url, category)
  SELECT url, category
  FROM BIG_TABLE
5) create table USER_TABLE (
  user_id varchar?(...) primary key,
  date_time datetime,
  url_id bigint,
  size int
  )

6) insert USER_TABLE
  SELECT bt.user_id, bt.date_time, u.url_id, bt.size
  FROM BIG_TABLE bt
  INNER JOIN URL_TABLE u
  ON u.url = bt.url

doing it this way lets the SQL engine handle the conversion of URLs to
their IDs in the USER_TABLE...see? No scripting required at all!
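Shawn's numbered steps can be tried end-to-end in miniature. The sketch below is illustrative only and uses Python's built-in sqlite3 (where INSERT IGNORE is spelled INSERT OR IGNORE); the MySQL statements are the ones in the steps above.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.executescript("""
CREATE TABLE BIG_TABLE (user_id TEXT, date_time TEXT, size INT, url TEXT, category TEXT);
CREATE TABLE URL_TABLE (url_id INTEGER PRIMARY KEY AUTOINCREMENT,
                        url TEXT NOT NULL UNIQUE, category TEXT);
CREATE TABLE USER_TABLE (user_id TEXT, date_time TEXT, url_id INT, size INT);

-- stand-in for LOAD DATA INFILE: two log rows hitting the same URL
INSERT INTO BIG_TABLE VALUES
 ('u752359','2004-04-02 12:33:04',3403,'http://www.mysql.com/index.html','business'),
 ('u752360','2004-04-02 12:35:10',1200,'http://www.mysql.com/index.html','business');

-- step 4: every distinct URL gets an id; the duplicate row is ignored
INSERT OR IGNORE INTO URL_TABLE (url, category)
  SELECT url, category FROM BIG_TABLE;

-- step 5/6: the join translates each URL back to its generated id
INSERT INTO USER_TABLE
  SELECT bt.user_id, bt.date_time, u.url_id, bt.size
  FROM BIG_TABLE bt JOIN URL_TABLE u ON u.url = bt.url;
""")
print(cur.execute("SELECT COUNT(*) FROM URL_TABLE").fetchone()[0])  # 1: one distinct URL
```

Both USER_TABLE rows end up pointing at the same url_id, which is exactly the conversion the SQL engine is doing for you.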

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



   
   
From: mos [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: 06/09/2004 05:00 PM
Subject: Re: load data into 2 tables and set id




At 02:34 PM 6/9/2004, you wrote:
Hi,

I need some help please! I have 60GB of proxy logs to parse and load into
a mysql database. I've written a parsing script but I'm stuck now on how
to load the data in.

I have a database called PROXY_LOG with 2 tables:

USER_TABLE
user_id   date_time   url_id   size

and

URL_TABLE
url_id   url   category


The data values I have for each record are:

user_id   date_time   size   url   category

for example:

u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
business

The problem is I'm not sure how to load the data into the 2 tables and set

the url_id. Could anyone point me in the right direction please?

Many thanks,

js.

js,
 Is this a one time operation? If so, you could create a temporary
table with all the fields:

BIG_TABLE
user_id   date_time   url_id   size  url  category

and use Load Data InFile to load the data into this one table.
Then create the two smaller tables using:

create table USER_TABLE select  user_id, date_time, url_id, size from
BIG_TABLE;
create table URL_TABLE select url_id, url, category from BIG_TABLE;
drop table BIG_TABLE;

So there you have it in 3 lines of code. :-)

Mike

P.S. If you could break the text file into two files to begin with, then
two Load Data InFile statements would handle things nicely.












load data into 2 tables and set id

2004-06-09 Thread J S
Hi,
I need some help please! I have 60GB of proxy logs to parse and load into a 
mysql database. I've written a parsing script but I'm stuck now on how to 
load the data in.

I have a database called PROXY_LOG with 2 tables:
USER_TABLE
user_id   date_time   url_id   size
and
URL_TABLE
url_id   url   category
The data values I have for each record are:
user_id   date_time   size   url   category
for example:
u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html   
business

The problem is I'm not sure how to load the data into the 2 tables and set 
the url_id. Could anyone point me in the right direction please?

Many thanks,
js.
_
It's fast, it's easy and it's free. Get MSN Messenger today! 
http://www.msn.co.uk/messenger



Re: load data into 2 tables and set id

2004-06-09 Thread William R. Mussatto
J S said:
 Hi,

 I need some help please! I have 60GB of proxy logs to parse and load
 into a  mysql database. I've written a parsing script but I'm stuck now
 on how to  load the data in.

 I have a database called PROXY_LOG with 2 tables:

 USER_TABLE
 user_id   date_time   url_id   size

 and

 URL_TABLE
 url_id   url   category


 The data values I have for each record are:

 user_id   date_time   size   url   category

 for example:

 u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
  business

 The problem is I'm not sure how to load the data into the 2 tables and
 set  the url_id. Could anyone point me in the right direction please?

 Many thanks,

 js.

What language did you use to do the parsing? If it was Perl, I'd recommend
looking at Perl's DBI interface and doing it programmatically.
Do you have any idea how many different URLs you have?

Basic steps:
   Get record,
   check to see if url is in database, if it is get the url_id.
   if not insert it and get the generated url_id.
   insert the user record using the url_id you now have.
repeat until you run out of records.

Not elegant, but it will get the job done. Note: look into the documentation on
how to get the new url_id after you do an insert. It's in DBD::mysql
for Perl.
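William's loop translates almost line for line. The sketch below is illustrative only and uses Python's sqlite3, where cursor.lastrowid stands in for the generated-id fetch (in Perl with DBD::mysql that's $dbh->{mysql_insertid} after the INSERT).

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE URL_TABLE (url_id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT UNIQUE)")
cur.execute("CREATE TABLE USER_TABLE (user_id TEXT, url_id INT)")

def url_id_for(url):
    # check to see if url is in database; if it is, get the url_id
    row = cur.execute("SELECT url_id FROM URL_TABLE WHERE url = ?", (url,)).fetchone()
    if row:
        return row[0]
    # if not, insert it and get the generated url_id
    cur.execute("INSERT INTO URL_TABLE (url) VALUES (?)", (url,))
    return cur.lastrowid

# get record ... insert the user record ... repeat until out of records
for user, url in [("u1", "http://a/"), ("u2", "http://a/"), ("u3", "http://b/")]:
    cur.execute("INSERT INTO USER_TABLE VALUES (?, ?)", (user, url_id_for(url)))

print(cur.execute("SELECT COUNT(*) FROM URL_TABLE").fetchone()[0])  # 2 distinct URLs
```

As the thread goes on to discuss, the per-record SELECT is the slow part, which is what motivates the set-based and hash-cached variants.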

-- 
William R. Mussatto, Senior Systems Engineer
Ph. 909-920-9154 ext. 27
FAX. 909-608-7061






Re: load data into 2 tables and set id

2004-06-09 Thread mos
At 02:34 PM 6/9/2004, you wrote:
Hi,
I need some help please! I have 60GB of proxy logs to parse and load into 
a mysql database. I've written a parsing script but I'm stuck now on how 
to load the data in.

I have a database called PROXY_LOG with 2 tables:
USER_TABLE
user_id   date_time   url_id   size
and
URL_TABLE
url_id   url   category
The data values I have for each record are:
user_id   date_time   size   url   category
for example:
u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
business
The problem is I'm not sure how to load the data into the 2 tables and set 
the url_id. Could anyone point me in the right direction please?

Many thanks,
js.
js,
Is this a one time operation? If so, you could create a temporary 
table with all the fields:

BIG_TABLE
user_id   date_time   url_id   size  url  category
and use Load Data InFile to load the data into this one table.
Then create the two smaller tables using:
create table USER_TABLE select  user_id, date_time, url_id, size from 
BIG_TABLE;
create table URL_TABLE select url_id, url, category from BIG_TABLE;
drop table BIG_TABLE;

So there you have it in 3 lines of code. :-)
Mike
P.S. If you could break the text file into two files to begin with, then 
two Load Data InFile statements would handle things nicely. 



Re: load data into 2 tables and set id

2004-06-09 Thread J S

 Hi,

 I need some help please! I have 60GB of proxy logs to parse and load
 into a  mysql database. I've written a parsing script but I'm stuck now
 on how to  load the data in.

 I have a database called PROXY_LOG with 2 tables:

 USER_TABLE
 user_id   date_time   url_id   size

 and

 URL_TABLE
 url_id   url   category


 The data values I have for each record are:

 user_id   date_time   size   url   category

 for example:

 u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html
  business

 The problem is I'm not sure how to load the data into the 2 tables and
 set  the url_id. Could anyone point me in the right direction please?

 Many thanks,

 js.

What language did you use to do the parsing? If it was Perl, I'd recommend
looking at Perl's DBI interface and doing it programmatically.
Do you have any idea how many different URLs you have?
Basic steps:
   Get record,
   check to see if url is in database, if it is get the url_id.
   if not insert it and get the generated url_id.
   insert the user record using the url_id you now have.
repeat until you run out of records.
Not elegant, but it will get the job done. Note: look into the documentation on
how to get the new url_id after you do an insert. It's in DBD::mysql
for Perl.
Thanks for your reply, William. I am using a Perl script. If I have to insert
these records one by one, it's going to be really slow, isn't it? Maybe the
quickest way is to parse the logs twice, i.e. fill the URL_TABLE first using
your procedure above, then on the second run create a LOAD file for
USER_TABLE?

js.



Re: load data into 2 tables and set id

2004-06-09 Thread William R. Mussatto
J S said:


  Hi,
 
  I need some help please! I have 60GB of proxy logs to parse and load
 into a  mysql database. I've written a parsing script but I'm stuck
 now on how to  load the data in.
 
  I have a database called PROXY_LOG with 2 tables:
 
  USER_TABLE
  user_id   date_time   url_id   size
 
  and
 
  URL_TABLE
  url_id   url   category
 
 
  The data values I have for each record are:
 
  user_id   date_time   size   url   category
 
  for example:
 
  u752359   2004-04-02 12:33:04   3403
 http://www.mysql.com/index.html
   business
 
  The problem is I'm not sure how to load the data into the 2 tables
 and set  the url_id. Could anyone point me in the right direction
 please?
 
  Many thanks,
 
  js.
 
What language did you use to do the parsing? If it was Perl, I'd
 recommend looking at Perl's DBI interface and doing it
 programmatically. Do you have any idea how many different URLs you
 have?

Basic steps:
Get record,
check to see if url is in database, if it is get the url_id.
if not insert it and get the generated url_id.
insert the user record using the url_id you now have.
repeat until you run out of records.

Not elegant, but it will get the job done. Note: look into the documentation
 on how to get the new url_id after you do an insert. It's in
 DBD::mysql for Perl.


 Thanks for your reply William. I am using a perl script. If I have to
 insert  these records one by one it's going to be really slow isn't it?
 Maybe the  quickest way is to parse the logs twice i.e. fill the
 URL_TABLE first using  your procedure above, then on the second run,
 create a LOAD file for  USER_TABLE?

 js.
How will you get the information for the url_ids? I can see splitting
the logs and using a load file for the URL_TABLE (if you can eliminate
duplicates). You can save some time if you can build a Perl hash with
$url{urlvalue} = url_id; test that, and only do inserts when you need to.
A hash lookup is faster than a DB query, but you will have to hold the hash
in memory. You can use the hash to prepare the USER_TABLE and then LOAD
DATA INFILE that.
Just a thought: url is going to have to be a unique key. You can speed up
the initial inserts by inserting without that key (using the Perl hash to
avoid collisions) and then altering the table to add the key afterwards.
However, the question comes back to: do you have enough memory for the
hash in Perl? Notice also that you don't have a rowID equivalent in the
USER_TABLE.
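The hash idea above can be sketched in Python, with a dict standing in for the Perl hash ($url{urlvalue} = url_id). This is illustrative only and carries the same caveat: the whole URL-to-id map must fit in memory.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE URL_TABLE (url_id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT)")

url_cache = {}  # urlvalue -> url_id, the in-memory hash

def url_id_for(url):
    uid = url_cache.get(url)
    if uid is None:
        # only a cache miss touches the database; a dict probe is far
        # cheaper than a round-trip SELECT per log record
        cur.execute("INSERT INTO URL_TABLE (url) VALUES (?)", (url,))
        uid = url_cache[url] = cur.lastrowid
    return uid

ids = [url_id_for(u) for u in ["http://a/", "http://b/", "http://a/"]]
print(ids)  # the repeated URL reuses its id: [1, 2, 1]
```

Because collisions are resolved in the hash, the url column can stay unindexed during the bulk load and the unique key can be added afterwards, as suggested above.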


