Re: matching people with projects via resources

2004-10-01 Thread Matt Eaton
Hi Laszlo,
This is sort of a butchery, and might be a little nicer with two queries 
and a temp table, but this works in mysql 4.1.3-beta (at least, it did 
for me).

SELECT A.name, B.proj
FROM people as A, project as B
WHERE A.rsrc=B.rsrc
GROUP BY A.name, B.proj
HAVING COUNT(*)=(SELECT COUNT(*) FROM project WHERE proj=B.proj);
This counts up the number of rows each (name,project) pair with 
resources in common and then checks to see if it's equal to the total 
number of resources for that project.  This would be pretty slow on a 
really huge table (in the tens of thousands, maybe?  I don't have a 
great sense for how it would scale), in which case you'd want to make a 
temporary table with all of the resource counts cached per project.

Hope that helps,
Matt
Laszlo Thoth wrote:
I'm having difficulty constructing a query.  I've got two kinds of information:
a table of resources that various people have, and a table of resources that
various projects need.
===
CREATE TABLE `people` (
 `name` varchar(11) default NULL,
 `rsrc` varchar(15) default NULL
);
INSERT INTO `people` VALUES
('noah','wood'),('noah','canvas'),('lincoln','wood'),('davinci','canvas'),('davinci','paint');
CREATE TABLE `project` (
 `proj` varchar(11) default NULL,
 `rsrc` varchar(15) default NULL
);
INSERT INTO `project` VALUES
('ark','wood'),('ark','canvas'),('cabin','wood'),('monalisa','canvas'),('monalisa','paint'),('jeans','canvas'),('jeans','sewingmachine');
===
I need a query that will tell me which people have the resources required to
complete a given project.  Unfortunately all I can get are incomplete matches:
I'm not sure how to express the concept of fully satisfying the requirements
to MySQL.
Restructuring the tables is allowed: I'm not tied to the current schema, I just
need to solve the problem.  The only limit is that resources must be arbitrary:
I can't use a SET to define resources because I might want to insert a new
resource at some future point without redefining the column type.
I'm pretty sure this is a good starting point, but that's just matching resource
to resource without excluding Lincoln from building an Ark (no canvas).
mysql SELECT project.proj,project.rsrc,people.name FROM project LEFT JOIN
people ON project.rsrc=people.rsrc;
+--+-+-+
| proj | rsrc| name|
+--+-+-+
| ark  | wood| noah|
| ark  | wood| lincoln |
| ark  | canvas  | noah|
| ark  | canvas  | davinci |
| cabin| wood| noah|
| cabin| wood| lincoln |
| monalisa | canvas  | noah|
| monalisa | canvas  | davinci |
| monalisa | paint   | davinci |
| jeans| canvas  | noah|
| jeans| canvas  | davinci |
| jeans| sewingmachi | NULL|
+--+-+-+
It would also be sufficient but less optimal to solve a subset of this problem,
where I only determine who could complete this project for a single project
rather than trying to match all projects to all people in one query.
 


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Cross server selects?

2004-08-28 Thread Matt Eaton
Hi all.  Just a quick syntax question.  Is there a way to select rows 
from a different server  database into the one currently in use?  In 
other words, if I had two servers, is there something equivalent to 
saying (while using the client on server1):

SELECT * FROM server2.databasename.tablename WHERE id  5
I couldn't find anything the manual on this.  Thanks!
-Matt
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


RE: GROUP BY optimization headscratcher

2004-08-14 Thread Matt Eaton
 Michael, and .  However, it didn't work.  Whole 
thing still takes about 1 second.

On the other hand, I realized I'm an idiot and that the reason it was running so fast 
when I handled temporary tables myself is that I was using mysqlcc, which truncated 
the first table to 1000 rows rather than 475,000, which--as one would imagine--sped 
things up considerably.  

However, I'm still looking for a way to make this fast.  This is an integral part of 
my application, it'd be a big load off my mind ( my processesor) if I could get it 
under half a second on my box.  I've made the changes Michael suggested, so I was 
wondering if anyone had suggestions on how to optimize this further.  Below please 
find the query in question, a little background, the create statements and the output 
of explain:

SELECT T2.guid, sum(T2.d+T1.d) AS theSum 
FROM T1, T2 
WHERE T1.qid=T2.qid 
GROUP BY T2.guid;

(I grouped by the wrong T last time, sorry).

T1 contains one user, and their answers to various questions, so guid actually has 
only 1 value in this table, and qid has about 65, for a total of 65 rows.  T2 contains 
about 15,000 users, so guid has 15,000 different values and qid has 34 possible 
values, and the total cardinality comes out to around 475,000.

The Create Table statements look like:
CREATE TABLE `T1` (
  `guid` smallint(5) unsigned NOT NULL default '0',
  `qid` smallint(5) unsigned NOT NULL default '0',
  `a` tinyint(2) NOT NULL default '-2',
  `d` tinyint(2) NOT NULL default '-2',
  UNIQUE KEY `IX_T1_qid_guid` (`qid`,`guid`)
) TYPE=MyISAM

CREATE TABLE `T2` (
  `guid` mediumint(8) unsigned NOT NULL default '0',
  `qid` tinyint(3) unsigned NOT NULL default '0',
  `a` tinyint(4) NOT NULL default '0',
  `d` decimal(1,0) unsigned NOT NULL default '0',
  UNIQUE KEY `IX_T2_qid_guid` (`qid`,`guid`)
) TYPE=MyISAM

And the explain is:
+---+--+++-++--+-+
| table | type | possible_keys  | key| key_len | ref| rows | Extra 
  |
+---+--+++-++--+-+
| T1| ALL  | IX_T1_qid_guid | NULL   |NULL | NULL   |   65 | Using 
temporary; Using filesort |
| T2| ref  | IX_T2_qid_guid | IX_T2_qid_guid |   1 | T1.qid | 4979 | Using 
where |
+---+--+++-++--+-+

Thanks so much!
-Matt

-Original Message-
From:   [mailto:[EMAIL PROTECTED] 
Sent: Saturday, August 14, 2004 3:46 AM
To: Matt Eaton
Subject: Re: GROUP BY optimization headscratcher

 Matt

ME CREATE TABLE `T1` (
ME   `guid` smallint(5) unsigned NOT NULL default '0',
ME   `qid` smallint(5) unsigned NOT NULL default '0',
ME   `a` tinyint(2) NOT NULL default '-2',
ME   `d` tinyint(2) NOT NULL default '-2',
ME   KEY `IX_FW_qid` (`qid`),
ME   KEY `IX_FW_d` (`d`)
ME ) TYPE=HEAP 
 
ME CREATE TABLE `T2` (
ME   `guid` mediumint(8) unsigned NOT NULL default '0',
ME   `qid` tinyint(3) unsigned NOT NULL default '0',
ME   `a` tinyint(4) NOT NULL default '0',
ME   `d` decimal(1,0) unsigned NOT NULL default '0',
ME   PRIMARY KEY  (`guid`,`qid`),
ME   KEY `IX_s23aw_d` (`d`),
ME   KEY `IX_s23aw_qid` (`qid`)
ME ) TYPE=HEAP
 
ME SELECT T1.guid, sum(T1.d + T2.d) as theSum
ME FROM T1, T2
ME WHERE T1.qid=T2.qid
ME GROUP BY T1.guid

make key in T1:

 KEY `` (qid,guid )

 and change table type to MyIsam for both table.
  

Michael Monashev
http://softsearch.ru/



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



GROUP BY optimization headscratcher

2004-08-13 Thread Matt Eaton
Hi all.  Got a weird one.  Mysql 4.0.20.  Let's say for the sake of
argument that I've got two tables, T1  T2.  Here are the two create
statements:
 
CREATE TABLE `T1` (
  `guid` smallint(5) unsigned NOT NULL default '0',
  `qid` smallint(5) unsigned NOT NULL default '0',
  `a` tinyint(2) NOT NULL default '-2',
  `d` tinyint(2) NOT NULL default '-2',
  KEY `IX_FW_qid` (`qid`),
  KEY `IX_FW_d` (`d`)
) TYPE=HEAP 
 
CREATE TABLE `T2` (
  `guid` mediumint(8) unsigned NOT NULL default '0',
  `qid` tinyint(3) unsigned NOT NULL default '0',
  `a` tinyint(4) NOT NULL default '0',
  `d` decimal(1,0) unsigned NOT NULL default '0',
  PRIMARY KEY  (`guid`,`qid`),
  KEY `IX_s23aw_d` (`d`),
  KEY `IX_s23aw_qid` (`qid`)
) TYPE=HEAP
 
So, in T1 there are as many records as there are qids (around 150)
(there's only one user in T1).  In T2 there are as many records as there
are qids * user id's = (around 497,964).
 
The weirdness comes when I try to join them and do a group by at the
same time.  The following:
 
SELECT T1.guid, sum(T1.d + T2.d) as theSum
FROM T1, T2
WHERE T1.qid=T2.qid
GROUP BY T1.guid
 
takes 1 second to run.  This seems absurdly long.  Explain shows that
everything seems fine (although it shows that T2 has a key length of 2,
which is weird, and I don't really understand), and if I do the same
query without the group by or the sum, it goes in 0.01 seconds.  So, I
ran the same query without the group by and the sum and stored the
result set in a temporary table, and then did a group by on guid for
that temporary table, and that ran 0.01 seconds. so I've got a solution
to my problem already, even though it's a two-query solution.  However,
I'm really curious as to why MySQL takes so long on my original query.
It seems like it's creating a temporary table in memory and doing the
group by on that, which is exactly what I was doing the second time
around. and I find it hard to believe that I'm that much smarter than
the MySQL preprocessor.  So, if anyone has any thoughts on this strange
disparity in time, I'd be interested to hear them!  Thanks a lot!
 
-Matt


RE: Select help

2004-07-01 Thread Matt Eaton
Hey Rob,
You're looking for a group by to allow mysql to aggregate over the IP's:

SELECT ip, count(*) FROM iptable GROUP BY ip ORDER BY ip DESC limit 10;

-Matt


-Original Message-
From: rmck [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 01, 2004 1:03 PM
To: [EMAIL PROTECTED]
Subject: Select help

Hi,

I have a table with ip,port and I want to see the top ten Ip's with the
most entries? 
Ip's can be in db many times...

Not the first distinct 10... Im stuck...

I have tried:
mysql select DISTINCT ip from iptable limit 10;

+---+
| ip   |
+---+
| 0.0.0.0   |
| 10.0.1.42 |
| 10.0.1.8  |
| 10.1.1.1  |
| 10.10.10.1|
| 10.115.94.193 |
| 10.115.94.195 |
| 10.115.94.40  |
| 10.122.1.1|
| 10.20.7.184   |
+---+
10 rows in set (0.04 sec)
 
mysql 

But doesn't that just give the first 10 DISTINCT ip's?? 

rob




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Select help

2004-07-01 Thread Matt Eaton
Woops!  Forget I said that, you wanted to order by the most occurrences.
Sorry.

SELECT ip, count(*) FROM iptable GROUP BY ip ORDER BY 2 DESC limit 10;

Heh... I should learn to read one of these days...

-Matt


-Original Message-
From: rmck [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 01, 2004 1:03 PM
To: [EMAIL PROTECTED]
Subject: Select help

Hi,

I have a table with ip,port and I want to see the top ten Ip's with the
most entries? 
Ip's can be in db many times...

Not the first distinct 10... Im stuck...

I have tried:
mysql select DISTINCT ip from iptable limit 10;

+---+
| ip   |
+---+
| 0.0.0.0   |
| 10.0.1.42 |
| 10.0.1.8  |
| 10.1.1.1  |
| 10.10.10.1|
| 10.115.94.193 |
| 10.115.94.195 |
| 10.115.94.40  |
| 10.122.1.1|
| 10.20.7.184   |
+---+
10 rows in set (0.04 sec)
 
mysql 

But doesn't that just give the first 10 DISTINCT ip's?? 

rob




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



C API -- huge result sets slowin me down

2004-06-28 Thread Matt Eaton
Hi all,

I was hoping this was the right place for a question about the C API.
I've been grabbing result sets from tables in the C API for a few years
now, but I'm starting to work with result sets that are big enough to
bog me down.  Of course, the result sets aren't insanely big, so I was
wondering why it was taking so long for me to suck them in to C,
especially when I can run the same query from the command line using the
binaries and they can cache it to a file on the hard disk pretty much
instantly.  So, basically, I was just hoping that I've been doing
something wrong, or at least that there was something I could do better,
to make my database communication as fast as the mysql command line
tools.  I've checked out their source and nothing obvious jumps out at
me.  Here's a non-functional sample of my code:

int main(int argc, char *argv[] ) {
int uid;
int sid;
char sqlBuff[4000];
int err = 0;
int i;
// Setup the database communications space:
MYSQL dbase;
MYSQL_RES *result;
MYSQL_ROW row;

float **genAttrib;

//... snip ...


// Connect to the database:
if (mysql_init(dbase) == NULL) err = 1;
else {


if(mysql_real_connect(dbase,localhost,login,pass,test,0,NULL,CL
IENT_FOUND_ROWS) == NULL) {
err = 1;
fprintf(stderr, Failed to connect to database:
Error: %s\n,
mysql_error(dbase));
}
}

// If the connection couldn't be established:
if(err) {
printf(db connection failed!\n);
exit(1);
}


//... snip ...

// This query could have as many as a million rows returned, but
the query itself runs quite fast.  It seems to just be
// sucking it into C that can take up to four seconds on our
dual Xeon server.
sprintf(sqlBuff,SELECT A.* FROM `attribs` as A, login AS L
WHERE A.guid=L.guid AND L.isActive=1 AND L.sid=%d AND
A.guid!=%d,sid,uid);
if (mysql_real_query(dbase,sqlBuff,strlen(sqlBuff))) {
printf(Pool Attributes Select Failed... dumbass\n);
fprintf(stderr, Error: %s\n,
mysql_error(dbase));
exit(1);
}

result = mysql_store_result(dbase);
numRows=mysql_num_rows(result);
for (i=0;inumRows;i++) {
row = mysql_fetch_row(result);
tempq=atoi(row[1]);
tempP=atoi(row[0]);
genAttrib[tempP][tempq]=atoi(row[2]);
}

return 0;
}

So, if someone sees something that I could change to speed things up, or
I should direct this question elsewhere... thanks for your help and
thanks for reading this far!

Thanks again,
Matt

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]