Strange result of Group by on real values
Evening, Could someone explain the result of the Query 2? Query 2 does a GROUP BY avgOctets, but still there are duplicate avgOctets in my result set! Boxplot(octets,"AVG",0) is a UDF which returns REAL values. In this case, the real values have no decimals. The only reasonable explanation I could think of is that the REAL value returned by boxplot(...) is an estimation and therefore should be casted to an INTEGER before comparison by GROUP BY is possible? Inner Query: SELECT boxplot(octets,"AVG",0) AS avgOctets FROM joris_filter GROUP BY ipv4_src, ipv4_dst Result: avgOctets 46 46 46 46 46 46 46 40 46 46 46 144 48 48 Query 2: (With "inner query" as inner query): SELECT avgOctets, COUNT(*) AS frequency FROM ( SELECT boxplot(octets,"AVG",0) AS avgOctets FROM joris_filter GROUP BY ipv4_src, ipv4_dst ) avarages GROUP BY avgOctets ORDER BY avgOctets; Result: avgOctetsfrequency 40 350026 41 1 41 1 41 6 41 2 41 3 41 1 42 1 ... -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query optimization
-What I'am trying to do: Bit hard to explain. I've got a table consisting of ip addresses (ipv4_src), destination addresses (ipv4_dst), and port numbers (port_dst) and some other irrelevant columns. Ultimately my goal is to find a linear order in a subset of ports. For example, host A connects to B on port 1, 2, 3, 4,...20, I would like to find this linear relation. To achieve this, I have written some mathematical formula's. Unfortunately there is one downside to my formula: It can be fooled by adding some extreme values. It cannot find a linear relation in this list: 1, 2, 3, 4,...20, 45000. Although there are 20 numbers lined up, the 45000 number ruïns the result of the algorithm. So the query I've submitted, is ment to remove extreme values. The methods boxplot(pF.port_dst,"LOW") and boxplot(pF.port_dst,"HIGH") calculate in linear time the allowed range of numbers. Extreme values won't be included in this range. So in the example, the range would be [1,20] therby omitting the value 45000. Finally I would like to filter my table with port numbers for every tuple and remove all the numbers not fitting in the range. -In human readable pseudo code this is the query: SELECT source,dest,port,octets FROM ( SELECT source,dest,boxplot(port,"LOW") AS low,boxplot(port,"HIGH") AS high FROM --Calculate the LOW and HIGH values for each source,dest pair. ( SELECT source,dest,port,octets FROM... GROUP BY source,dest,port --This removes the duplicate entries. ) pF GROUP BY source,dest ) boxplot ( SELECT source,dest,port,octets FROM... GROUP BY source,dest,port --This removes the duplicate entries (again!). ) filter WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest AND filter.port>=boxplot.LOW AND filter.port<=boxplot.HIGH --Relate the tables 'boxplot' and 'filter' to eachother AND select only the tuples where port is in the range [LOW,HIGH] from the filter table. -Here is the original query I would like to optimize again: SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM ( SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS low,boxplot(pF.port_dst,"HIGH") AS high FROM ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY ipv4_src, ipv4_dst, port_dst ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst ) boxplot, ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY ipv4_src, ipv4_dst, port_dst )filter WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.low I'll try the 'EXPLAIN' query tonight, but therefor I've got to create a small database first to speed up the results. Thnx in advance for your help. On Jan 22, 2008 8:15 AM, Sebastian Mendel <[EMAIL PROTECTED]> wrote: > Joris Kinable schrieb: > > > Optimize query > > > > I've got one query, which I would like to improve a lot since it takes > > very long (>24 hours) to execute. Here is the idea: > > 1. Take the table (other rows in this > > table are not mentioned for clearity) and remove all duplicate > > tuple's. This is done by subquery 'filter'. > > 2. The same query is performed by the boxplot query, but this time an > > aditional group by command is executed, therby calculating a User > > Defined Function boxplot(row,type) which returns a double value. > > 3. Finally the results of the query in step 2 are used to select a > > subset of results from the 'filter' table. > > 4. As you can see, the subquery 'pF' used in step 2 is identical to > > the query 'filter'. It's an extreme waste to calculate the same table > > twice. I've tried to create a temporary table from filter, but > > unfortunately Mysql doesn't allow you to access a temporary table > > twice in the same query. I prefer a 1 query answer, instead of > > creating views, or temporary tables. > > > > Is there a way to improve this query, therby improving the execution time? > > > > Query: > > > > SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM > > ( > > SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS > > low,boxplot(pF.port_dst,"HIGH") AS high FROM > > ( > > SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE > > prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY > > ipv4_src,ipv4_dst,port_dst ASC > > ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING > > COUNT(filter.port_dst)>10 > > ) boxplot, > > ( > > SELEC
Query optimization
Optimize query I've got one query, which I would like to improve a lot since it takes very long (>24 hours) to execute. Here is the idea: 1. Take the table (other rows in this table are not mentioned for clearity) and remove all duplicate tuple's. This is done by subquery 'filter'. 2. The same query is performed by the boxplot query, but this time an aditional group by command is executed, therby calculating a User Defined Function boxplot(row,type) which returns a double value. 3. Finally the results of the query in step 2 are used to select a subset of results from the 'filter' table. 4. As you can see, the subquery 'pF' used in step 2 is identical to the query 'filter'. It's an extreme waste to calculate the same table twice. I've tried to create a temporary table from filter, but unfortunately Mysql doesn't allow you to access a temporary table twice in the same query. I prefer a 1 query answer, instead of creating views, or temporary tables. Is there a way to improve this query, therby improving the execution time? Query: SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM ( SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS low,boxplot(pF.port_dst,"HIGH") AS high FROM ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY ipv4_src,ipv4_dst,port_dst ASC ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING COUNT(filter.port_dst)>10 ) boxplot, ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY ipv4_src,ipv4_dst,port_dst ASC ) filter WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.low -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
creating indexes with myisamchk
Good evening, I've got to create a very large table: 180GB of data has to be stored. In order to to this I'm using the following steps: 1. Create database structure including keys. 2. Disable keys: ALTER TABLE ut_netflow_4 DISABLE KEYS 3. Load data into the database. 4. Generate all index keys: myisamchk --sort_buffer_size=2700M --key_buffer_size=2700M -r --tmpdir=/tmpdir --quick /ut_netflow_4.MYI 5. flush privileges: FLUSH TABLE ut_netflow_4 6. restart server. Everything up to step 3 goes great. Step 4 and 5 on the other hand do not seem to work. After running the command at step 4, I've got a huge ut_netflow_4.MYI index file, but none of the indexes seem to be enabled. If I use phpmysql to view the table, it says that the cardinalities of the indexes are 0. Instead of the command at step 4, I've also tried: -myisamchk --sort_buffer_size=2700M --key_buffer_size=2700M --sort-recover /ut_netflow_4.MYI -myisamchk --sort_buffer_size=2700M --key_buffer_size=2700M --parallel-recover /ut_netflow_4.MYI -myisamchk --sort_buffer_size=2700M --key_buffer_size=2700M -r /ut_netflow_4.MYI None of these seem to work either. The index file becomes larger, but phpmyadmin keeps indicating that there are no indexes. What am I doing wrong? How can I create and enable the indexes after loading the data into the table? Thnx in advance, Joris -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]