Re: Mysql devel

2016-02-23 Thread Larry Martell
On Tue, Feb 23, 2016 at 7:07 AM, Larry Martell  wrote:
> On Tue, Feb 23, 2016 at 6:16 AM, Reindl Harald  wrote:
>>
>>
>> On 23.02.2016 at 04:47, Larry Martell wrote:
>>>
>>> I am trying to install the mysql devel libs (so I can build MySQLdb).
>>> This is on RHEL 6. It's failing as shown below. Can anyone help with
>>> this?
>>
>>
>> you are mixing remi and other repos
>
> I did this:
>
> # yum shell --enablerepo=remi
>> remove php-mysql
>> install php-mysqlnd
>> run
>
> Followed by this:
>
> yum install mysql-devel
> /usr/local/bin/pip2.7 install MySQL-python
>
> Now the MySQL server is down and will not come up. It fails with this:
>
> Fatal error: mysql.user table is damaged. Please run mysql_upgrade
>
> It was running 5.1 before and now it's running 5.7.
>
> But how can I run mysql_upgrade if I can't start the server?

Actually it was running 5.5 before:

Server version: 5.5.44 MySQL Community Server (GPL) by Remi

>
>>
>>> Package mysql-community-devel.x86_64 0:5.7.11-1.el6 will be installed
>>> for package: mysql-server-5.5.44-1.el6.remi.x86_64
>>
>>
>>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Mysql devel

2016-02-23 Thread Larry Martell
On Tue, Feb 23, 2016 at 6:16 AM, Reindl Harald  wrote:
>
>
> On 23.02.2016 at 04:47, Larry Martell wrote:
>>
>> I am trying to install the mysql devel libs (so I can build MySQLdb).
>> This is on RHEL 6. It's failing as shown below. Can anyone help with
>> this?
>
>
> you are mixing remi and other repos

I did this:

# yum shell --enablerepo=remi
> remove php-mysql
> install php-mysqlnd
> run

Followed by this:

yum install mysql-devel
/usr/local/bin/pip2.7 install MySQL-python

Now the MySQL server is down and will not come up. It fails with this:

Fatal error: mysql.user table is damaged. Please run mysql_upgrade

It was running 5.1 before and now it's running 5.7.

But how can I run mysql_upgrade if I can't start the server?

>
>> Package mysql-community-devel.x86_64 0:5.7.11-1.el6 will be installed
>> for package: mysql-server-5.5.44-1.el6.remi.x86_64
>
>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Mysql devel

2016-02-22 Thread Larry Martell
I am trying to install the mysql devel libs (so I can build MySQLdb).
This is on RHEL 6. It's failing as shown below. Can anyone help with
this?

# yum install mysql-devel
Loaded plugins: dellsysid, product-id, refresh-packagekit, security,
subscription-manager
Setting up Install Process
Package mysql-devel is obsoleted by mysql-community-devel, trying to
install mysql-community-devel-5.7.11-1.el6.x86_64 instead
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-devel.x86_64 0:5.7.11-1.el6 will be installed
--> Processing Dependency: mysql-community-libs(x86-64) = 5.7.11-1.el6
for package: mysql-community-devel-5.7.11-1.el6.x86_64
--> Processing Dependency: libmysqlclient.so.20()(64bit) for package:
mysql-community-devel-5.7.11-1.el6.x86_64
--> Running transaction check
---> Package mysql-community-libs.x86_64 0:5.7.11-1.el6 will be obsoleting
--> Processing Dependency: mysql-community-common(x86-64) =
5.7.11-1.el6 for package: mysql-community-libs-5.7.11-1.el6.x86_64
---> Package mysql-libs.x86_64 0:5.5.44-1.el6.remi will be obsoleted
--> Processing Dependency: libmysqlclient.so.18()(64bit) for package:
php-mysql-5.4.43-1.el6.remi.x86_64
--> Processing Dependency:
libmysqlclient.so.18(libmysqlclient_16)(64bit) for package:
php-mysql-5.4.43-1.el6.remi.x86_64
--> Processing Dependency:
libmysqlclient.so.18(libmysqlclient_18)(64bit) for package:
php-mysql-5.4.43-1.el6.remi.x86_64
--> Processing Dependency: real-mysql-libs(x86-64) = 5.5.44-1.el6.remi
for package: mysql-server-5.5.44-1.el6.remi.x86_64
--> Processing Dependency: real-mysql-libs(x86-64) = 5.5.44-1.el6.remi
for package: mysql-5.5.44-1.el6.remi.x86_64
--> Running transaction check
---> Package mysql.x86_64 0:5.5.44-1.el6.remi will be obsoleted
---> Package mysql-community-client.x86_64 0:5.7.11-1.el6 will be obsoleting
---> Package mysql-community-common.x86_64 0:5.7.11-1.el6 will be installed
---> Package mysql-community-server.x86_64 0:5.7.11-1.el6 will be obsoleting
---> Package mysql-libs.x86_64 0:5.5.44-1.el6.remi will be obsoleted
--> Processing Dependency: libmysqlclient.so.18()(64bit) for package:
php-mysql-5.4.43-1.el6.remi.x86_64
--> Processing Dependency:
libmysqlclient.so.18(libmysqlclient_16)(64bit) for package:
php-mysql-5.4.43-1.el6.remi.x86_64
--> Processing Dependency:
libmysqlclient.so.18(libmysqlclient_18)(64bit) for package:
php-mysql-5.4.43-1.el6.remi.x86_64
---> Package mysql-server.x86_64 0:5.5.44-1.el6.remi will be obsoleted
--> Finished Dependency Resolution
Error: Package: php-mysql-5.4.43-1.el6.remi.x86_64 (@remi)
   Requires: libmysqlclient.so.18(libmysqlclient_16)(64bit)
   Removing: mysql-libs-5.5.44-1.el6.remi.x86_64 (@remi)
   libmysqlclient.so.18(libmysqlclient_16)(64bit)
   Obsoleted By: mysql-community-libs-5.7.11-1.el6.x86_64
(mysql57-community)
   Not found
Error: Package: php-mysql-5.4.43-1.el6.remi.x86_64 (@remi)
   Requires: libmysqlclient.so.18()(64bit)
   Removing: mysql-libs-5.5.44-1.el6.remi.x86_64 (@remi)
   libmysqlclient.so.18()(64bit)
   Obsoleted By: mysql-community-libs-5.7.11-1.el6.x86_64
(mysql57-community)
   Not found
Error: Package: php-mysql-5.4.43-1.el6.remi.x86_64 (@remi)
   Requires: libmysqlclient.so.18(libmysqlclient_18)(64bit)
   Removing: mysql-libs-5.5.44-1.el6.remi.x86_64 (@remi)
   libmysqlclient.so.18(libmysqlclient_18)(64bit)
   Obsoleted By: mysql-community-libs-5.7.11-1.el6.x86_64
(mysql57-community)
   Not found
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-29 Thread Larry Martell
On Fri, Jan 29, 2016 at 11:15 AM, shawn l.green
 wrote:
>
>
> On 1/28/2016 6:30 PM, Larry Martell wrote:
>>
>> On Thu, Jan 28, 2016 at 5:45 PM, shawn l.green 
>> wrote:
>>>
>>>
>>>
>>> On 1/28/2016 3:32 PM, Larry Martell wrote:
>>>>
>>>>
>>>> On Thu, Jan 28, 2016 at 2:13 PM, shawn l.green
>>>> 
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 1/28/2016 1:14 PM, Larry Martell wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Tue, Jan 26, 2016 at 8:40 AM, Halász Sándor  wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2016/01/25 19:16 ... Larry Martell:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> SELECT IFNULL(f_tag_bottom,
>>>>>>>>  IFNULL(f_tag_bottom_major_axis,
>>>>>>>>   IFNULL(f_tag_bottom_minor_axis,
>>>>>>>>    IFNULL(f_tag_ch_x_bottom,
>>>>>>>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>>>>>>>> STDDEV(ch_x_top)
>>>>>>>> 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Of course, this isn't your real problem, but you could use COALESCE
>>>>>>> instead
>>>>>>> of all those IFNULLs (and you don't need the last one):
>>>>>>>SELECT COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>>>>>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) as
>>>>>>> ftag,
>>>>>>>STDDEV(ch_x_top)
>>>>>>>
>>>>>>>
>>>>>>> As Johnny Withers points out, you may repeat the expression in the
>>>>>>> WHERE-clause:
>>>>>>>WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>>>>>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) =
>>>>>>> 'E-CD7'
>>>>>>> If really only one of those is not NULL, it is equivalent to this:
>>>>>>>'E-CD7' IN (f_tag_bottom, f_tag_bottom_major_axis,
>>>>>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Many thanks to Halász and Johnny - I had forgotten about coalesce and
>>>>>> I didn't know I could use that in a where clause. This worked great
>>>>>> for the requirement I had, but of course, once that was implemented my
>>>>>> client changed the requirements. Now they want to know which of the 5
>>>>>> f_tag_* columns was matched. Not sure how I'll do that. Probably need
>>>>>> another query.
>>>>>>
>>>>> One option to consider is to add another column to the query with a
>>>>> CASE
>>>>> similar to this...
>>>>>
>>>>> SELECT
>>>>> , ... original fields ...
>>>>> , CASE
>>>>> WHEN f_tag_bottom THEN 'f_tag_bottom'
>>>>> WHEN f_tag_bottom_major_axis THEN 'f_tag_bottom_major_axis'
>>>>> ... repeat for the rest of the fields to test ...
>>>>> ELSE 'none'
>>>>> END as match_flag
>>>>> FROM ...
>>>>>
>>>>> Technically, the term in the WHERE clause should prevent a 'none'
>>>>> result
>>>>> but
>>>>> I put it there to help future-proof the code.
>>>>
>>>>
>>>>
>>>> Won't that find the first one of the f_tags that is not null, but not
>>>> necessarily the one that was matched by the where clause?
>>>>
>>>
>>> I slightly cheated in my example.
>>>
>>> My CASE...END was listing terms in the same order as the COALESCE()
>>> function
>>> you were using in the WHERE clause. The cheat was that only a non-null
>>> value
>>> could be TRUE. To be more accurate, I should have used
>>> ... WHEN f_tag_bottom IS NOT NULL THEN ...
>>> That way you end up with a true boolean check within the CASE decision
>>> tree.
>>>
>>> As the COALESCE() is testing its terms in the same sequence as the
>>> CASE...END, there should be no difference between the two checks.  But,
>>> that
>>> also adds to the maintenance cost of this query. If you should change the
>>> order of the f_tag checks in the COALESCE() function, you would need to
>>> change the CASE...END to the same sequence.
>>
>>
>> Yes, I see that, but does the case only look at the filtered rows? For
>> example, let's say there's this data:
>>
>> row 1: f_tag_bottom = "ABC"
>> row 2: f_tag_bottom_major_axis = "XYZ"
>>
>> and my where clause has this:
>>
>> WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) = 'XYZ'
>>
>> won't the CASE pick up row 1? Whereas I want it to pick up row 2.
>>
>
> Yes it would.
>
> Just modify the check in the case from "IS NOT NULL" to "='XYZ'" to make
> both functions (CASE and COALESCE) find the same field value in the same row
> at the same time.

Thanks very much Shawn.
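
Pulling the pieces of this thread together, the final shape would be
something like the sketch below: COALESCE in the WHERE clause plus the CASE
flag, with the CASE tests comparing against the same value ('XYZ', per the
example above). Column names are the ones used in this thread:

SELECT COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
                f_tag_bottom_minor_axis, f_tag_ch_x_bottom,
                f_tag_ch_y_bottom) AS ftag,
       CASE
         WHEN f_tag_bottom = 'XYZ' THEN 'f_tag_bottom'
         WHEN f_tag_bottom_major_axis = 'XYZ' THEN 'f_tag_bottom_major_axis'
         WHEN f_tag_bottom_minor_axis = 'XYZ' THEN 'f_tag_bottom_minor_axis'
         WHEN f_tag_ch_x_bottom = 'XYZ' THEN 'f_tag_ch_x_bottom'
         WHEN f_tag_ch_y_bottom = 'XYZ' THEN 'f_tag_ch_y_bottom'
         ELSE 'none'
       END AS match_flag
FROM data_cst
WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
               f_tag_bottom_minor_axis, f_tag_ch_x_bottom,
               f_tag_ch_y_bottom) = 'XYZ';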

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-28 Thread Larry Martell
On Thu, Jan 28, 2016 at 5:45 PM, shawn l.green  wrote:
>
>
> On 1/28/2016 3:32 PM, Larry Martell wrote:
>>
>> On Thu, Jan 28, 2016 at 2:13 PM, shawn l.green 
>> wrote:
>>>
>>>
>>>
>>> On 1/28/2016 1:14 PM, Larry Martell wrote:
>>>>
>>>>
>>>> On Tue, Jan 26, 2016 at 8:40 AM, Halász Sándor  wrote:
>>>>>
>>>>>
>>>>> 2016/01/25 19:16 ... Larry Martell:
>>>>>>
>>>>>>
>>>>>>
>>>>>> SELECT IFNULL(f_tag_bottom,
>>>>>>  IFNULL(f_tag_bottom_major_axis,
>>>>>>   IFNULL(f_tag_bottom_minor_axis,
>>>>>>    IFNULL(f_tag_ch_x_bottom,
>>>>>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>>>>>>   STDDEV(ch_x_top)
>>>>>> 
>>>>>
>>>>>
>>>>>
>>>>> Of course, this isn't your real problem, but you could use COALESCE
>>>>> instead
>>>>> of all those IFNULLs (and you don't need the last one):
>>>>>   SELECT COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>>>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) as ftag,
>>>>>   STDDEV(ch_x_top)
>>>>>   
>>>>>
>>>>> As Johnny Withers points out, you may repeat the expression in the
>>>>> WHERE-clause:
>>>>>   WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>>>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) =
>>>>> 'E-CD7'
>>>>> If really only one of those is not NULL, it is equivalent to this:
>>>>>   'E-CD7' IN (f_tag_bottom, f_tag_bottom_major_axis,
>>>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom)
>>>>
>>>>
>>>>
>>>> Many thanks to Halász and Johnny - I had forgotten about coalesce and
>>>> I didn't know I could use that in a where clause. This worked great
>>>> for the requirement I had, but of course, once that was implemented my
>>>> client changed the requirements. Now they want to know which of the 5
>>>> f_tag_* columns was matched. Not sure how I'll do that. Probably need
>>>> another query.
>>>>
>>> One option to consider is to add another column to the query with a CASE
>>> similar to this...
>>>
>>> SELECT
>>> , ... original fields ...
>>> , CASE
>>>WHEN f_tag_bottom THEN 'f_tag_bottom'
>>>WHEN f_tag_bottom_major_axis THEN 'f_tag_bottom_major_axis'
>>>... repeat for the rest of the fields to test ...
>>>ELSE 'none'
>>>END as match_flag
>>> FROM ...
>>>
>>> Technically, the term in the WHERE clause should prevent a 'none' result
>>> but
>>> I put it there to help future-proof the code.
>>
>>
>> Won't that find the first one of the f_tags that is not null, but not
>> necessarily the one that was matched by the where clause?
>>
>
> I slightly cheated in my example.
>
> My CASE...END was listing terms in the same order as the COALESCE() function
> you were using in the WHERE clause. The cheat was that only a non-null value
> could be TRUE. To be more accurate, I should have used
>... WHEN f_tag_bottom IS NOT NULL THEN ...
> That way you end up with a true boolean check within the CASE decision tree.
>
> As the COALESCE() is testing its terms in the same sequence as the
> CASE...END, there should be no difference between the two checks.  But, that
> also adds to the maintenance cost of this query. If you should change the
> order of the f_tag checks in the COALESCE() function, you would need to
> change the CASE...END to the same sequence.

Yes, I see that, but does the case only look at the filtered rows? For
example, let's say there's this data:

row 1: f_tag_bottom = "ABC"
row 2: f_tag_bottom_major_axis = "XYZ"

and my where clause has this:

WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) = 'XYZ'

won't the CASE pick up row 1? Whereas I want it to pick up row 2.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-28 Thread Larry Martell
On Thu, Jan 28, 2016 at 2:13 PM, shawn l.green  wrote:
>
>
> On 1/28/2016 1:14 PM, Larry Martell wrote:
>>
>> On Tue, Jan 26, 2016 at 8:40 AM, Halász Sándor  wrote:
>>>
>>> 2016/01/25 19:16 ... Larry Martell:
>>>>
>>>>
>>>> SELECT IFNULL(f_tag_bottom,
>>>>  IFNULL(f_tag_bottom_major_axis,
>>>>   IFNULL(f_tag_bottom_minor_axis,
>>>>    IFNULL(f_tag_ch_x_bottom,
>>>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>>>>  STDDEV(ch_x_top)
>>>> 
>>>
>>>
>>> Of course, this isn't your real problem, but you could use COALESCE
>>> instead
>>> of all those IFNULLs (and you don't need the last one):
>>>  SELECT COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) as ftag,
>>>  STDDEV(ch_x_top)
>>>  
>>>
>>> As Johnny Withers points out, you may repeat the expression in the
>>> WHERE-clause:
>>>  WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) = 'E-CD7'
>>> If really only one of those is not NULL, it is equivalent to this:
>>>  'E-CD7' IN (f_tag_bottom, f_tag_bottom_major_axis,
>>> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom)
>>
>>
>> Many thanks to Halász and Johnny - I had forgotten about coalesce and
>> I didn't know I could use that in a where clause. This worked great
>> for the requirement I had, but of course, once that was implemented my
>> client changed the requirements. Now they want to know which of the 5
>> f_tag_* columns was matched. Not sure how I'll do that. Probably need
>> another query.
>>
> One option to consider is to add another column to the query with a CASE
> similar to this...
>
> SELECT
> , ... original fields ...
> , CASE
>   WHEN f_tag_bottom THEN 'f_tag_bottom'
>   WHEN f_tag_bottom_major_axis THEN 'f_tag_bottom_major_axis'
>   ... repeat for the rest of the fields to test ...
>   ELSE 'none'
>   END as match_flag
> FROM ...
>
> Technically, the term in the WHERE clause should prevent a 'none' result but
> I put it there to help future-proof the code.

Won't that find the first one of the f_tags that is not null, but not
necessarily the one that was matched by the where clause?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-28 Thread Larry Martell
On Tue, Jan 26, 2016 at 8:40 AM, Halász Sándor  wrote:
> 2016/01/25 19:16 ... Larry Martell:
>>
>> SELECT IFNULL(f_tag_bottom,
>>  IFNULL(f_tag_bottom_major_axis,
>>   IFNULL(f_tag_bottom_minor_axis,
>>    IFNULL(f_tag_ch_x_bottom,
>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>> STDDEV(ch_x_top)
>> 
>
> Of course, this isn't your real problem, but you could use COALESCE instead
> of all those IFNULLs (and you don't need the last one):
> SELECT COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) as ftag,
> STDDEV(ch_x_top)
> 
>
> As Johnny Withers points out, you may repeat the expression in the
> WHERE-clause:
> WHERE COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom) = 'E-CD7'
> If really only one of those is not NULL, it is equivalent to this:
> 'E-CD7' IN (f_tag_bottom, f_tag_bottom_major_axis,
> f_tag_bottom_minor_axis, f_tag_ch_x_bottom, f_tag_ch_y_bottom)

Many thanks to Halász and Johnny - I had forgotten about coalesce and
I didn't know I could use that in a where clause. This worked great
for the requirement I had, but of course, once that was implemented my
client changed the requirements. Now they want to know which of the 5
f_tag_* columns was matched. Not sure how I'll do that. Probably need
another query.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-25 Thread Larry Martell
On Mon, Jan 25, 2016 at 8:26 PM, Johnny Withers  wrote:
> You should probably turn this into a UNION and put an index on each column:
>
> SELECT f_tag_ch_y_bottom AS ftag FROM data_cst WHERE f_tag_ch_y_bottom =
> 'E-CD7'
> UNION ALL
> SELECT f_tag_ch_x_bottom AS ftag FROM data_cst WHERE f_tag_ch_x_bottom =
> 'E-CD7'
> UNION ALL
> SELECT f_tag_bottom_minor_axis AS ftag FROM data_cst WHERE
> f_tag_bottom_minor_axis = 'E-CD7'
> UNION ALL
> SELECT f_tag_bottom_major_axis AS ftag FROM data_cst WHERE
> f_tag_bottom_major_axis = 'E-CD7'
> UNION ALL
> SELECT f_tag_bottom FROM data_cst AS ftag WHERE f_tag_bottom = 'E-CD7'
> ;

This may work for me, but I need to explore this more tomorrow. I need
the select to control the rows included in the aggregation. For
example, the rows where the f_tag_* column that is used does not equal
'E-CD7' should not be included in the aggregation. Also, I grossly
simplified the query for this post. In reality I have 15 items in the
where clause and a having as well.

> Doing this any other way will prevent index usage and a full table scan will
> be required.

Yes, I will be adding indices - right now I am just worried about
getting the query to work. But this is important as it's a big table.
So I do appreciate you mentioning it.

> Is there a possibility of more than one column matching? How would you
> handle that?

I was told only 1 of the 5 will be populated and the other 4 will be
null. But still, I said I have to code for the case where that is not
true. So then I was told to use the first one I find that is not null,
looking in the order I had in my original post.
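
If the UNION route works out, the aggregation and the matched-column name
can both be kept by aggregating over the UNION as a derived table. A sketch
only, using the simplified columns from this thread (the real query's
additional WHERE terms and HAVING clause would ride along):

SELECT MIN(t.matched_col) AS matched_col,
       STDDEV(t.ch_x_top)
FROM (
    SELECT 'f_tag_bottom' AS matched_col, ch_x_top, wafer_id, lot_id
    FROM data_cst WHERE f_tag_bottom = 'E-CD7'
    UNION ALL
    SELECT 'f_tag_bottom_major_axis', ch_x_top, wafer_id, lot_id
    FROM data_cst WHERE f_tag_bottom_major_axis = 'E-CD7'
    -- ...and likewise for the other three f_tag_* columns
) t
GROUP BY t.wafer_id, t.lot_id;

Each branch can still use an index on its own f_tag_* column, which is the
point of the UNION approach.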


> On Mon, Jan 25, 2016 at 6:16 PM, Larry Martell 
> wrote:
>>
>> I know I cannot use an alias in a where clause, but I am trying to
>> figure out how to achieve what I need.
>>
>> If I could have an alias in a where clause my sql would look like this:
>>
>> SELECT IFNULL(f_tag_bottom,
>>  IFNULL(f_tag_bottom_major_axis,
>>   IFNULL(f_tag_bottom_minor_axis,
>>    IFNULL(f_tag_ch_x_bottom,
>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>>    STDDEV(ch_x_top)
>> FROM data_cst
>> WHERE ftag = 'E-CD7'
>> GROUP BY wafer_id, lot_id
>>
>> But I can't use ftag in the where clause. I can't put it in a having
>> clause, as that would exclude the already aggregated rows and I want
>> to filter them before the aggregation. Anyone have any idea how I can
>> do this?
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>
>
>
> --
> -
> Johnny Withers
> 601.209.4985
> joh...@pixelated.net

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-25 Thread Larry Martell
On Mon, Jan 25, 2016 at 8:01 PM, Rebecca Love  wrote:
> Have you tried using a select case statement for ftag?

How would that help? Unless I'm missing something, I'd still have to
have a reference to the column alias in the where clause.


>> On Jan 25, 2016, at 6:39 PM, Larry Martell  wrote:
>>
>> On Mon, Jan 25, 2016 at 7:27 PM, Reindl Harald  
>> wrote:
>>>
>>>
>>> On 26.01.2016 at 01:16, Larry Martell wrote:
>>>>
>>>> I know I cannot use an alias in a where clause, but I am trying to
>>>> figure out how to achieve what I need.
>>>>
>>>> If I could have an alias in a where clause my sql would look like this:
>>>>
>>>> SELECT IFNULL(f_tag_bottom,
>>>>  IFNULL(f_tag_bottom_major_axis,
>>>>   IFNULL(f_tag_bottom_minor_axis,
>>>>    IFNULL(f_tag_ch_x_bottom,
>>>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>>>>    STDDEV(ch_x_top)
>>>> FROM data_cst
>>>> WHERE ftag = 'E-CD7'
>>>> GROUP BY wafer_id, lot_id
>>>>
>>>> But I can't use ftag in the where clause. I can't put it in a having
>>>> clause, as that would exclude the already aggregated rows and I want
>>>> to filter them before the aggregation. Anyone have any idea how I can
>>>> do this?
>>>
>>>
>>> why would you want to do that?
>>> it would make the where clause not be able to use proper indexes
>>
>> I do not know which of the 5 f_tag_* columns will be populated and I
>> need to filter on the one that is populated.
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: using alias in where clause

2016-01-25 Thread Larry Martell
On Mon, Jan 25, 2016 at 7:27 PM, Reindl Harald  wrote:
>
>
> On 26.01.2016 at 01:16, Larry Martell wrote:
>>
>> I know I cannot use an alias in a where clause, but I am trying to
>> figure out how to achieve what I need.
>>
>> If I could have an alias in a where clause my sql would look like this:
>>
>> SELECT IFNULL(f_tag_bottom,
>>  IFNULL(f_tag_bottom_major_axis,
>>   IFNULL(f_tag_bottom_minor_axis,
>>    IFNULL(f_tag_ch_x_bottom,
>>     IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
>> STDDEV(ch_x_top)
>> FROM data_cst
>> WHERE ftag = 'E-CD7'
>> GROUP BY wafer_id, lot_id
>>
>> But I can't use ftag in the where clause. I can't put it in a having
>> clause, as that would exclude the already aggregated rows and I want
>> to filter them before the aggregation. Anyone have any idea how I can
>> do this?
>
>
> why would you want to do that?
> it would make the where clause not be able to use proper indexes

I do not know which of the 5 f_tag_* columns will be populated and I
need to filter on the one that is populated.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



using alias in where clause

2016-01-25 Thread Larry Martell
I know I cannot use an alias in a where clause, but I am trying to
figure out how to achieve what I need.

If I could have an alias in a where clause my sql would look like this:

SELECT IFNULL(f_tag_bottom,
        IFNULL(f_tag_bottom_major_axis,
         IFNULL(f_tag_bottom_minor_axis,
          IFNULL(f_tag_ch_x_bottom,
           IFNULL(f_tag_ch_y_bottom, NULL))))) as ftag,
       STDDEV(ch_x_top)
FROM data_cst
WHERE ftag = 'E-CD7'
GROUP BY wafer_id, lot_id

But I can't use ftag in the where clause. I can't put it in a having
clause, as that would exclude the already aggregated rows and I want
to filter them before the aggregation. Anyone have any idea how I can
do this?
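
One workaround, for reference: compute the alias in a derived table so that
the outer WHERE clause can refer to it, with the caveat (raised elsewhere in
the thread) that the wrapped column cannot use an index. A sketch:

SELECT t.ftag,
       STDDEV(t.ch_x_top)
FROM (
    SELECT COALESCE(f_tag_bottom, f_tag_bottom_major_axis,
                    f_tag_bottom_minor_axis, f_tag_ch_x_bottom,
                    f_tag_ch_y_bottom) AS ftag,
           ch_x_top, wafer_id, lot_id
    FROM data_cst
) t
WHERE t.ftag = 'E-CD7'
GROUP BY t.wafer_id, t.lot_id;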

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with query to count rows while excluding certain rows

2016-01-02 Thread Larry Martell
On Fri, Jan 1, 2016 at 9:31 PM, Peter Brawley
 wrote:
> On 1/1/2016 19:24, Larry Martell wrote:
>>
>> On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley
>>  wrote:
>>>
>>> On 12/31/2015 0:51, Larry Martell wrote:
>>>>
>>>> I need to count the number of rows in a table that are grouped by a
>>>> list of columns, but I also need to exclude rows that have more than
>>>> some count when grouped by a different set of columns. Conceptually,
>>>> this is not hard, but I am having trouble doing this efficiently.
>>>>
>>>> My first counting query would be this:
>>>>
>>>> SELECT count(*)
>>>> FROM cst_rollup
>>>> GROUP BY target_name_id, ep, roiname, recipe_process,
>>>> recipe_product, recipe_layer, f_tag_bottom,
>>>> measname, recipe_id
>>>>
>>>> But from this count I need to subtract the count of rows that have
>>>> more than 50 rows with a different grouping:
>>>>
>>>> SELECT count(*)
>>>> FROM cst_rollup
>>>> GROUP BY target_name_id, ep, wafer_id
>>>> HAVING count(*) >= 50
>>>>
>>>> As you can see, the second query has wafer_id, but the first query does
>>>> not.
>>>>
>>>> Currently I am doing this in python, and it's slow. In my current
>>>> implementation I have one query, and it selects the columns (i.e.
>>>> doesn't just count), and I have added wafer_id:
>>>>
>>>> SELECT target_name_id, ep, roiname, recipe_process,
>>>> recipe_product, recipe_layer, f_tag_bottom,
>>>> measname, recipe_id, wafer_id
>>>> FROM cst_rollup
>>>>
>>>> Then I go through the result set (which can be over 200k rows) and I
>>>> count the number of rows with matching (target_name_id, ep, wafer_id).
>>>> Then I go through the rows again and regroup them without wafer_id,
>>>> but skipping the rows that have more than 50 rows for that row's
>>>> (target_name_id, ep, wafer_id).
>>>>
>>>> Is this clear to everyone what I am trying to do?
>>>
>>>
>>> If I've understood this correctly, the resultset you wish to aggregate on
>>> is
>>> ...
>>>
>>> select target_name_id, ep, wafer_id
>>> from cst_rollup a
>>> left join (   -- exclude rows for which wafer_id count >= 50
>>>select target_name_id, ep, wafer_id, count(*) n
>>>from cst_rollup
>>>group by target_name_id, ep, wafer_id
>>>having n >= 50
>>> ) b using ( target_name_id, ep, wafer_id )
>>> where b.target_name_id is null ;
>>>
>>> If that's so, you could assemble that resultset in a temp table then run
>>> the
>>> desired aggregate query on it, or you could aggregate on it directly as a
>>> subquery.
>>
>> That query gives:
>>
>> ERROR 1137 (HY000): Can't reopen table: 'a'
>
>
> So, it's a temporary table, and you'll need to make that not so.

Yes, cst_rollup is a temp table. The underlying table is millions of
rows (with 300 columns) so for efficiency a subset of the rows and
columns are selected into the temp table based on some user input.
It's just the rows in the temp table that are of interest for the
current report.

I was able to get this working with a second temp table:

CREATE TEMPORARY TABLE rollup_exclude
SELECT target_name_id, ep, wafer_id, count(*) n
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING n >= 50

And then:

SELECT count(*)
FROM cst_rollup
LEFT JOIN(
SELECT target_name_id, ep, wafer_id
FROM rollup_exclude) b
USING (target_name_id, ep, wafer_id)
WHERE b.target_name_id IS NULL
GROUP by target_name_id, ep, roiname, recipe_process, recipe_product,
recipe_layer, f_tag_bottom, measname, recipe_id

And the rowcount from that query gave me what I needed.
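
For completeness: since the GROUP BY query returns one row per group, the
row count can also be taken server-side by wrapping it in a derived table,
instead of counting rows in the client. A sketch building on the two queries
above:

SELECT COUNT(*) AS group_count
FROM (
    SELECT 1
    FROM cst_rollup
    LEFT JOIN rollup_exclude b USING (target_name_id, ep, wafer_id)
    WHERE b.target_name_id IS NULL
    GROUP BY target_name_id, ep, roiname, recipe_process, recipe_product,
             recipe_layer, f_tag_bottom, measname, recipe_id
) AS g;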

Thanks very much for the help Peter, you gave me a push toward the right path.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with query to count rows while excluding certain rows

2016-01-01 Thread Larry Martell
On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley
 wrote:
> On 12/31/2015 0:51, Larry Martell wrote:
>>
>> I need to count the number of rows in a table that are grouped by a
>> list of columns, but I also need to exclude rows that have more than
>> some count when grouped by a different set of columns. Conceptually,
>> this is not hard, but I am having trouble doing this efficiently.
>>
>> My first counting query would be this:
>>
>> SELECT count(*)
>> FROM cst_rollup
>> GROUP BY target_name_id, ep, roiname, recipe_process,
>> recipe_product, recipe_layer, f_tag_bottom,
>> measname, recipe_id
>>
>> But from this count I need to subtract the count of rows that have
>> more than 50 rows with a different grouping:
>>
>> SELECT count(*)
>> FROM cst_rollup
>> GROUP BY target_name_id, ep, wafer_id
>> HAVING count(*) >= 50
>>
>> As you can see, the second query has wafer_id, but the first query does
>> not.
>>
>> Currently I am doing this in python, and it's slow. In my current
>> implementation I have one query, and it selects the columns (i.e.
>> doesn't just count), and I have added wafer_id:
>>
>> SELECT target_name_id, ep, roiname, recipe_process,
>> recipe_product, recipe_layer, f_tag_bottom,
>> measname, recipe_id, wafer_id
>> FROM cst_rollup
>>
>> Then I go through the result set (which can be over 200k rows) and I
>> count the number of rows with matching (target_name_id, ep, wafer_id).
>> Then I go through the rows again and regroup them without wafer_id,
>> but skipping the rows that have more than 50 rows for that row's
>> (target_name_id, ep, wafer_id).
>>
>> Is this clear to everyone what I am trying to do?
>
>
> If I've understood this correctly, the resultset you wish to aggregate on is
> ...
>
> select target_name_id, ep, wafer_id
> from cst_rollup a
> left join (   -- exclude rows for which wafer_id count >= 50
>   select target_name_id, ep, wafer_id, count(*) n
>   from cst_rollup
>   group by target_name_id, ep, wafer_id
>   having n >= 50
> ) b using ( target_name_id, ep, wafer_id )
> where b.target_name_id is null ;
>
> If that's so, you could assemble that resultset in a temp table then run the
> desired aggregate query on it, or you could aggregate on it directly as a
> subquery.

That query gives:

ERROR 1137 (HY000): Can't reopen table: 'a'

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



help with query to count rows while excluding certain rows

2015-12-30 Thread Larry Martell
I need to count the number of rows in a table that are grouped by a
list of columns, but I also need to exclude rows that have more than
some count when grouped by a different set of columns. Conceptually,
this is not hard, but I am having trouble doing this efficiently.

My first counting query would be this:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id

But from this count I need to subtract the count of rows that have
more than 50 rows with a different grouping:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) >= 50

As you can see, the second query has wafer_id, but the first query does not.

Currently I am doing this in python, and it's slow. In my current
implementation I have one query, and it selects the columns (i.e.
doesn't just count), and I have added wafer_id:

SELECT target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id, wafer_id
FROM cst_rollup

Then I go through the result set (which can be over 200k rows) and I
count the number of rows with matching (target_name_id, ep, wafer_id).
Then I go through the rows again and regroup them without wafer_id,
but skipping the rows that have more than 50 rows for that row's
(target_name_id, ep, wafer_id).

Is this clear to everyone what I am trying to do?

I'd like to do this all in sql with count because then I do not have
to actually return and parse the data in python.

Can anyone think of a way to do this in sql in a way that will be more
efficient then my current implementation?


Thanks!
-Larry

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



OPTIMIZE TABLE vs. myisamchk

2015-07-01 Thread Larry Martell
I have a very large table (~50GB) and periodically rows are purged
from it, and I want to run OPTIMIZE TABLE to recover the space. But I
do not have enough space to run it. If I do run it, the server hangs
and must be killed and restarted, and the table is damaged and must be
repaired, which I do with myisamchk.

At https://dev.mysql.com/doc/refman/5.1/en/myisam-optimization.html I read this:

To coalesce fragmented rows and eliminate wasted space that results
from deleting or updating rows, run myisamchk in recovery mode:

shell> myisamchk -r tbl_name

You can optimize a table in the same way by using the OPTIMIZE TABLE
SQL statement. OPTIMIZE TABLE does a table repair and a key analysis,
and also sorts the index tree so that key lookups are faster. There is
also no possibility of unwanted interaction between a utility and the
server, because the server does all the work when you use OPTIMIZE
TABLE.

Does this mean that myisamchk -r -a -S does the same thing as OPTIMIZE TABLE?

If they do do the same thing, why does myisamchk work but OPTIMIZE
TABLE run out of space?
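
One related check, for what it's worth: the Data_free value reported by
SHOW TABLE STATUS is the space a rebuild would reclaim, which helps size the
job before running either tool. A minimal sketch (the table name is a
placeholder):

SHOW TABLE STATUS LIKE 'tbl_name';

SELECT table_name, data_free
FROM information_schema.TABLES
WHERE table_schema = DATABASE() AND table_name = 'tbl_name';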

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Upgrade from 5.5 to 5.6

2015-05-18 Thread Larry Martell
On Fri, May 15, 2015 at 8:27 PM, Reindl Harald  wrote:
>
> On 15.05.2015 at 23:27, Larry Martell wrote:
>>
>> I am upgrading my server from 5.5 to 5.6. This is on CentOS 6.5. I
>> removed 5.5 like this:
>>
>> yum remove mysql-libs MySQL-client MySQL-devel MySQL-shared
>>
>> and I installed 5.6 from
>> MySQL-5.6.24-1.linux_glibc2.5.x86_64.rpm-bundle.tar with:
>>
>> rpm -i MySQL-client-5.6.24-1.linux_glibc2.5.x86_64.rpm
>> rpm -i MySQL-client-5.6.24-1.linux_glibc2.5.x86_64.rpm
>> rpm -i MySQL-shared-5.6.24-1.linux_glibc2.5.x86_64.rpm
>>
>> I then ran mysql_upgrade and I got:
>>
>> This installation of MySQL is already upgraded to 5.5.43, use --force
>> if you still need to run mysql_upgrade
>>
>> Looking at the mod time of mysql_upgrade I see it's from when I
>> installed 5.5, not from today.
>>
>> Shouldn't I have gotten a new mysql_upgrade with the 5.6 install? Is
>> running it with --force all I need to do? Or do I not have the correct
>> version of mysql_upgrade?
>
>
> surely, but why don't you upgrade mysql-server?

Yeah, that would help. All good now. Thanks.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Upgrade from 5.5 to 5.6

2015-05-15 Thread Larry Martell
I am upgrading my server from 5.5 to 5.6. This is on CentOS 6.5. I
removed 5.5 like this:

yum remove mysql-libs MySQL-client MySQL-devel MySQL-shared

and I installed 5.6 from
MySQL-5.6.24-1.linux_glibc2.5.x86_64.rpm-bundle.tar with:

   rpm -i MySQL-client-5.6.24-1.linux_glibc2.5.x86_64.rpm
   rpm -i MySQL-client-5.6.24-1.linux_glibc2.5.x86_64.rpm
   rpm -i MySQL-shared-5.6.24-1.linux_glibc2.5.x86_64.rpm

I then ran mysql_upgrade and I got:

This installation of MySQL is already upgraded to 5.5.43, use --force
if you still need to run mysql_upgrade

Looking at the mod time of mysql_upgrade I see it's from when I
installed 5.5, not from today.

Shouldn't I have gotten a new mysql_upgrade with the 5.6 install? Is
running it with --force all I need to do? Or do I not have the correct
version of mysql_upgrade?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



'Lock wait timeout exceeded' with MyISAM tables

2015-05-08 Thread Larry Martell
I just upgraded a server from 5.1 to 5.5. Our tables are all MyISAM. I
have a python script that inserts rows into a table; in 5.1 it worked
fine. In 5.5 it's failing with 'Lock wait timeout exceeded'. I googled
this, and it seems that all the cases of people getting that were with
InnoDB tables, and all the solutions only apply to InnoDB tables. How
can I fix this with MyISAM tables?
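
One hedged diagnostic angle, not from the thread: 'Lock wait timeout
exceeded' is an InnoDB error, and 5.5 changed the default storage engine
from MyISAM to InnoDB, so a first step is confirming the table's engine and
seeing what holds the lock (the table name is a placeholder):

SHOW CREATE TABLE my_table;             -- confirm it still says ENGINE=MyISAM
SHOW FULL PROCESSLIST;                  -- MyISAM lock waits show State: Locked
SHOW STATUS LIKE 'Table_locks_waited';  -- counts contended table locks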

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: questions about timestamps and DST

2015-03-31 Thread Larry Martell
On Tue, Mar 31, 2015 at 1:13 AM, Andrew Moore  wrote:
> When you use a timezone with DST there is no such thing as 2.30am on the
> date of changeover. That hour doesn't exist.

I am using UTC - I am not using a time zone.

> Look up the difference between timestamp and datetime data types.

I did do that before I posted, but it wasn't really clear to me, but I
think I need to use a DATETIME instead of a TIMESTAMP. Correct?
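
For reference, the difference in one sketch (table and values hypothetical):
TIMESTAMP values pass through the session time zone on the way in and out,
so a DST zone has no valid 02:30 on the changeover date, while DATETIME
stores the literal value with no conversion:

SET time_zone = '+00:00';   -- fixed offset: no DST rules, so no missing hour
CREATE TABLE t_dst (ts TIMESTAMP, dt DATETIME);
INSERT INTO t_dst VALUES ('2015-03-08 02:30:00', '2015-03-08 02:30:00');
-- With a DST zone instead (e.g. SET time_zone = 'US/Eastern', if the
-- server's time zone tables are loaded), the same INSERT warns and stores
-- 03:00:00 in ts, while dt keeps 02:30:00 either way.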

>
> On 31 Mar 2015 05:43, "Larry Martell"  wrote:
>>
>> I have a column that was a timestamp type. I was inserting rows using
>> NOW(). When we switched to DST and the hour from 2am to 3am was
>> non-existent I of course had no data for that hour. For reasons I
>> don't need to go into, that missing hour caused problems downstream.
>> To prevent this from happening next year I changed the insert to use
>> UTC_TIMESTAMP() and I wanted to fill in data for that missing hour.
>> But no matter what I do it will not let me insert values for that hour
>> - it gives me an 'Invalid TIMESTAMP value' warning and inserts a row
>> with a time of 3:00 for any time in that hour I give. This makes me
>> think that I have not actually solved the problem for next year (I
>> can't test this to know).
>>
>> So my questions are:
>>
>> 1) How can I actually insert a timestamp value that will not be
>> affected by the time change and not have the missing hour?
>> 2) Why is it not allowing me to insert UTC times for that missing
>> hour? How can I insert UTC values for that missing hour?
>>
>>
>> TIA!
>> -larry
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



questions about timestamps and DST

2015-03-30 Thread Larry Martell
I have a column that was a timestamp type. I was inserting rows using
NOW(). When we switched to DST and the hour from 2am to 3am was
non-existent I of course had no data for that hour. For reasons I
don't need to go into, that missing hour caused problems downstream.
To prevent this from happening next year I changed the insert to use
UTC_TIMESTAMP() and I wanted to fill in data for that missing hour.
But no matter what I do it will not let me insert values for that hour
- it gives me an 'Invalid TIMESTAMP value' warning and inserts a row
with a time of 3:00 for any time in that hour I give. This makes me
think that I have not actually solved the problem for next year (I
can't test this to know).

So my questions are:

1) How can I actually insert a timestamp value that will not be
affected by the time change and not have the missing hour?
2) Why is it not allowing me to insert UTC times for that missing
hour? How can I insert UTC values for that missing hour?


TIA!
-larry

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Help improving query performance

2015-02-04 Thread Larry Martell
On Wed, Feb 4, 2015 at 3:25 PM, shawn l.green  wrote:
> Hi Larry,
>
>
> On 2/4/2015 3:18 PM, Larry Martell wrote:
>>
>> On Wed, Feb 4, 2015 at 2:56 PM, shawn l.green 
>> wrote:
>>>
>>> Hi Larry,
>>>
>>>
>>> On 2/1/2015 4:49 PM, Larry Martell wrote:
>>>>
>>>>
>>>> I have 2 queries. One takes 4 hours to run and returns 21 rows, and
>>>> the other, which has 1 additional where clause, takes 3 minutes and
>>>> returns 20 rows. The main table being selected from is largish
>>>> (37,247,884 rows with 282 columns). Caching is off for my testing, so
>>>> it's not related to that. To short circuit anyone asking, these
>>>> queries are generated by python code, which is why there's an IN
>>>> clause with 1 value, as opposed to an =.
>>>>
>>>> Here are the queries and their explains. The significant difference is
>>>> that the faster query has "Using
>>>> intersect(data_cst_bbccbce0,data_cst_fba12377)" in the query plan -
>>>> those 2 indexes are on the 2 columns in the where clause, so that's
>>>> why the second one is faster. But I am wondering what I can do to make
>>>> the first one faster.
>>>>
>>>>
>>>> 4 hour query:
>>>>
>>>> SELECT MIN(data_tool.name) as tool,
>>>>  MIN(data_cst.date_time) "start",
>>>>  MAX(data_cst.date_time) "end",
>>>>  MIN(data_target.name) as target,
>>>>  MIN(data_lot.name) as lot,
>>>>  MIN(data_wafer.name) as wafer,
>>>>  MIN(measname) as measname,
>>>>  MIN(data_recipe.name) as recipe
>>>> FROM data_cst
>>>> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
>>>> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
>>>> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
>>>> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
>>>> INNER JOIN data_measparams ON data_measparams.id =
>>>> data_cst.meas_params_name_id
>>>> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
>>>> WHERE data_target.id IN (172) AND
>>>> data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26
>>>> 23:59:59'
>>>> GROUP BY wafer_id, data_cst.lot_id, target_name_id
>>>>
>>>
>>> ... snipped ...
>>>
>>>>
>>>>
>>>> Faster query:
>>>>
>>>> SELECT MIN(data_tool.name) as tool,
>>>>  MIN(data_cst.date_time) "start",
>>>>  MAX(data_cst.date_time) "end",
>>>>  MIN(data_target.name) as target,
>>>>  MIN(data_lot.name) as lot,
>>>>  MIN(data_wafer.name) as wafer,
>>>>  MIN(measname) as measname,
>>>>  MIN(data_recipe.name) as recipe
>>>> FROM data_cst
>>>> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
>>>> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
>>>> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
>>>> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
>>>> INNER JOIN data_measparams ON data_measparams.id =
>>>> data_cst.meas_params_name_id
>>>> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
>>>> WHERE data_target.id IN (172) AND
>>>> data_recipe.id IN (148) AND
>>>> data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26
>>>> 23:59:59'
>>>> GROUP BY wafer_id, data_cst.lot_id, target_name_id
>>>>
>>> ... snip ...
>>>>
>>>>
>>>>
>>>> Thanks for taking the time to read this, and for any help or pointers
>>>> you can give me.
>>>>
>>>
>>> The biggest difference is the added selectivity generated by the WHERE
>>> term
>>> against the data_recipe table.
>>>
>>> Compare the two EXPLAINS, in the faster query you see that data_recipe is
>>> listed second. This allows the additional term a chance to reduce the
>>> number
>>> of row combinations for the entire query.
>>>
>>> To really get at the logic behind how the Optimizer chooses its execution
>>> plan, get an optimizer trace. Look at the "cost" estimates for each
>>> phase being considered.
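
A minimal sketch of pulling that trace (MySQL 5.6+; the documentation links
appear in the full copy of this reply below):

SET optimizer_trace = 'enabled=on';
-- run the slow query here
SELECT trace FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';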

Re: Help improving query performance

2015-02-04 Thread Larry Martell
On Wed, Feb 4, 2015 at 2:56 PM, shawn l.green  wrote:
> Hi Larry,
>
>
> On 2/1/2015 4:49 PM, Larry Martell wrote:
>>
>> I have 2 queries. One takes 4 hours to run and returns 21 rows, and
>> the other, which has 1 additional where clause, takes 3 minutes and
>> returns 20 rows. The main table being selected from is largish
>> (37,247,884 rows with 282 columns). Caching is off for my testing, so
>> it's not related to that. To short circuit anyone asking, these
>> queries are generated by python code, which is why there's an IN
>> clause with 1 value, as opposed to an =.
>>
>> Here are the queries and their explains. The significant difference is
>> that the faster query has "Using
>> intersect(data_cst_bbccbce0,data_cst_fba12377)" in the query plan -
>> those 2 indexes are on the 2 columns in the where clause, so that's
>> why the second one is faster. But I am wondering what I can do to make
>> the first one faster.
>>
>>
>> 4 hour query:
>>
>> SELECT MIN(data_tool.name) as tool,
>> MIN(data_cst.date_time) "start",
>> MAX(data_cst.date_time) "end",
>> MIN(data_target.name) as target,
>> MIN(data_lot.name) as lot,
>> MIN(data_wafer.name) as wafer,
>> MIN(measname) as measname,
>> MIN(data_recipe.name) as recipe
>> FROM data_cst
>> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
>> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
>> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
>> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
>> INNER JOIN data_measparams ON data_measparams.id =
>> data_cst.meas_params_name_id
>> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
>> WHERE data_target.id IN (172) AND
>>data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26
>> 23:59:59'
>> GROUP BY wafer_id, data_cst.lot_id, target_name_id
>>
>
> ... snipped ...
>
>>
>>
>> Faster query:
>>
>> SELECT MIN(data_tool.name) as tool,
>> MIN(data_cst.date_time) "start",
>> MAX(data_cst.date_time) "end",
>> MIN(data_target.name) as target,
>> MIN(data_lot.name) as lot,
>> MIN(data_wafer.name) as wafer,
>> MIN(measname) as measname,
>> MIN(data_recipe.name) as recipe
>> FROM data_cst
>> INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
>> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
>> INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
>> INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
>> INNER JOIN data_measparams ON data_measparams.id =
>> data_cst.meas_params_name_id
>> INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
>> WHERE data_target.id IN (172) AND
>>data_recipe.id IN (148) AND
>>data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26
>> 23:59:59'
>> GROUP BY wafer_id, data_cst.lot_id, target_name_id
>>
> ... snip ...
>>
>>
>> Thanks for taking the time to read this, and for any help or pointers
>> you can give me.
>>
>
> The biggest difference is the added selectivity generated by the WHERE term
> against the data_recipe table.
>
> Compare the two EXPLAINS, in the faster query you see that data_recipe is
> listed second. This allows the additional term a chance to reduce the number
> of row combinations for the entire query.
>
> To really get at the logic behind how the Optimizer chooses its execution
> plan, get an optimizer trace. Look at the "cost" estimates for each phase
> being considered.
> http://dev.mysql.com/doc/refman/5.6/en/optimizer-trace-table.html
> http://dev.mysql.com/doc/internals/en/optimizer-tracing.html

Thanks very much Shawn for the reply and the links. I will check those
out and I'm sure I will find them very useful.

Meanwhile I changed the query to select from data_cst using the where
clause into a temp table and then I join the temp table with the other
tables. That has improved the slow query from 4 hours to 10 seconds
(!)
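
A sketch of that rewrite, reconstructed from the description above (the temp
table name is illustrative and the column list is trimmed):

CREATE TEMPORARY TABLE tmp_cst AS
SELECT tool_id, target_name_id, lot_id, wafer_id, meas_params_name_id,
       recipe_id, date_time, measname
FROM data_cst
WHERE target_name_id IN (172)
  AND date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59';

SELECT MIN(data_tool.name) AS tool,
       MIN(tmp_cst.date_time) "start",
       MAX(tmp_cst.date_time) "end",
       MIN(measname) AS measname
       -- ...remaining MIN() columns and INNER JOINs as in the original
       -- query, joining against tmp_cst instead of data_cst
FROM tmp_cst
INNER JOIN data_tool ON data_tool.id = tmp_cst.tool_id
GROUP BY wafer_id, tmp_cst.lot_id, target_name_id;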

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Help improving query performance

2015-02-01 Thread Larry Martell
I have 2 queries. One takes 4 hours to run and returns 21 rows, and
the other, which has 1 additional where clause, takes 3 minutes and
returns 20 rows. The main table being selected from is largish
(37,247,884 rows with 282 columns). Caching is off for my testing, so
it's not related to that. To short circuit anyone asking, these
queries are generated by python code, which is why there's an IN
clause with 1 value, as opposed to an =.

Here are the queries and their explains. The significant difference is
that the faster query has "Using
intersect(data_cst_bbccbce0,data_cst_fba12377)" in the query plan -
those 2 indexes are on the 2 columns in the where clause, so that's
why the second one is faster. But I am wondering what I can do to make
the first one faster.


4 hour query:

SELECT MIN(data_tool.name) as tool,
       MIN(data_cst.date_time) "start",
       MAX(data_cst.date_time) "end",
       MIN(data_target.name) as target,
       MIN(data_lot.name) as lot,
       MIN(data_wafer.name) as wafer,
       MIN(measname) as measname,
       MIN(data_recipe.name) as recipe
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
WHERE data_target.id IN (172) AND
      data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
GROUP BY wafer_id, data_cst.lot_id, target_name_id

Explain:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_target
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_measparams
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 767
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_cst
         type: ref
possible_keys: data_cst_4262c4f4,data_cst_01213b5c,data_cst_bbccbce0,data_cst_b0229011,data_cst_65c0,data_cst_fba12377,data_cst_634020d0
          key: data_cst_634020d0
      key_len: 5
          ref: motor_gf.data_measparams.id
         rows: 48
        Extra: Using where
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_tool
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: motor_gf.data_cst.tool_id
         rows: 1
        Extra: NULL
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_recipe
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: motor_gf.data_cst.recipe_id
         rows: 1
        Extra: NULL
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_lot
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: motor_gf.data_cst.lot_id
         rows: 1
        Extra: NULL
*************************** 7. row ***************************
           id: 1
  select_type: SIMPLE
        table: data_wafer
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: motor_gf.data_cst.wafer_id
         rows: 1
        Extra: NULL

Faster query:

SELECT MIN(data_tool.name) as tool,
       MIN(data_cst.date_time) "start",
       MAX(data_cst.date_time) "end",
       MIN(data_target.name) as target,
       MIN(data_lot.name) as lot,
       MIN(data_wafer.name) as wafer,
       MIN(measname) as measname,
       MIN(data_recipe.name) as recipe
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
WHERE data_target.id IN (172) AND
      data_recipe.id IN (148) AND
      data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
GROUP BY wafer_id, data_cst.lot_id, target_name_id

Re: Xml data import

2014-12-13 Thread Larry Martell
On Fri, Dec 12, 2014 at 8:31 PM, Sayth Renshaw  wrote:
> And does that then lead you to use Fabric?
>
> http://de.slideshare.net/mobile/nixnutz/mysql-57-fabric-high-availability-and-sharding

No, I've never used that. I just process the data in python.


> On Sat, 13 Dec 2014 9:06 AM Larry Martell  wrote:
>>
>> On Fri, Dec 12, 2014 at 4:52 PM, Sayth Renshaw 
>> wrote:
>> > So it is definitely achievable, I see other db's converting xml2json etc
>> > to
>> > get it in.
>>
>> I use this https://github.com/hay/xml2json
>>
>> > Seems odd that XML has such great document qualities but as a data format
>> > it seems rather hard to work with.
>>
>> Indeed.
>>
>>
>> > On Fri, 12 Dec 2014 9:55 PM Johan De Meersman 
>> > wrote:
>> >
>> >>
>> >> - Original Message -
>> >> > From: "Sayth Renshaw" 
>> >> > Subject: Xml data import
>> >> >
>> >> > I have an xml data feed with xsd; it's complex in elements, not size.
>> >> > What is the best way to get data into mysql, do I have to hack with
>> >> > xquery?
>> >>
>> >> That's going to depend on the complexity of your XML and how much of
>> >> that
>> >> complexity you want preserved. There's definitely libraries out there
>> >> that
>> >> can map XML (tree) structures onto relational structures; although it's
>> >> been so long since I played with XML that I really couldn't tell you
>> >> which
>> >> ones.
>> >>
>> >> > My goal is to be able create queries and send csv files out for
>> >> > analysis
>> >> in
>> >> > R and plots in ggplot2. Also render done other off the initial days
>> >> > to
>> >> the
>> >> > Web usually xslt from xml.
>> >>
>> >> I suppose that worst case you could use XSLT to transform into a flatter
>> >> structure, maybe even CSV?
>> >>
>> >>
>> >> --
>> >> Unhappiness is discouraged and will be corrected with kitten pictures.
>> >>
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Xml data import

2014-12-12 Thread Larry Martell
On Fri, Dec 12, 2014 at 4:52 PM, Sayth Renshaw  wrote:
> So it is definitely achievable, I see other db's converting xml2json etc to
> get it in.

I use this https://github.com/hay/xml2json

> Seems odd that XML has such great document qualities but as a data format
> it seems rather hard to work with.

Indeed.
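
For the flat-file leg of that pipeline (XML flattened to CSV, as suggested
in the quoted reply below), MySQL can bulk-load the result directly. A
sketch with hypothetical file and table names:

LOAD DATA LOCAL INFILE '/tmp/feed.csv'
INTO TABLE feed
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;   -- skip the header row, if the file has one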


> On Fri, 12 Dec 2014 9:55 PM Johan De Meersman  wrote:
>
>>
>> - Original Message -
>> > From: "Sayth Renshaw" 
>> > Subject: Xml data import
>> >
>> > I have an xml data feed with xsd; it's complex in elements, not size. What
>> > is the best way to get data into mysql, do I have to hack with xquery?
>>
>> That's going to depend on the complexity of your XML and how much of that
>> complexity you want preserved. There's definitely libraries out there that
>> can map XML (tree) structures onto relational structures; although it's
>> been so long since I played with XML that I really couldn't tell you which
>> ones.
>>
>> > My goal is to be able create queries and send csv files out for analysis
>> in
>> > R and plots in ggplot2. Also render done other off the initial days to
>> the
>> > Web usually xslt from xml.
>>
>> I suppose that worst case you could use XSLT to transform into a flatter
>> structure, maybe even CSV?
>>
>>
>> --
>> Unhappiness is discouraged and will be corrected with kitten pictures.
>>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: How to retrieve next record?

2014-12-11 Thread Larry Martell
Please reply to the list and not to me directly

On Thu, Dec 11, 2014 at 5:10 PM, Trianon 33  wrote:
>
> Larry Martell schreef op 11-12-14 om 23:04:
>>
>> On Thu, Dec 11, 2014 at 4:58 PM, Trianon 33  wrote:
>>>
>>> Larry Martell schreef op 11-12-14 om 22:43:
>>>>
>>>> On Thu, Dec 11, 2014 at 4:34 PM, Trianon 33  wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> It seems like a simple question, however I'm not able to find the
>>>>> answer.
>>>>>
>>>>> Let me explain. I have a DB with lots of columns, one is labeled as
>>>>> 'kenteken' and it is also the primary key in the DB. I poke into this
>>>>> DB
>>>>> with the command
>>>>>
>>>>> $sDBQuery1 = "SELECT * FROM kentekenlogtest WHERE kenteken =
>>>>> '$sActueelkenteken'";
>>>>>
>>>>> This is ok and gives me the correct result. However, then I want the
>>>>> next
>>>>> row (according to the primary key), but with what command?
>>>>>
>>>>> That's where I'm stuck.
>>>>>
>>>>> Didn't find anything like NEXT, so... what to look for?
>>>>
>>>>
>>>> What do you mean by 'next' row? Since you're querying by primary key
>>>> you will only get one row, right?
>>>>
>>> Sorry for not expressing myself right.
>>>
>>> Let's say my DB contains the following rows:
>>>
>>> 0001ES bla bla bla
>>> 0002ES bla bla bla
>>> 0010DD bla bla bla
>>> 0012FF bla bla bla
>>>
>>> The first row I retrieve with my statement where my variable
>>> $sActueelkenteken = 0001ES. This is successful. After that I want the
>>> next
>>> record which happens to be 0002ES etc. That's the next record I would
>>> like
>>> to retrieve.
>>>
>>> Does this help for the question?
>>
>> The rows in the database are not ordered. What makes 0002ES the row
>> you want? You have to change your query to select the rows you want
>> and control the order with a sort or within your program.
>
> Hello Larry, hm.
>
> My rows are ordered.

No, the data in the database is not ordered.

> 0001ES comes before 0002ES, etc. But so far that
> doesn't really matter.
>
> My DB consists of approx 10 million records, all with this 6-character
> field "kenteken". All records are sorted by primary key, which is on the
> field "kenteken".
>
> All I want to do is select a record (not necessarily the first in the DB)
> and from there I want simply the next record based on the order of the
> primary key. Somewhere in the middle I would see record FF, followed by
> ZZ for example and somewhere to the end I would see ZZ.
>
> Does this explanation make it somewhat more understandable?

You would have to know the range of the items you want, e.g.:

SELECT * FROM kentekenlogtest WHERE kenteken >= $firstOneYouWant and
kenteken <= $lastOneYouWant ORDER BY kenteken

Or if you know how many you want you could do:

SELECT * FROM kentekenlogtest WHERE kenteken >= $firstOneYouWant
ORDER BY kenteken limit $howManyYouWant
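
For the literal "next row" case, a minimal keyset-style sketch (assuming
kenteken is the primary key, as described above; '0001ES' stands in for
the current value of $sActueelkenteken):

SELECT *
FROM kentekenlogtest
WHERE kenteken > '0001ES'   -- strictly greater: skip the current row
ORDER BY kenteken           -- makes "next" well-defined
LIMIT 1;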

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: How to retrieve next record?

2014-12-11 Thread Larry Martell
On Thu, Dec 11, 2014 at 4:34 PM, Trianon 33  wrote:
> Hi all,
>
> It seems like a simple question, however I'm not able to find the
> answer.
>
> Let me explain. I have a DB with lots of columns, one is labeled as
> 'kenteken' and it is also the primary key in the DB. I poke into this DB
> with the command
>
> $sDBQuery1 = "SELECT * FROM kentekenlogtest WHERE kenteken =
> '$sActueelkenteken'";
>
> This is ok and gives me the correct result. However, then I want the next
> row (according to the primary key), but with what command?
>
> That's where I'm stuck.
>
> Didn't find anything like NEXT, so... what to look for?


What do you mean by 'next' row? Since you're querying by primary key
you will only get one row, right?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Stored procedure debuggers

2014-08-20 Thread Larry Martell
Does anyone know of any debuggers for stored procs that run on Mac and/or Linux?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Case sensitivity

2014-05-21 Thread Larry Martell
On Wed, May 21, 2014 at 4:50 AM, Hartmut Holzgraefe  wrote:
> On 05/21/2014 08:14 AM, Johan De Meersman wrote:
>
>>> ALTER TABLE `my_table` DEFAULT CHARACTER SET latin1 COLLATE
>>> latin1_general_ci;
>>
>> Purely from memory, doesn't that change the table but add the old setting to 
>> individual text columns?
>
> right, the above will only change the default for new columns
> added later, but existing columns will keep their current
> character set and collation.
>
>   ALTER TABLE tbl_name
> CONVERT TO CHARACTER SET charset_name
> [COLLATE collation_name]
>
> is needed to change the default *and* all already existing
> columns

Thanks for all the help Hartmut.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Case sensitivity

2014-05-20 Thread Larry Martell
On Tue, May 20, 2014 at 5:52 PM, Hartmut Holzgraefe  wrote:
> On 05/20/2014 11:10 PM, Larry Martell wrote:
>
>> 5.5 server:
>>
>> mysql> select distinct(pnlCurrency) from PnLDetail;
>> +-------------+
>> | pnlCurrency |
>> +-------------+
>> | USD         |
>> | USd         |
>> +-------------+
>> 2 rows in set (0.00 sec)
>>
>> 5.0 server:
>>
>> mysql> select distinct(pnlCurrency) from PnLDetail;
>> +-------------+
>> | pnlCurrency |
>> +-------------+
>> | USD         |
>> +-------------+
>> 1 row in set (0.01 sec)
>>
>> Both have the same data in them.
>
> ok, in this case the interesting piece of information
> would be the "SHOW CREATE TABLE pnlCurrency;" output
> from both.
>
> As the collation_database variables differ between the
> two servers ("utf8_bin" vs. "latin1_swedish_ci") I assume
> that the same is the case for the actual tables, and that
> would perfectly explain why one treads lower/upper case
> as distinct while the other doesn't

The create table on 5.5 has:

DEFAULT CHARSET=utf8
COLLATE=utf8_bin

and on the 5.0 server it has:

DEFAULT CHARSET=latin1

But the tables on both servers were created from DDL files that do not
specify either one of these. So are these inherited from the server
settings at the time the tables are created? How do I fix this?
Specify DEFAULT CHARSET and COLLATE on the CREATE TABLE statements in
the DDL file and drop and recreate all the tables? Is there a way to
change this without doing that? Can the collation_database variables
be set in the conf file? I didn't see them mentioned in the docs.
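
For the record, a sketch of the fix that seems consistent with the rest of
this thread (mydb is a placeholder name; PnLDetail is the table from the
examples; utf8/utf8_bin is assumed to be the desired target, matching the
5.5 server). The *_database values are defaults stored on the database
itself, so the conf file alone won't change existing objects:

-- new default for tables created later in this database:
ALTER DATABASE mydb DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;

-- rewrites the existing columns as well as the table default:
ALTER TABLE PnLDetail CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;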

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Case sensitivity

2014-05-20 Thread Larry Martell
On Tue, May 20, 2014 at 9:38 AM, Hartmut Holzgraefe  wrote:
> On 05/20/2014 03:26 PM, Larry Martell wrote:
>> Yet the 5.5.25a server is case sensitive as you would expect, but the
>> 5.0.77 is not. How can I make the 5.0.77 server case sensitive?
>>
>
> please provide the output of
>
>   SHOW VARIABLES LIKE 'character_set_%';
>
> and
>
>   SHOW VARIABLES LIKE 'collation_%';

Here it is from the server that is case sensitive:

mysql> show variables like 'character_set_%';
+--------------------------+------------------------------------+
| Variable_name            | Value                              |
+--------------------------+------------------------------------+
| character_set_client     | latin1                             |
| character_set_connection | latin1                             |
| character_set_database   | utf8                               |
| character_set_filesystem | binary                             |
| character_set_results    | latin1                             |
| character_set_server     | utf8                               |
| character_set_system     | utf8                               |
| character_sets_dir       | /export/home/mysql/share/charsets/ |
+--------------------------+------------------------------------+
8 rows in set (0.00 sec)

mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | utf8_bin          |
| collation_server     | utf8_bin          |
+----------------------+-------------------+
3 rows in set (0.00 sec)

And here is it from the server that is not case sensitive:

mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | utf8_bin          |
+----------------------+-------------------+
3 rows in set (0.00 sec)

Both servers have this in their config files, with no other character
set or collation settings:

character_set_server=utf8
collation_server=utf8_bin

Clearly they're different, but I don't know how to get them to be the same.

> and explain what exactly you mean by "is case sensitive".
> Are you referring to database and table names being case
> sensitive here, or to comparisons (=, LIKE) on textual
> values?

I mean when querying data.


> Seeing a few example queries and the different
> results returned by 5.0 and 5.5 would sure help, too ...

5.5 server:

mysql> select distinct(pnlCurrency) from PnLDetail;
+-------------+
| pnlCurrency |
+-------------+
| USD         |
| USd         |
+-------------+
2 rows in set (0.00 sec)

5.0 server:

mysql> select distinct(pnlCurrency) from PnLDetail;
+-------------+
| pnlCurrency |
+-------------+
| USD         |
+-------------+
1 row in set (0.01 sec)

Both have the same data in them.

Thanks!

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Case sensitivity

2014-05-20 Thread Larry Martell
I have 2 servers, one is running 5.5.25a and the other 5.0.77.

They both have the same collation and character set:

mysql> SELECT COLLATION(VERSION());
+----------------------+
| COLLATION(VERSION()) |
+----------------------+
| utf8_general_ci      |
+----------------------+
1 row in set (0.00 sec)

mysql> show variables like '%_server';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| character_set_server | utf8     |
| collation_server     | utf8_bin |
+----------------------+----------+
2 rows in set (0.00 sec)

Yet the 5.5.25a server is case sensitive as you would expect, but the
5.0.77 is not. How can I make the 5.0.77 server case sensitive?
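
A quick way to see where two servers actually differ is to compare the
table-level settings, since comparisons are governed by the column and
table collation rather than the server default. A diagnostic sketch
(PnLDetail is the table from the rest of this thread):

SHOW CREATE TABLE PnLDetail\G       -- per-table charset/collation
SHOW FULL COLUMNS FROM PnLDetail;   -- per-column collation
SHOW VARIABLES LIKE 'collation_%';  -- connection/database/server defaults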

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Performance boost by splitting up large table?

2014-05-15 Thread Larry Martell
On Thu, May 15, 2014 at 11:01 AM, Johan De Meersman  wrote:
> - Original Message -
>> From: "Larry Martell" 
>> Subject: Re: Performance boost by splitting up large table?
>>
>> This table is queried based on requests from the users. There are 10
>> different lookup columns they can specify, and they can provide any or
>
> That makes it rather more of a bother, as MySQL can't (yet) skip columns in 
> an index, as far as I'm aware. Someone please correct me if I'm wrong here.
>
>
>> all of these. Currently each one of the columns has an index on it.
>
> I'm a bit fuzzy on multiple-index queries, but I think support isn't all too 
> sharp.
>
>
>> Would it be beneficial to create an index with all 10? Rarely are all
>> 10 specified in the query. Typically it's 3 or 4. Would it be
>> worthwhile to see which are most commonly specified and create an
>> index with just those? Or would it be better to put the commonly
>> selected columns on the index with the lookup columns?
>
>
> You may want to grab a day or week's worth of queries (either general_log 
> with all the overhead and disk space that entails, or tcpdump) and pump that 
> through pt-query-digest (Percona Toolkit) to see which combinations of fields 
> are most often used, and add the necessary covering indices to help those 
> queries.

We have a log going back over 2 years with over 200,000 queries, and
I've written python scripts to give the frequency of each column pulled
for the report and each column used for filtering. Not surprisingly, a
very small number of each are used most of the time, and a huge number
are used just 1 or 2 times. I'll be digging into that next week.

> A few points to keep in mind during analysis:
>  * order of fields in the where clause is largely irrelevant (although 
> most-selective-first is preferred)
>  * not all the fields in the index must be queried; but you MUST query a full 
> prefix set
>-> ie, for index (a, b, c, d) you can query (a, b) but for (a, c) only (a) 
> will be used
>  * every index constitutes a (small) performance penalty upon table updates, 
> so don't go too wild either :-)
>
> Also helpful:
>  * plenty of memory, at least enough to keep the working set in memory
>  * faster disks (SSD, or FusionIO things) helps bundles for data that doesn't 
> fit in memory

The db host is running with 250GB memory, db size is 470GB, spinning
2.5” 15k rpm drives, and 40 cores.

>  * if you notice a tendency for multiple users (say, a dozen) to run 
> identical queries (and that means /bitwise/ identical query text, down to the 
> last space), the query cache might help. Don't make it too big, though, a 
> couple of meg should suffice.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Performance boost by splitting up large table?

2014-05-15 Thread Larry Martell
On Thu, May 15, 2014 at 4:14 AM, Johan De Meersman  wrote:
>
> You've already had some good advice, but there's something much more simpler 
> that will also give you a significant boost: a covering index.
>
> Simply put, the engine is smart enough to not bother with row lookups if 
> everything you asked for is already in the index it was using. You'll need to 
> keep the index requirements in mind, of course (most selective fields first, 
> order fields after selection fields, etc) and then append any other fields to 
> the same index.
>
> Hard to say more without actual use cases, of course, but it's well worth 
> looking into as it requires no other changes in application or schema.

This table is queried based on requests from the users. There are 10
different lookup columns they can specify, and they can provide any or
all of these. Currently each one of the columns has an index on it.
Would it be beneficial to create an index with all 10? Rarely are all
10 specified in the query. Typically it's 3 or 4. Would it be
worthwhile to see which are most commonly specified and create an
index with just those? Or would it be better to put the commonly
selected columns on the index with the lookup columns?
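
For illustration, a sketch of such a composite, covering index (t, col_a,
col_b, col_c, and result_col are hypothetical stand-ins for the table, the
most common lookup columns, and a commonly selected column):

ALTER TABLE t ADD INDEX idx_common (col_a, col_b, col_c, result_col);

-- Usable for WHERE col_a = ..., col_a AND col_b, or all three
-- (leftmost-prefix rule), and "covering" when everything selected
-- is also in the index:
SELECT result_col FROM t WHERE col_a = 1 AND col_b = 2;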

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Performance boost by splitting up large table?

2014-05-14 Thread Larry Martell
We have a table with 254 columns in it. 80% of the time, a very small
subset of these columns are queried. The other columns are rarely, if
ever, queried. (But they could be at any time, so we do need to
maintain them.). Would I expect to get a marked performance boost if I
split my table up into 2 tables, one with the few frequently queried
columns and another with less frequently queried ones? Doing this will
require a lot of code changes, so I don't want to go down this path if
it won't be beneficial. Can folks here offer their experiences and
learned opinions about this?

Thanks!

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Calling function, that operates on another database

2014-04-08 Thread Larry Martell
On Tue, Apr 8, 2014 at 4:15 PM, bars0.bars0.bars0
 wrote:
> Hi all.
>
> I have standard select statement and on one column I want to run function,
> that will connect to another database (same server).
> Is this possible?
>
> High level example:
>
> SELECT db1.clients.client_id, getTurnover(db1.clients.client_id) FROM
> db1.clients;
>
> AND getTurnover($id) body would be something like:
>
> SELECT SUM(db2.turnover.amount) FROM db2.turnover WHERE
> db2.turnover.client_id = $id;
>
>
> So for some data, I need to make lookup to another database table.
> Is this even possible?

Yes, using just the syntax you have: db.table
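
A minimal sketch of such a function (the return type, the READS SQL DATA
clause, and the DELIMITER handling are assumptions, not from the original
post):

DELIMITER //
CREATE FUNCTION getTurnover(p_client_id INT)
RETURNS DECIMAL(15,2)
READS SQL DATA
BEGIN
  -- qualify the table with its database name; no separate
  -- connection is needed for databases on the same server
  RETURN (SELECT SUM(db2.turnover.amount)
          FROM db2.turnover
          WHERE db2.turnover.client_id = p_client_id);
END//
DELIMITER ;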

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Conditional in update

2014-02-12 Thread Larry Martell
On Wed, Feb 12, 2014 at 5:35 PM,   wrote:
>>>>> 2014/02/11 18:14 -0500, Larry Martell >>>>
> set LIMIT = sign(LIMIT) * 100 * floor(0.01 + (sign(LIMIT) * LIMIT
> * ratio/100))
> <<<<<<<<
> The function TRUNCATE can be useful here:
> set LIMIT = TRUNCATE(LIMIT * ratio + 0.01 * sign(LIMIT), -2)
> , if it works as advertized. In any case,
> ABS(LIMIT) = sign(LIMIT) * LIMIT
> .
>
> As for limiting the value, see this (clipping can be useful to you):
>
>
> 11.2.6. Out-of-Range and Overflow Handling
>
> When MySQL stores a value in a numeric column that is outside the permissible
> range of the column data type, the result depends on the SQL mode in effect
> at the time:
>
> * If strict SQL mode is enabled, MySQL rejects the out-of-range value
> with an error, and the insert fails, in accordance with the SQL standard.
> * If no restrictive modes are enabled, MySQL clips the value to the
> appropriate endpoint of the range and stores the resulting value instead.
> When an out-of-range value is assigned to an integer column, MySQL stores the
> value representing the corresponding endpoint of the column data type range.
> If you store 256 into a TINYINT or TINYINT UNSIGNED column, MySQL stores
> 127 or 255, respectively. When a floating-point or fixed-point column is
> assigned a value that exceeds the range implied by the specified (or default)
> precision and scale, MySQL stores the value representing the corresponding
> endpoint of that range.
>
> Column-assignment conversions that occur due to clipping when MySQL is not
> operating in strict mode are reported as warnings for ALTER TABLE,
> LOAD DATA INFILE, UPDATE, and multiple-row INSERT statements. In strict
> mode, these statements fail, and some or all the values will not be
> inserted or changed, depending on whether the table is a transactional
> table and other factors. For details, see Section 5.1.7, “Server SQL Modes”.


Thanks for the reply. I was able to do this with a case statement, but
then the requirements were changed and I had to know when I had
constrained the limit so I could log it to a file. So I ended up doing
the update as it was originally, then running a select afterwards to
find any rows that exceeded the limit, updating those to the max or
min, and then logging them to a file.
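
For completeness, a sketch of the single-statement clamp that was
abandoned (LEAST/GREATEST as an alternative to CASE; LIMIT is
backtick-quoted since it is a reserved word, and ratio, symbolId, and
LimitType are the run-time values from the original post):

UPDATE LMPROC_LIMITS
SET `LIMIT` = GREATEST(-2147483647, LEAST(2147483647,
              sign(`LIMIT`) * 100 * floor(0.01 + abs(`LIMIT`) * ratio/100)))
WHERE SYMBOL_ID = symbolId
  AND CLASSTYPE = LimitType
  AND TYPE_ IN ('minClusterPosition', 'maxClusterPosition',
                'minProductPosition', 'maxProductPosition',
                'minBookPosition', 'maxBookPosition', 'maxShortShares');

This doesn't identify which rows were clamped, which is why the
select-then-update approach described above ended up being the better fit.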

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Narrowing a SELECT statement by multiple hits

2014-02-12 Thread Larry Martell
On Wed, Feb 12, 2014 at 7:35 PM, Jennifer  wrote:
> Hello,
>
> I have the following SQL statement that I'd like to add to.  It's 
> used to create a report that's emailed to me showing hits to our site that 
> didn't provide a referrer.  However, I only want to report on multiple hits 
> from the same IP address - not just a single hit by someone.
>
> How can I add a condition to only show hits by someone who's hit the 
> site 2 or more times with the same IP?  I tried GROUP BY but that didn't 
> return all the hits - only one per IP.
>
> SELECT `ip`,`page`,`url`,`time_stamp`
> FROM `ip_addresses`
> WHERE (`time_stamp` BETWEEN date_add( CURDATE(), INTERVAL -1 DAY ) AND 
> CURDATE() - INTERVAL 1 SECOND)
> AND TRIM(`referrer`) LIKE ''
> ORDER BY INET_ATON(`ip`), `time_stamp`
>
> I hope I'm explaining this correctly.

Try adding a having clause, e.g.:

SELECT `ip`,`page`,`url`,`time_stamp`
FROM `ip_addresses`
WHERE (`time_stamp` BETWEEN date_add( CURDATE(), INTERVAL -1 DAY ) AND
CURDATE() - INTERVAL 1 SECOND)
AND TRIM(`referrer`) LIKE ''
HAVING COUNT(ip) >2
ORDER BY INET_ATON(`ip`), `time_stamp`
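
Since HAVING without a GROUP BY treats the whole result as one group, a
variant that keeps every row while restricting to IPs seen two or more
times might look like this (a sketch against the same table; >= 2 matches
the "2 or more" requirement):

SELECT `ip`,`page`,`url`,`time_stamp`
FROM `ip_addresses`
WHERE (`time_stamp` BETWEEN date_add( CURDATE(), INTERVAL -1 DAY ) AND
CURDATE() - INTERVAL 1 SECOND)
AND TRIM(`referrer`) LIKE ''
AND `ip` IN (
    SELECT `ip`
    FROM `ip_addresses`
    WHERE (`time_stamp` BETWEEN date_add( CURDATE(), INTERVAL -1 DAY ) AND
    CURDATE() - INTERVAL 1 SECOND)
    AND TRIM(`referrer`) LIKE ''
    GROUP BY `ip`
    HAVING COUNT(*) >= 2
)
ORDER BY INET_ATON(`ip`), `time_stamp`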

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Conditional in update

2014-02-11 Thread Larry Martell
Is there some way I can have a conditional in an update?

I have this update sql (paraphrased):

update LMPROC_LIMITS
set LIMIT = sign(LIMIT) * 100 * floor(0.01 + (sign(LIMIT) * LIMIT
* ratio/100))
where SYMBOL_ID = symbolId
and CLASSTYPE = LimitType
and TYPE_ in ('minClusterPosition', 'maxClusterPosition',
'minProductPosition', 'maxProductPosition', 'minBookPosition',
'maxBookPosition', 'maxShortShares')

(ratio, symbolId, and LimitType all are filled in at run time)

What I want to do is limit (no pun intended) the value LIMIT is set to
to +/-2,147,483,647 - i.e. if the result of (sign(LIMIT) * 100 *
floor(0.01 + (sign(LIMIT) * LIMIT * ratio/100))) is more or less
than that, I want to set it to +/-2,147,483,647. Can I do this in SQL?

Thanks!
-larry

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: grouping by the difference between values in rows

2014-01-15 Thread Larry Martell
On Wed, Jan 15, 2014 at 4:06 PM, shawn l.green  wrote:
> Hello Larry,
>
>
> On 1/13/2014 2:43 AM, Peter Brawley wrote:
>>
>>
>> On 2014-01-12 9:13 PM, Larry Martell wrote:
>>>
>>> On Sun, Jan 12, 2014 at 2:47 PM, Peter Brawley
>>>  wrote:
>>>>
>>>> On 2014-01-12 1:17 PM, Larry Martell wrote:
>>>>>
>>>>> I've been asked to do something that I do not think is possible in SQL.
>>>>>
>>>>> I have a query that has this basic form:
>>>>>
>>>>> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
>>>>> FROM t
>>>>> GROUP BY a, b, c, d, f
>>>>>
>>>>> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
>>>>> 10053.490, 2542.094).
>>>>>
>>>>> The business issue is that if either x or y in 2 rows that are in the
>>>>> same a, b, c, d group are within 1 of each other then they should be
>>>>> grouped together. And to make it more complicated, the tolerance is
>>>>> applied as a rolling continuum. For example, if the x and y in a set
>>>>> of grouped rows are:
>>>>>
>>>>> row 1: 1.5, 9.5
>>>>> row 2: 2.4, 20.8
>>>>> row 3: 3.3, 40.6
>>>>> row 4: 4.2, 2.5
>>>>> row 5: 5.1, 10.1
>>>>> row 6: 6.0, 7.9
>>>>> row 7: 8.0, 21.0
>>>>> row 8: 100, 200
>>>>>
>>>>> 1 through 6 get combined because all their X values are within the
>>>>> tolerance of some other X in the set that's been combined. 7's Y value
>>>>> is within the tolerance of 2's Y, so that should be combined as well.
>>>>> 8 is not combined because neither the X or Y value is within the
>>>>> tolerance of any X or Y in the set that was combined.
>>>>>
>>>>> In python I can easily parse the data and identify the rows that need
>>>>> to be combined, but then I've lost the ability to calculate the
>>>>> average and std. The only way I can think of to do this is to remove
>>>>> the grouping from the SQL and do all the grouping and aggregating
>>>>> myself. But this query often returns 20k to 30k rows after grouping.
>>>>> It could easily be 80k to 100k rows that I have to process if I remove
>>>>> the grouping and I think that will be very slow.
>>>>>
>>>>> Anyone have any ideas?
>>>>
>>>>
>>>> Could you compute the row-to-row values & write them to a temp table,
>>>> then
>>>> run the SQL that incorporates that result column?
>>>
>>> I thought of temp tables, but I could not come up with a way to use
>>> them for this. How can I apply the x/y tolerance grouping in sql?
>>
>>
>> Run the query you showed, saving the result to a temp table. In an sproc
>> or your preferred app language, do the row-to-row processing to generate
>> a new column in the temp table from the biz rules you outlined, now
>> query the revised temp table as desired.
>>
>
> This is a very interesting problem but I am not sure what to do with a
> grouping. To me this appears to be a two-dimensional, nearest-neighbor-type
> sorting problem.
>
> The way you outline it you could end up with all of the elements in your
> analysis plane in a single group depending on how closely they are to each
> other. Are long strings of points and blobs (clusters of points) in your 2-D
> search space (your plane is defined by your {a,b,c,d} tuple) acceptable
> results of your re-grouping process?
>
> If I had to draw this out graphically, you are stamping each of your {x,y}
> points with a 2x2 boundary box (each point is in intersection in the middle)
> and you want to know which sets of those boxes form a contiguous
> intersection area (they overlap).
>
> Is that what you are looking for? One technique would be to start with
> Peter's suggestion. Then reprocess that list to generate an acyclic directed
> graph (by eliminating the reverse matches from your set), then tracing down
> the tree.  If point 2 links to point 7, then 7 also links to 2 (both would be
> found by your nearness test). Eliminate any matches where the first point
> appears after the second point from your list.


Thanks much for the reply Shawn. Unfortunately this project was put on
the back burner. When I get back to it I'll be sure to try the
suggestions given here and I will report back as to what worked.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: grouping by the difference between values in rows

2014-01-13 Thread Larry Martell
On Mon, Jan 13, 2014 at 2:43 AM, Peter Brawley
 wrote:
>
> On 2014-01-12 9:13 PM, Larry Martell wrote:
>>
>> On Sun, Jan 12, 2014 at 2:47 PM, Peter Brawley
>>  wrote:
>>>
>>> On 2014-01-12 1:17 PM, Larry Martell wrote:
>>>>
>>>> I've been asked to do something that I do not think is possible in SQL.
>>>>
>>>> I have a query that has this basic form:
>>>>
>>>> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
>>>> FROM t
>>>> GROUP BY a, b, c, d, f
>>>>
>>>> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
>>>> 10053.490, 2542.094).
>>>>
>>>> The business issue is that if either x or y in 2 rows that are in the
>>>> same a, b, c, d group are within 1 of each other then they should be
>>>> grouped together. And to make it more complicated, the tolerance is
>>>> applied as a rolling continuum. For example, if the x and y in a set
>>>> of grouped rows are:
>>>>
>>>> row 1: 1.5, 9.5
>>>> row 2: 2.4, 20.8
>>>> row 3: 3.3, 40.6
>>>> row 4: 4.2, 2.5
>>>> row 5: 5.1, 10.1
>>>> row 6: 6.0, 7.9
>>>> row 7: 8.0, 21.0
>>>> row 8: 100, 200
>>>>
>>>> 1 through 6 get combined because all their X values are within the
>>>> tolerance of some other X in the set that's been combined. 7's Y value
>>>> is within the tolerance of 2's Y, so that should be combined as well.
>>>> 8 is not combined because neither the X or Y value is within the
>>>> tolerance of any X or Y in the set that was combined.
>>>>
>>>> In python I can easily parse the data and identify the rows that need
>>>> to be combined, but then I've lost the ability to calculate the
>>>> average and std. The only way I can think of to do this is to remove
>>>> the grouping from the SQL and do all the grouping and aggregating
>>>> myself. But this query often returns 20k to 30k rows after grouping.
>>>> It could easily be 80k to 100k rows that I have to process if I remove
>>>> the grouping and I think that will be very slow.
>>>>
>>>> Anyone have any ideas?
>>>
>>>
>>> Could you compute the row-to-row values & write them to a temp table,
>>> then
>>> run the SQL that incorporates that result column?
>>
>> I thought of temp tables, but I could not come up with a way to use
>> them for this. How can I apply the x/y tolerance grouping in sql?
>
>
> Run the query you showed, saving the result to a temp table. In an sproc or
> your preferred app language, do the row-to-row processing to generate a new
> column in the temp table from the biz rules you outlined, now query the
> revised temp table as desired.

Not too clear on how this will help me. The issue with the query I
showed is that I don't have the individual rows that make up the
aggregate data of the rows I need to combine. I think I have to run a
query with no group by and do all the grouping and aggregation myself.

In any case, unfortunately this has been made a low priority task and
I've been put on to something else (I hate when they do that). I'll
revive this thread when I'm allowed to get back on this.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: grouping by the difference between values in rows

2014-01-12 Thread Larry Martell
On Sun, Jan 12, 2014 at 2:47 PM, Peter Brawley
 wrote:
> On 2014-01-12 1:17 PM, Larry Martell wrote:
>>
>> I've been asked to do something that I do not think is possible in SQL.
>>
>> I have a query that has this basic form:
>>
>> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
>> FROM t
>> GROUP BY a, b, c, d, f
>>
>> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
>> 10053.490, 2542.094).
>>
>> The business issue is that if either x or y in 2 rows that are in the
>> same a, b, c, d group are within 1 of each other then they should be
>> grouped together. And to make it more complicated, the tolerance is
>> applied as a rolling continuum. For example, if the x and y in a set
>> of grouped rows are:
>>
>> row 1: 1.5, 9.5
>> row 2: 2.4, 20.8
>> row 3: 3.3, 40.6
>> row 4: 4.2, 2.5
>> row 5: 5.1, 10.1
>> row 6: 6.0, 7.9
>> row 7: 8.0, 21.0
>> row 8: 100, 200
>>
>> 1 through 6 get combined because all their X values are within the
>> tolerance of some other X in the set that's been combined. 7's Y value
>> is within the tolerance of 2's Y, so that should be combined as well.
>> 8 is not combined because neither the X or Y value is within the
>> tolerance of any X or Y in the set that was combined.
>>
>> In python I can easily parse the data and identify the rows that need
>> to be combined, but then I've lost the ability to calculate the
>> average and std. The only way I can think of to do this is to remove
>> the grouping from the SQL and do all the grouping and aggregating
>> myself. But this query often returns 20k to 30k rows after grouping.
>> It could easily be 80k to 100k rows that I have to process if I remove
>> the grouping and I think that will be very slow.
>>
>> Anyone have any ideas?
>
>
> Could you compute the row-to-row values & write them to a temp table, then
> run the SQL that incorporates that result column?

I thought of temp tables, but I could not come up with a way to use
them for this. How can I apply the x/y tolerance grouping in sql?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



grouping by the difference between values in rows

2014-01-12 Thread Larry Martell
I've been asked to do something that I do not think is possible in SQL.

I have a query that has this basic form:

SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
FROM t
GROUP BY a, b, c, d, f

x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
10053.490, 2542.094).

The business issue is that if either x or y in 2 rows that are in the
same a, b, c, d group are within 1 of each other then they should be
grouped together. And to make it more complicated, the tolerance is
applied as a rolling continuum. For example, if the x and y in a set
of grouped rows are:

row 1: 1.5, 9.5
row 2: 2.4, 20.8
row 3: 3.3, 40.6
row 4: 4.2, 2.5
row 5: 5.1, 10.1
row 6: 6.0, 7.9
row 7: 8.0, 21.0
row 8: 100, 200

1 through 6 get combined because all their X values are within the
tolerance of some other X in the set that's been combined. 7's Y value
is within the tolerance of 2's Y, so that should be combined as well.
8 is not combined because neither the X or Y value is within the
tolerance of any X or Y in the set that was combined.

In python I can easily parse the data and identify the rows that need
to be combined, but then I've lost the ability to calculate the
average and std. The only way I can think of to do this is to remove
the grouping from the SQL and do all the grouping and aggregating
myself. But this query often returns 20k to 30k rows after grouping.
It could easily be 80k to 100k rows that I have to process if I remove
the grouping and I think that will be very slow.

Anyone have any ideas?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: regexp in conditional

2014-01-07 Thread Larry Martell
On Tue, Jan 7, 2014 at 12:33 PM, Michael Dykman  wrote:
> I think you just have a misplaced parenthesis.  try:
>  SELECT IF(r REGEXP '^ED[HMUZ][0-9]$', 'yes', 'no') FROM s;

Yes, thanks!

>
>
>
> On Tue, Jan 7, 2014 at 2:22 PM, Larry Martell wrote:
>
>> Can I use a regexp in a conditional? I need to do something like this:
>>
>> SELECT (IF r REGEXP '^ED[HMUZ][0-9]$', 'yes', 'no') FROM s;
>>
>> I'm getting an error from that. Is there some way to do this?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: regexp in conditional

2014-01-07 Thread Larry Martell
On Tue, Jan 7, 2014 at 12:31 PM, Peter Brawley
 wrote:
>
> On 2014-01-07 1:22 PM, Larry Martell wrote:
>>
>> Can I use a regexp in a conditional? I need to do something like this:
>>
>> SELECT (IF r REGEXP '^ED[HMUZ][0-9]$', 'yes', 'no') FROM s;
>>
>> I'm getting an error from that. Is there some way to do this?
>
>
> SELECT IF( r REGEXP '^ED[HMUZ][0-9]$', 'yes', 'no' )
> FROM s;


Thanks!

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



regexp in conditional

2014-01-07 Thread Larry Martell
Can I use a regexp in a conditional? I need to do something like this:

SELECT (IF r REGEXP '^ED[HMUZ][0-9]$', 'yes', 'no') FROM s;

I'm getting an error from that. Is there some way to do this?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: XML to RDB

2013-11-25 Thread Larry Martell
On Mon, Nov 25, 2013 at 8:29 AM, Steven Siebert  wrote:

> Larry,
>
>
> If that's the only type of queries you're gonna make, sounds like MySQL
> can be a good solution (from an indexing perspective, those queries will be
> fine).  Basically, you would be using MySQL as a read-only copy of the XML
> content...so your goal here is to understand how you'll be needing to
> query, and build the best model (in MySQL) you can to achieve these
> queries.  Going back to your initial question, I don't believe there is an
> organic MySQL tool (or even a 3rd party tool that I know about) that will
> do this for you - you'll probably have to write something, very small, to
> do this.  Not sure how much programming experience you have...but something
> like a simple SAX parser producing XML element and attribute "rows" that
> you (probably batch) insert into the schema you make will work quite
> easily...with very little code.
>

I've been programming for over 30 years. I already have written some python
code using ElementTree that parses the file.


> Be careful with the schema, especially the IDs.  The schema suggested by
> Johan will work in simple cases, but might become a problem with unbounded
> elements and if you wanted to break out XML arrays, for example (ie more
> than one instance of a node's ID).  If you need to preserve sequence, that
> might be something you need to consider too.  If you don't foresee the need
> to query the parent hierarchy, I wouldn't add it (no parent_id)...not only
> would it not be necessary, you have to consider how you want to query
> hierarchy...if you need to, check out the Nested Set model as opposed to
> the adjacency list (see
> http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/).
> Again, this all depends on what you need...but based on the queries you
> indicated, you wouldn't need it.
>

Thanks for the link - the Nested Set model looks very interesting, but as
you say, I probably won't need that. This will most likely be an iterative
process as how it will be queried will probably evolve.


> Good luck!
>
> S
>
>
>
>
> On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman wrote:
>
>>  _______________
>> | Nodes         |
>> |---------------|
>> | id            | <--
>> | [more fields] |   |
>> | parent_id     | >--
>> |_______________|
>>
>>
>> And then you join the table with itself as needed. Do note that you'll
>> need a self-join for every level you query, and that every self-join
>> incrementally slows down the query.
>>
>> It may be beneficial (actually, I'm pretty sure it will be :-) ) to
>> implement the recursion in software as necessary, simply follow the branch
>> you want and check whether there are children at each turn; or pick up the
>> child and go back up until parent_id is NULL.
>>
>> Depending on what you want/need, you could also add a depth field, so you
>> don't have to loop to figure out where you are in the tree, etc.
>>
>> Storing node x/y/z as an identifier, as you suggest, is also an option
>> with its own benefits and downsides; nothing prevents you from taking any
>> or all of those approaches at the same time.
>>
>>
>> - Original Message -
>> > From: "Larry Martell" 
>> > To: "Steven Siebert" 
>> > Cc: "mysql mailing list" 
>> > Sent: Friday, 22 November, 2013 3:17:44 PM
>> > Subject: Re: XML to RDB
>> >
>> > Yes, I will need to query the data, and yes, it's app specific to the
>> > data. The parent-node structure will meet my needs if I can figure out
>> > how to preserve the unbounded multi-level nodes. There will be a higher
>> > level table, which will have the file name, date received, and other
>> > identifying info that will be the start of the query. The nodes will
>> > have a foreign key back to the file table. The queries will be something
>> > like "get the abcd parameter from node x/y/z from the file foo.xml from
>> > 10/10/13". I guess I can just store the x/y/z in the node table (as
>> > opposed to trying to represent z is a child of y which is a child of x
>> > in 3 different tables or rows).
>> >
>>
>> --
>> Unhappiness is discouraged and will be corrected with kitten pictures.
>>
>
>


Re: XML to RDB

2013-11-25 Thread Larry Martell
On Mon, Nov 25, 2013 at 4:29 AM, Johan De Meersman wrote:

>  _______________
> | Nodes         |
> |---------------|
> | id            | <--
> | [more fields] |   |
> | parent_id     | >--
> |_______________|
>
>
> And then you join the table with itself as needed. Do note that you'll
> need a self-join for every level you query, and that every self-join
> incrementally slows down the query.
>
> It may be beneficial (actually, I'm pretty sure it will be :-) ) to
> implement the recursion in software as necessary, simply follow the branch
> you want and check whether there are children at each turn; or pick up the
> child and go back up until parent_id is NULL.
>
> Depending on what you want/need, you could also add a depth field, so you
> don't have to loop to figure out where you are in the tree, etc.
>
> Storing node x/y/z as an identifier, as you suggest, is also an option
> with its own benefits and downsides; nothing prevents you from taking any
> or all of those approaches at the same time.
>

Thanks for the reply. I've had very bad performance in the past with
multiple self joins on large tables. I think I will end up going with
x/y/z.


>
> - Original Message -
> > From: "Larry Martell" 
> > To: "Steven Siebert" 
> > Cc: "mysql mailing list" 
> > Sent: Friday, 22 November, 2013 3:17:44 PM
> > Subject: Re: XML to RDB
> >
> > Yes, I will need to query the data, and yes, it's app specific to the
> > data. The parent-node structure will meet my needs if I can figure out
> > how to preserve the unbounded multi-level nodes. There will be a higher
> > level table, which will have the file name, date received, and other
> > identifying info that will be the start of the query. The nodes will
> > have a foreign key back to the file table. The queries will be something
> > like "get the abcd parameter from node x/y/z from the file foo.xml from
> > 10/10/13". I guess I can just store the x/y/z in the node table (as
> > opposed to trying to represent z is a child of y which is a child of x
> > in 3 different tables or rows).
> >
>
> --
> Unhappiness is discouraged and will be corrected with kitten pictures.
>


Re: XML to RDB

2013-11-22 Thread Larry Martell
On Fri, Nov 22, 2013 at 8:05 AM, Steven Siebert  wrote:

> Hi Larry,
>
> I'm trying to figure out what you're wanting to do with the data once it's in
> mysql?  At first it seemed you didn't want to put it in as a lob because
> you might want to query on the data in different ways (kind of an
> assumption on my part, but a common reason to do this).  Then, you thought
> about making a very generic schema (parent-node) which really doesn't lend
> itself well to this goal.
>
> Could you explain what you plan to do with the data in mysql and maybe a
> little about the goal of the application?  Is this app specific to this
> data, or are you looking to build an app that can take any schema defined
> XML file, ingest it, and allow you to do some work with it?
>
> Why is a lob not appropriate for your needs?
>

Yes, I will need to query the data, and yes, it's app specific to the data.
The parent-node structure will meet my needs if I can figure out how to
preserve the unbounded multi-level nodes. There will be a higher level
table, which will have the file name, date received, and other identifying
info that will be the start of the query. The nodes will have a foreign key
back to the file table. The queries will be something like "get the abcd
parameter from node x/y/z from the file foo.xml from 10/10/13". I guess I
can just store the x/y/z in the node table (as opposed to trying to
represent z is a child of y which is a child of x in 3 different tables or
rows).
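
A minimal DDL sketch of that design (names and types are hypothetical;
the path column holds the x/y/z location discussed above):

CREATE TABLE xml_file (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  file_name VARCHAR(255) NOT NULL,
  received  DATETIME     NOT NULL
) ENGINE=InnoDB;

CREATE TABLE node (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  file_id INT UNSIGNED NOT NULL,
  path    VARCHAR(255) NOT NULL,   -- e.g. 'x/y/z', any depth
  FOREIGN KEY (file_id) REFERENCES xml_file(id),
  INDEX (file_id, path)
) ENGINE=InnoDB;

CREATE TABLE parameter (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  node_id INT UNSIGNED NOT NULL,
  name    VARCHAR(128) NOT NULL,
  value   TEXT,
  FOREIGN KEY (node_id) REFERENCES node(id)
) ENGINE=InnoDB;

The "get the abcd parameter from node x/y/z from foo.xml" query is then a
three-way join on file name, path, and parameter name.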

On Nov 22, 2013 7:24 AM, "Larry Martell"  wrote:
>
>> I have a need to store data that comes in an XML file in MySQL. I don't
>> want to store the entire file as a CBLOB, and it doesn't lend itself to
>> loading with Load XML. I am looking for tools that will help me create the
>> schema and parse and load the data. I have googled for this and I've found
>> many scholarly papers written about it, but the examples are always with
>> very simple XML files, and I haven't found any tools. I do have the xsd
>> schema for the XML file, but what's confusing me about how to design the
>> RDB schema is the unbounded recursion of the data. The data is basically
>> nodes that have parameters which have items with values. My first thought
>> was to simply have a node table and a parameter table with a foreign key
>> back to the node it belongs to. But when digging into the data I found
>> that
>> nodes can have sub nodes - in one file I have, this goes down for 7
>> levels, but in theory there is no bound on that. I'm not sure how best to
>> represent that.
>>
>> Any experiences, advice, or pointers are very welcome.
>>
>> Thanks-
>> larry
>>
>


XML to RDB

2013-11-22 Thread Larry Martell
I have a need to store data that comes in an XML file in MySQL. I don't
want to store the entire file as a CBLOB, and it doesn't lend itself to
loading with Load XML. I am looking for tools that will help me create the
schema and parse and load the data. I have googled for this and I've found
many scholarly papers written about it, but the examples are always with
very simple XML files, and I haven't found any tools. I do have the xsd
schema for the XML file, but what's confusing me about how to design the
RDB schema is the unbounded recursion of the data. The data is basically
nodes that have parameters which have items with values. My first thought
was to simply have a node table and a parameter table with a foreign key
back to the node it belongs to. But when digging into the data I found that
nodes can have sub nodes - in one file I have, this goes down for 7 levels,
but in theory there is no bound on that. I'm not sure how best to represent
that.

Any experiences, advice, or pointers are very welcome.

Thanks-
larry


Re: Problem with having

2013-09-24 Thread Larry Martell
On Tue, Sep 24, 2013 at 9:05 AM, shawn green wrote:

> Hello Larry,
>
>
> On 9/23/2013 6:22 PM, Larry Martell wrote:
>
>> On Mon, Sep 23, 2013 at 3:15 PM, shawn green wrote:
>>
>>> Hi Larry,
>>>
>>>
>>> On 9/23/2013 3:58 PM, Larry Martell wrote:
>>>
>>>> On Mon, Sep 23, 2013 at 1:51 PM, Sukhjinder K. Narula wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> In your second query, you seem to have MIN(date_time), but you are
>>>>> talking about maximum. So your group by query is actually pulling the
>>>>> minimum date for this recipe.
>>>>>
>>>>>
>>>> I pasted the wrong query in. I get the same results regardless of if I
>>>> have MIN or MAX - I get the id of the max, but the date_time of the min.
>>>>
>>>>
>>>>
>>>>> On Mon, Sep 23, 2013 at 3:35 PM, Larry Martell wrote:
>>>>>
>>>>>
>>>>>> I want to find the rows from a table that have the max date_time for
>>>>>> each recipe. I know I've done this before with group by and having,
>>>>>> but I can't seem to get it to work now. I get the correct row id, but
>>>>>> not the correct date_time. I'm sure I'm missing something simple.
>>>>>>
>>>>>> For purposes of showing an example, I'll use one recipe, 19166.
>>>>>>
>>>>>>
>>>>>> For that recipe here's the row I would want:
>>>>>>
>>>>>> mysql> select id, MAX(date_time) from data_cstmeta  where recipe_id =
>>>>>> 19166;
>>>>>> +---------+---------------------+
>>>>>> | id      | MAX(date_time)      |
>>>>>> +---------+---------------------+
>>>>>> | 1151701 | 2013-02-07 18:38:13 |
>>>>>> +---------+---------------------+
>>>>>>
>>>>>> 1 row in set (0.01 sec)
>>>>>>
>>>>>> I would think this query would give me that - it gives me the correct
>>>>>> id,
>>>>>> but not the correct date_time:
>>>>>>
>>>>>> mysql> SELECT id, date_time as MaxDateTime  FROM data_cstmeta  where
>>>>>> recipe_id = 19166 group by recipe_id HAVING MIN(date_time);
>>>>>> +---------+---------------------+
>>>>>> | id      | MaxDateTime         |
>>>>>> +---------+---------------------+
>>>>>> | 1151701 | 2010-12-13 16:16:55 |
>>>>>> +---------+---------------------+
>>>>>>
>>>>>> 1 row in set (0.01 sec)
>>>>>>
>>>>>> How can I fix this?
>>>>>>
>>>>>> Thanks!
>>>>>> -larry
>>>>>>
>>>>>>
>>>>>>  You have to do a two-stage match. One stage to find the MAX() of a
>>> value
>>> for each recipe_id, the other to match that MAX() to one or more rows to
>>> give you the best ID values.
>>>
>>> Here's a subquery method of doing it.  There are many many others (google
>>> for "groupwize maximum")
>>>
>>> SELECT a.id, b.MaxDateTime
>>> FROM data_cstmeta a
>>> INNER JOIN (
>>>  SELECT MAX(date_time) MaxDateTime
>>>  FROM data_cstmeta
>>>  WHERE recipe_id = 19166
>>> ) b
>>>on b.MaxDateTime = a.date_time
>>> WHERE recipe_id = 19166;
>>>
>>>
>> Having the recipe_id in the query was just to show an example. I really
>> want the id's with the max date for each recipe_id:
>>
>> This is what I changed it to, which works, but is too slow. I need to find
>> a more efficient solution:
>>
>> SELECT d1.id, d1.date_time as MaxDateTime
>> FROM data_cstmeta d1
>> LEFT JOIN data_cstmeta d2 ON d1.recipe_id = d2.recipe_id AND
>> d1.date_time < d2.date_time
>> WHERE d2.recipe_id IS NULL
>>
>>
> As I said, there are many many ways to solve this problem. Here is one
> that is going to perform much better for the generic case than what you are
> doing.
>
> CREATE TEMPORARY TABLE tmpMaxDates (key(recipe_id,maxdatetime))
> SELECT recipe_id, max(date_time) maxdatetime
> FROM data_cstmeta
> GROUP BY recipe_id;
>
> SELECT a.id, b.maxdatetime
> FROM data_cstmeta a
> INNER JOIN tmpMaxDates b
>   on a.recipe_id = b.recipe_id
>   and a.date_time = b.maxdatetime;
>
> DROP TEMPORARY TABLE tmpMaxDates;
>
>
> Of course, an appropriate multi-column index on data_cstmeta would also
> make your technique much faster than it is today.
>
>
Thanks much Shawn! This ran in a few seconds vs. 30 minutes for my
solution.
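
For anyone landing on this thread later, the multi-column index Shawn
mentions would be something like this (a sketch; the index name is made
up):

ALTER TABLE data_cstmeta ADD INDEX idx_recipe_dt (recipe_id, date_time);

With that in place, both the GROUP BY in the temporary-table step and the
join back on (recipe_id, date_time) can be resolved from the index.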


Re: Problem with having

2013-09-23 Thread Larry Martell
On Mon, Sep 23, 2013 at 2:17 PM, Sukhjinder K. Narula
wrote:

> Hi,
>
> I see that. So the query seems to be picking the first entry out of the
> after grouping by a field and displaying it. And it seems to make sense
> since Having clause seems incomplete. I believe we need to complete the
> condition by HAVING MIN(date_time) <, > or = something.
>

After reading this, I see what the problem is:

http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html

Then I read this:

http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html

and changed it to this:

SELECT d1.id, d1.date_time as MaxDateTime
   FROM data_cstmeta d1
   LEFT JOIN data_cstmeta d2 ON d1.recipe_id = d2.recipe_id AND
d1.date_time < d2.date_time
   WHERE d2.recipe_id IS NULL

Which works, but is painfully slow. For a table with 200k rows it's been
running for 25 minutes and isn't done yet. That will be unacceptable to my
users.


> On Mon, Sep 23, 2013 at 3:58 PM, Larry Martell wrote:
>
>> On Mon, Sep 23, 2013 at 1:51 PM, Sukhjinder K. Narula <
>> narula...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In your second query, you seem to have MIN(date_time), but you are
>>> talking about maximum. So your group by query is actually pulling the
>>> minimum date for this recipe.
>>>
>>
>> I pasted the wrong query in. I get the same results regardless of if I
>> have MIN or MAX - I get the id of the max, but the date_time of the min.
>>
>>
>>
>>> On Mon, Sep 23, 2013 at 3:35 PM, Larry Martell 
>>> wrote:
>>>
>>>> I want to find the rows from a table that have the max date_time for
>>>> each
>>>> recipe. I know I've done this before with group by and having, but I
>>>> can't
>>>> seem to get it to work now. I get the correct row id, but not the
>>>> correct
>>>> date_time. I'm sure I'm missing something simple.
>>>>
>>>> For purposes of showing an example, I'll use one recipe, 19166.
>>>>
>>>>
>>>> For that recipe here's the row I would want:
>>>>
>>>> mysql> select id, MAX(date_time) from data_cstmeta  where recipe_id =
>>>> 19166;
>>>> +---------+---------------------+
>>>> | id      | MAX(date_time)      |
>>>> +---------+---------------------+
>>>> | 1151701 | 2013-02-07 18:38:13 |
>>>> +---------+---------------------+
>>>> 1 row in set (0.01 sec)
>>>>
>>>> I would think this query would give me that - it gives me the correct
>>>> id,
>>>> but not the correct date_time:
>>>>
>>>> mysql> SELECT id, date_time as MaxDateTime  FROM data_cstmeta  where
>>>> recipe_id = 19166 group by recipe_id HAVING MIN(date_time);
>>>> +---------+---------------------+
>>>> | id      | MaxDateTime         |
>>>> +---------+---------------------+
>>>> | 1151701 | 2010-12-13 16:16:55 |
>>>> +---------+---------------------+
>>>> 1 row in set (0.01 sec)
>>>>
>>>> How can I fix this?
>>>>
>>>> Thanks!
>>>> -larry
>>>>
>>>
>>>
>>
>


Re: Problem with having

2013-09-23 Thread Larry Martell
On Mon, Sep 23, 2013 at 3:15 PM, shawn green wrote:

> Hi Larry,
>
>
> On 9/23/2013 3:58 PM, Larry Martell wrote:
>
>> On Mon, Sep 23, 2013 at 1:51 PM, Sukhjinder K. Narula
>> wrote:
>>
>>  Hi,
>>>
>>> In your second query, you seem to have MIN(date_time), but you are
>>> talking about maximum. So your group by query is actually pulling the
>>> minimum date for this recipe.
>>>
>>>
>> I pasted the wrong query in. I get the same results regardless of if I
>> have
>> MIN or MAX - I get the id of the max, but the date_time of the min.
>>
>>
>>
>>> On Mon, Sep 23, 2013 at 3:35 PM, Larry Martell wrote:
>>>
>>>  I want to find the rows from a table that have the max date_time for
>>>> each
>>>> recipe. I know I've done this before with group by and having, but I
>>>> can't
>>>> seem to get it to work now. I get the correct row id, but not the
>>>> correct
>>>> date_time. I'm sure I'm missing something simple.
>>>>
>>>> For purposes of showing an example, I'll use one recipe, 19166.
>>>>
>>>>
>>>> For that recipe here's the row I would want:
>>>>
>>>> mysql> select id, MAX(date_time) from data_cstmeta  where recipe_id =
>>>> 19166;
>>>> +---------+---------------------+
>>>> | id      | MAX(date_time)      |
>>>> +---------+---------------------+
>>>> | 1151701 | 2013-02-07 18:38:13 |
>>>> +---------+---------------------+
>>>> 1 row in set (0.01 sec)
>>>>
>>>> I would think this query would give me that - it gives me the correct
>>>> id,
>>>> but not the correct date_time:
>>>>
>>>> mysql> SELECT id, date_time as MaxDateTime  FROM data_cstmeta  where
>>>> recipe_id = 19166 group by recipe_id HAVING MIN(date_time);
>>>> +---------+---------------------+
>>>> | id      | MaxDateTime         |
>>>> +---------+---------------------+
>>>> | 1151701 | 2010-12-13 16:16:55 |
>>>> +---------+---------------------+
>>>> 1 row in set (0.01 sec)
>>>>
>>>> How can I fix this?
>>>>
>>>> Thanks!
>>>> -larry
>>>>
>>>>
> You have to do a two-stage match. One stage to find the MAX() of a value
> for each recipe_id, the other to match that MAX() to one or more rows to
> give you the best ID values.
>
> Here's a subquery method of doing it.  There are many many others (google
> for "groupwize maximum")
>
> SELECT a.id, b.MaxDateTime
> FROM data_cstmeta a
> INNER JOIN (
> SELECT MAX(date_time) MaxDateTime
> FROM data_cstmeta
> WHERE recipe_id = 19166
> ) b
>   on b.MaxDateTime = a.date_time
> WHERE recipe_id = 19166;
>
>
Having the recipe_id in the query was just to show an example. I really
want the id's with the max date for each recipe_id:

This is what I changed it to, which works, but is too slow. I need to find
a more efficient solution:

SELECT d1.id, d1.date_time as MaxDateTime
   FROM data_cstmeta d1
   LEFT JOIN data_cstmeta d2 ON d1.recipe_id = d2.recipe_id AND
d1.date_time < d2.date_time
   WHERE d2.recipe_id IS NULL


Re: Problem with having

2013-09-23 Thread Larry Martell
On Mon, Sep 23, 2013 at 1:51 PM, Sukhjinder K. Narula
wrote:

> Hi,
>
> In your second query, you seem to have MIN(date_time), but you are
> talking about maximum. So your group by query is actually pulling the
> minimum date for this recipe.
>

I pasted the wrong query in. I get the same results regardless of if I have
MIN or MAX - I get the id of the max, but the date_time of the min.



> On Mon, Sep 23, 2013 at 3:35 PM, Larry Martell wrote:
>
>> I want to find the rows from a table that have the max date_time for each
>> recipe. I know I've done this before with group by and having, but I can't
>> seem to get it to work now. I get the correct row id, but not the correct
>> date_time. I'm sure I'm missing something simple.
>>
>> For purposes of showing an example, I'll use one recipe, 19166.
>>
>>
>> For that recipe here's the row I would want:
>>
>> mysql> select id, MAX(date_time) from data_cstmeta  where recipe_id =
>> 19166;
>> +---------+---------------------+
>> | id      | MAX(date_time)      |
>> +---------+---------------------+
>> | 1151701 | 2013-02-07 18:38:13 |
>> +---------+---------------------+
>> 1 row in set (0.01 sec)
>>
>> I would think this query would give me that - it gives me the correct id,
>> but not the correct date_time:
>>
>> mysql> SELECT id, date_time as MaxDateTime  FROM data_cstmeta  where
>> recipe_id = 19166 group by recipe_id HAVING MIN(date_time);
>> +---------+---------------------+
>> | id      | MaxDateTime         |
>> +---------+---------------------+
>> | 1151701 | 2010-12-13 16:16:55 |
>> +---------+---------------------+
>> 1 row in set (0.01 sec)
>>
>> How can I fix this?
>>
>> Thanks!
>> -larry
>>
>
>


Problem with having

2013-09-23 Thread Larry Martell
I want to find the rows from a table that have the max date_time for each
recipe. I know I've done this before with group by and having, but I can't
seem to get it to work now. I get the correct row id, but not the correct
date_time. I'm sure I'm missing something simple.

For purposes of showing an example, I'll use one recipe, 19166.


For that recipe here's the row I would want:

mysql> select id, MAX(date_time) from data_cstmeta  where recipe_id = 19166;
+---------+---------------------+
| id      | MAX(date_time)      |
+---------+---------------------+
| 1151701 | 2013-02-07 18:38:13 |
+---------+---------------------+
1 row in set (0.01 sec)

I would think this query would give me that - it gives me the correct id,
but not the correct date_time:

mysql> SELECT id, date_time as MaxDateTime  FROM data_cstmeta  where
recipe_id = 19166 group by recipe_id HAVING MIN(date_time);
+---------+---------------------+
| id      | MaxDateTime         |
+---------+---------------------+
| 1151701 | 2010-12-13 16:16:55 |
+---------+---------------------+
1 row in set (0.01 sec)

How can I fix this?

Thanks!
-larry


Re: 1 file

2013-07-04 Thread Larry Martell
On Wed, Jul 3, 2013 at 9:51 AM, shawn green  wrote:
> Hello Larry,
>
>
> On 7/3/2013 11:27 AM, Larry Martell wrote:
>>
>> We recently changed from in-memory tables to InnoDB tables. Today we
>> noticed that in every server's data dir there is a file called '1' that
>> seems to get updated every time the ibdata1 file gets updated. On some
>> servers it's comparable in size to the ibdata1 file, on other servers
>> it's 10-15x larger, and on others it's 1/2 the size. What is this
>> file? Googling revealed nothing about this.
>>
>
> That is not something an official MySQL build would do. Consult with the
> person (or group) that compiled your binaries.
>
> Now, if you have enabled --innodb-file-per-table and if you have named your
> table '1' then that file is probably '1.ibd'.  That would be expected. But
> that seems unlikely based on your other details.
>
> Did you also enable a separate undo log, perhaps? Although if you had, it
> should be 'undo1' not just '1'
> http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_undo_tablespaces
>
> So, that simple '1' file also seems unusual to me.

Thanks for the reply.

I asked our DBA group and here's the answer I got:

The file is currently accessed by mysqld, please don’t delete it.
Looking at the file header, it appeared to be an innodb datafile.
But no idea how it was created.

Sigh.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



1 file

2013-07-03 Thread Larry Martell
We recently changed from in-memory tables to InnoDB tables. Today we
noticed that in every server's data dir there is a file called '1' that
seems to get updated every time the ibdata1 file gets updated. On some
servers it's comparable in size to the ibdata1 file, on other servers
it's 10-15x larger, and on others it's 1/2 the size. What is this
file? Googling revealed nothing about this.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: 1 file

2013-07-03 Thread Larry Martell
On Wed, Jul 3, 2013 at 11:34 AM, Peterson, Timothy R
 wrote:
>
>
>> -Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Wednesday, July 03, 2013 11:29 AM
>> To: shawn green
>> Cc: mysql@lists.mysql.com
>> Subject: Re: 1 file
>>
>> On Wed, Jul 3, 2013 at 9:51 AM, shawn green 
>> wrote:
>> > Hello Larry,
>> >
>> >
>> > On 7/3/2013 11:27 AM, Larry Martell wrote:
>> >>
>> >> We recently changed from in-memory tables to InnoDB tables. Today we
>> >> noticed that in every server's data dir there is a file called '1' that
>> >> seems to get updated every time the ibdata1 file gets updated. On some
>> >> servers it's comparable in size to the ibdata1 file, on other servers
>> >> it's 10-15x larger, and on others it's 1/2 the size. What is this
>> >> file? Googling revealed nothing about this.
>> >>
>> >
>> > That is not something an official MySQL build would do. Consult with the
>> > person (or group) that compiled your binaries.
>> >
>> > Now, if you have enabled --innodb-file-per-table and if you have named
>> your
>> > table '1' then that file is probably '1.ibd'.  That would be expected. But
>> > that seems unlikely based on your other details.
>> >
>> > Did you also enable a separate undo log, perhaps? Although if you had, it
>> > should be 'undo1' not just '1'
>> > http://dev.mysql.com/doc/refman/5.6/en/innodb-
>> parameters.html#sysvar_innodb_undo_tablespaces
>> >
>> > So, that simple '1' file also seems unusual to me.
>>
>> Thanks for the reply.
>>
>> I asked our DBA group and here's the answer I got:
>>
>> The file is currently accessed by mysqld, please don’t delete it.
>> Looking at the file header, it appeared to be an innodb datafile.
>> But no idea how it was created.
>>
>> Sigh.
>>
>> --
>
> I believe this is due to one of the variables in your my.cnf file.
> You probably set it to 1, thinking this enabled it, but in actuality the
> variable can be set by just putting it in the my.cnf file.
> If it has an assignment, then the assignment is the file name to use.
> I think there are several variables for which this is the case,
> for example log and log_error: if you have log_error=1 or log=1,
> this would be the file.
> There may be other variables, but those two are where I have seen the 1
> file being created before.
> If it isn't one of those two (which it may not be, since you said the header
> looks to be InnoDB),

I didn't say that - one of my DBA's said that. I didn't have
permission to look at the file. But one of my sysadmins did and she
found that it was ascii text.

> I would review all the variables that are set to =1, and see if that is the 
> proper assignment

I did that and I found it:

log-slow-queries = 1

mysql> show variables like 'slow_query_log_file';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| slow_query_log_file | 1     |
+---------------------+-------+
1 row in set (0.00 sec)
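
For anyone hitting the same thing, a sketch of the corrected my.cnf
entries (the log path here is illustrative):

# Buggy: the value "1" is parsed as the slow-log file NAME, so mysqld
# writes slow queries to a file literally named "1" in the datadir.
# log-slow-queries = 1

# Intended: give an explicit file name (or, on 5.1+, use the newer
# slow_query_log = 1 together with slow_query_log_file = <path>).
log-slow-queries = /var/log/mysql/mysql-slow.log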

Thanks!

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: 1 file

2013-07-03 Thread Larry Martell
On Wed, Jul 3, 2013 at 12:36 PM, Rick James  wrote:
> I once found a slowlog called simply "1".  But I did not track down the 
> cause.  Possibly it was a not-so-correct configuration script.
>
> SHOW VARIABLES LIKE '%dir%';
>

Nothing was set to 1


> ibdata1 grows (never shrinks) when data is added, ALTER is done, etc.  It 
> will reuse free space within itself.
>
> innodb_file_per_table=1 is recommended
>
> Having an explicit PRIMARY KEY on InnoDB tables is recommended.  (MEMORY did 
> not care much.)
>
>> -Original Message-
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Wednesday, July 03, 2013 9:29 AM
>> To: shawn green
>> Cc: mysql@lists.mysql.com
>> Subject: Re: 1 file
>>
>> On Wed, Jul 3, 2013 at 9:51 AM, shawn green 
>> wrote:
>> > Hello Larry,
>> >
>> >
>> > On 7/3/2013 11:27 AM, Larry Martell wrote:
>> >>
>> >> We recently changed from in-memory tables to InnoDB. Today we
>> >> noticed that in every server's data dir there is a file called '1' that
>> >> seems to get updated every time the ibdata1 file gets updated. On
>> >> some servers it's comparable in size to the ibdata1 file, on other
>> >> servers it's 10-15x larger, and on others it's half the size. What is
>> >> this file? Googling revealed nothing about it.
>> >>
>> >
>> > That is not something an official MySQL build would do. Consult with
>> > the person (or group) that compiled your binaries.
>> >
>> > Now, if you have enabled --innodb-file-per-table and if you have named
>> > your table '1' then that file is probably '1.ibd'.  That would be
>> > expected. But that seems unlikely based on your other details.
>> >
>> > Did you also enable a separate undo log, perhaps? Although if you had,
>> > it should be 'undo1' not just '1'
>> > http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_i
>> > nnodb_undo_tablespaces
>> >
>> > So, that simple '1' file also seems unusual to me.
>>
>> Thanks for the reply.
>>
>> I asked our DBA group and here's the answer I got:
>>
>> The file is currently accessed by mysqld, please don’t delete it.
>> Looking at the file header, it appeared to be an innodb datafile.
>> But no idea how it was created.
>>
>> Sigh.
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Performance of delete using in

2013-04-24 Thread Larry Martell
I changed it to delete one row at a time and it's taking 3 minutes.
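
For the archives, a sketch of the join refactor suggested below (the
temporary staging table is an assumption, not something from the thread):

-- stage the ~1,500 ids once, then delete via a join instead of IN (...)
CREATE TEMPORARY TABLE tmp_event_ids (event_id INT PRIMARY KEY);
INSERT INTO tmp_event_ids (event_id) VALUES (1), (2), (3);  -- ...the real id list

DELETE idx
FROM cdsem_event_message_idx idx
INNER JOIN tmp_event_ids t ON t.event_id = idx.event_id;

DROP TEMPORARY TABLE tmp_event_ids;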


On Wed, Apr 24, 2013 at 6:52 PM, Larry Martell  wrote:
> That is the entire sql statement - I didn't think I needed to list the
> 1500 ints that are in the in clause.
>
> Also want to mention that I ran explain on it, and it is using the
> index on event_id.
>
> On Wed, Apr 24, 2013 at 6:49 PM, Michael Dykman  wrote:
>> You would have to show us the whole sql statement but often 'in' clauses can
>> be refactored into equivalent joins which tend to improve performance
>> tremendously.
>>
>>  - michael dykman
>>
>>
>> On Wed, Apr 24, 2013 at 8:19 PM, Larry Martell 
>> wrote:
>>>
>>> I have a table that has 2.5 million rows and 9 columns that are all
>>> int except for 2 varchar(255) - i.e. not that big of a table. I am
>>> executing a delete from that table like this:
>>>
>>> delete from cdsem_event_message_idx where event_id in ()
>>>
>>> The in clause has around 1,500 items in it. event_id is an int, and
>>> there is an index on event_id. This statement is taking 1 hour and 5
>>> minutes to run. There is nothing else hitting the database at that
>>> time, and the machine it's running on is 97% idle and has plenty of
>>> free memory. This seems extremely excessive to me. I would guess it's
>>> because of the in clause. Is there some better way to do a delete like
>>> this?
>>>
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:http://lists.mysql.com/mysql
>>>
>>
>>
>>
>> --
>>  - michael dykman
>>  - mdyk...@gmail.com
>>
>>  May the Source be with you.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Performance of delete using in

2013-04-24 Thread Larry Martell
That is the entire sql statement - I didn't think I needed to list the
1500 ints that are in the in clause.

Also want to mention that I ran explain on it, and it is using the
index on event_id.

On Wed, Apr 24, 2013 at 6:49 PM, Michael Dykman  wrote:
> You would have to show us the whole sql statement but often 'in' clauses can
> be refactored into equivalent joins which tend to improve performance
> tremendously.
>
>  - michael dykman
>
>
> On Wed, Apr 24, 2013 at 8:19 PM, Larry Martell 
> wrote:
>>
>> I have a table that has 2.5 million rows and 9 columns that are all
>> int except for 2 varchar(255) - i.e. not that big of a table. I am
>> executing a delete from that table like this:
>>
>> delete from cdsem_event_message_idx where event_id in ()
>>
>> The in clause has around 1,500 items in it. event_id is an int, and
>> there is an index on event_id. This statement is taking 1 hour and 5
>> minutes to run. There is nothing else hitting the database at that
>> time, and the machine it's running on is 97% idle and has plenty of
>> free memory. This seems extremely excessive to me. I would guess it's
>> because of the in clause. Is there some better way to do a delete like
>> this?
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>
>
>
> --
>  - michael dykman
>  - mdyk...@gmail.com
>
>  May the Source be with you.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query Help

2013-04-19 Thread Larry Martell
On Fri, Apr 19, 2013 at 8:24 AM, Richard Reina  wrote:
> Hello All,
>
> Happy Friday! I know how to do the following query:
>
>>select count(*) from sales where WEEK(sale_date)=15 AND
> YEAR(sale_date)=2013;
>
> But can someone tell me how I can do a query that will give me:
>
> the count(*) for each week of 2013 so that I end up with:
>
> WEEK | COUNT
>    1 |    22
>    2 |    31
>    3 |    29
> etc...

select WEEK(sale_date), count(*) from sales where YEAR(sale_date)=2013
group by WEEK(sale_date);

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



logging in using host alias not working

2013-04-18 Thread Larry Martell
We use host aliases to connect to MySQL all the time, never had an
issue before. Today we added a new alias, and we cannot connect to the
server using that one alias, though only when we are on the local machine.

Here is the NIS entry for this host:

# ypmatch -k ubshp2 hosts
ubshp2 192.132.2.143 ubshp2.predict.com ubshp2 ti-us-dev
intradb-au-qa intradb-us-alpha intradb-us-dev ti-test-dr
intradb-test-dr

I can connect from all these aliases, except intradb-test-dr when I am
on ubshp2, e.g.:

From ubshp2 using the intradb-us-dev alias, works:

ti-test-dr@ubshp2: mysql -h intradb-us-dev -u tradeinfra -p intradb_test_dr
Welcome to the MySQL monitor.  Commands end with ; or \g.

From ubshp2 using the intradb-au-qa alias, works:

ti-test-dr@ubshp2: mysql -h intradb-au-qa  -u tradeinfra -p intradb_test_dr
Welcome to the MySQL monitor.  Commands end with ; or \g.

From ubshp2 using the intradb-test-dr alias, does not work:

ti-test-dr@ubshp2: mysql -h intradb-test-dr -u tradeinfra -p intradb_test_dr
ERROR 1045 (28000): Access denied for user
'tradeinfra'@'ubshp2.predict.com' (using password: YES)

But from a different host using the intradb-test-dr alias, works:

ti-test-dr@hades: mysql -h intradb-test-dr -u tradeinfra -p intradb_test_dr
Welcome to the MySQL monitor.  Commands end with ; or \g.

Anyone have any ideas as to why this one is not working? I've been
messing with this for 2 days. Nothing in the error log. I've dropped
and re-added the user, I've bounced the server, I've removed and
re-added the alias. I've googled and googled and found nothing.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: logging in using host alias not working

2013-04-18 Thread Larry Martell
Found the problem. There was cached DNS info on the host. Restarted
nscd and then it worked.
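
(For reference, on a RHEL-era box the restart is typically along these
lines; the exact commands depend on the init system:)

# restart the name service cache daemon, or invalidate just the hosts cache
service nscd restart
nscd -i hosts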

On Thu, Apr 18, 2013 at 10:59 AM, Larry Martell  wrote:
> We use host aliases to connect to MySQL all the time, never had an
> issue before. Today we added a new alias, and we cannot connect to the
> server using that one alias, though only when we are on the local machine.
>
> Here is the NIS entry for this host:
>
> # ypmatch -k ubshp2 hosts
> ubshp2 192.132.2.143 ubshp2.predict.com ubshp2 ti-us-dev
> intradb-au-qa intradb-us-alpha intradb-us-dev ti-test-dr
> intradb-test-dr
>
> I can connect from all these aliases, except intradb-test-dr when I am
> on ubshp2, e.g.:
>
> From ubshp2 using the intradb-us-dev alias, works:
>
> ti-test-dr@ubshp2: mysql -h intradb-us-dev -u tradeinfra -p 
> intradb_test_dr
> Welcome to the MySQL monitor.  Commands end with ; or \g.
>
> From ubshp2 using the intradb-au-qa alias, works:
>
> ti-test-dr@ubshp2: mysql -h intradb-au-qa  -u tradeinfra -p 
> intradb_test_dr
> Welcome to the MySQL monitor.  Commands end with ; or \g.
>
> From ubshp2 using the intradb-test-dr alias, does not work:
>
> ti-test-dr@ubshp2: mysql -h intradb-test-dr -u tradeinfra -p 
> intradb_test_dr
> ERROR 1045 (28000): Access denied for user
> 'tradeinfra'@'ubshp2.predict.com' (using password: YES)
>
> But from a different host using the intradb-test-dr alias, works:
>
> ti-test-dr@hades: mysql -h intradb-test-dr -u tradeinfra -p 
> intradb_test_dr
> Welcome to the MySQL monitor.  Commands end with ; or \g.
>
> Anyone have any ideas as to why this one is not working? I've been
> messing with this for 2 days. Nothing in the error log. I've dropped
> and re-added the user, I've bounced the server, I've removed and
> re-added the alias. I've googled and googled and found nothing.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: MySQL on Mac OS 10.8

2013-04-17 Thread Larry Martell
On Wed, Apr 17, 2013 at 8:32 AM, Reindl Harald  wrote:
>
>
> On 17.04.2013 15:32, Larry Martell wrote:
>> On Tue, Apr 16, 2013 at 11:31 AM, Reindl Harald  
>> wrote:
>>> On 16.04.2013 18:34, Larry Martell wrote:
>>>> I just set up mysql on Mac OS 10.8. Each time after the machine is
>>>> rebooted the server fails to start with:
>>>>
>>>> 2013-04-13 14:09:54 1 [ERROR] /usr/local/mysql/bin/mysqld: Can't
>>>> create/write to file '/var/run/mysqld/mysqld.pid' (Errcode: 2 - No
>>>> such file or directory)
>>>>
>>>> Because the /var/run/mysqld dir does not exist. I have to create it
>>>> manually then the server starts. But I have to do this after each
>>>> reboot, which is a pain. Anyone know why the /var/run/mysqld dir gets
>>>> deleted, or how to get around this?
>>>
>>> you need to consult your operating system's documentation
>>> on linux with systemd it would be the following config
>>>
>>> [root@srv-rhsoft:~]$ cat /etc/tmpfiles.d/mysql.conf
>>> d /var/run/mysqld 0755 mysql mysql -
>>
>> Doesn't seem to be anything like that running on MacOS. I googled and
>> didn't find anything. I posted to a mac forum, but haven't received
>> any responses yet. I'll have to try SO
>
> you need to modify the StartupItem to create the directory
> before starting mysqld, as said: OS manual
>
> http://dev.mysql.com/doc/mysql-macosx-excerpt/5.5/en/macosx-installation-startupitem.html

Yes, I have this, but it doesn't work if /var/run/mysqld is not there.
I still have to manually create that dir after each reboot.
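
One workaround (a sketch; the StartupItem path follows the standard .dmg
install, and the _mysql user name is an assumption that varies by install)
is to recreate the directory in the startup script before mysqld is
launched:

# excerpt for /Library/StartupItems/MySQLCOM/MySQLCOM (sketch)
# macOS clears /var/run at boot, so recreate the pid-file directory
if [ ! -d /var/run/mysqld ]; then
    mkdir -p /var/run/mysqld
    chown _mysql:_mysql /var/run/mysqld   # user may be mysql or _mysql
fi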

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: MySQL on Mac OS 10.8

2013-04-17 Thread Larry Martell
On Tue, Apr 16, 2013 at 11:31 AM, Reindl Harald  wrote:
> On 16.04.2013 18:34, Larry Martell wrote:
>> I just set up mysql on Mac OS 10.8. Each time after the machine is
>> rebooted the server fails to start with:
>>
>> 2013-04-13 14:09:54 1 [ERROR] /usr/local/mysql/bin/mysqld: Can't
>> create/write to file '/var/run/mysqld/mysqld.pid' (Errcode: 2 - No
>> such file or directory)
>>
>> Because the /var/run/mysqld dir does not exist. I have to create it
>> manually then the server starts. But I have to do this after each
>> reboot, which is a pain. Anyone know why the /var/run/mysqld dir gets
>> deleted, or how to get around this?
>
> you need to consult your operating system's documentation
> on linux with systemd it would be the following config
>
> [root@srv-rhsoft:~]$ cat /etc/tmpfiles.d/mysql.conf
> d /var/run/mysqld 0755 mysql mysql -

Doesn't seem to be anything like that running on MacOS. I googled and
didn't find anything. I posted to a mac forum, but haven't received
any responses yet. I'll have to try SO.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: MySQL on Mac OS 10.8

2013-04-16 Thread Larry Martell
On Tue, Apr 16, 2013 at 10:50 AM, Michael Dykman  wrote:
> Larry,
>
> Nothing in the mysql startup files ever removes any directories of any kind.
>
> At a guess: only my clients who work on Macs ever report this kind of
> 'disappearing folder' behaviour.  And every time it turns out to be Time
> Machine.  Ask around on those lists.

I'm not running time machine on this box. I realize it's Mac specific,
I just thought someone here might have experienced and solved this.
Thanks for the reply.

> On Tue, Apr 16, 2013 at 12:34 PM, Larry Martell 
> wrote:
>>
>> I just set up mysql on Mac OS 10.8. Each time after the machine is
>> rebooted the server fails to start with:
>>
>> 2013-04-13 14:09:54 1 [ERROR] /usr/local/mysql/bin/mysqld: Can't
>> create/write to file '/var/run/mysqld/mysqld.pid' (Errcode: 2 - No
>> such file or directory)
>>
>> Because the /var/run/mysqld dir does not exist. I have to create it
>> manually then the server starts. But I have to do this after each
>> reboot, which is a pain. Anyone know why the /var/run/mysqld dir gets
>> deleted, or how to get around this?
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>
>
>
> --
>  - michael dykman
>  - mdyk...@gmail.com
>
>  May the Source be with you.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Please Help. selectcol_arrayref problem

2013-04-02 Thread Larry Martell
On Tue, Apr 2, 2013 at 11:30 AM, Richard Reina  wrote:
> use DBI;
> my $dbh = DBI->connect( "DBI:mysql:rushload;192.168.0.1", $usrr, $passw, {
> RaiseError => 3 } );
> my $dbs = $dbh->selectcol_arrayref("show databases");
>
> #my $dsn = "dbi:mysql:information_schema:192.168.0.1:3306";
> #my $dbh = DBI->connect($dsn, $usrr, $passw);
>
> my $dbs = $dbh->selectcol_arrayref('show databases');
>
> print "@$dbs\n";
>
> When I query the server for a list of databases with the code above it
> returns the name of just two and there are over 10.
>
> Any ideas?

Permissions - the user you're logging in as probably only has
permission to see the 2 that are being returned.
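
A quick way to confirm (a sketch; run as the connecting user):

-- list the privileges of the account you connected as
SHOW GRANTS FOR CURRENT_USER();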

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: log sequence number InnoDB: is in the future!?

2013-02-03 Thread Larry Martell
On Sun, Feb 3, 2013 at 7:23 AM, walter harms  wrote:
>
>
> On 02.02.2013 01:34, Larry Martell wrote:
>> On Mon, Jan 28, 2013 at 5:01 AM, walter harms  wrote:
>>> hi list,
>>>
>>> I am using mysql 5.1.53.
>>> After a crash I have the following error in my log:
>>>
>>> 130128 10:45:25  InnoDB: Error: page 61 log sequence number 0 2871649158
>>> InnoDB: is in the future! Current system log sequence number 0 2494349480.
>>> InnoDB: Your database may be corrupt or you may have copied the InnoDB
>>> InnoDB: tablespace but not the InnoDB log files. See
>>> InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
>>> InnoDB: for more information.
>>>
>>> according to the docs at
>>> http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html
>>> I need to restore the database from scratch (short version). What I do not
>>> understand is what exactly is broken? The whole DBMS? One instance?
>>> (mysqlcheck says all tables are ok.)
>>>
>>> Not all tables are InnoDB. Is it possible to restore only the InnoDB
>>> tables? (Having fun with foreign keys.)
>>>
>>> Or is there a better way to handle this ?
>>
>>
>> We had the same thing happen to us today. We had a power hit and when
>> the server came back we got the log sequences numbers in the future
>> message. We were able to dump the
>> affected tables, but when we tried to restore them we were not able to
>> drop the old tables. When we tried the server crashed with:
>>
>> InnoDB: Failing assertion not_full_n_used >= descr_n_used
>>
>> We did try booting with innodb_force_recovery at all levels from 1 to
>> 6 with the same results.
>>
>> We still have not figured out what to do. Pretty big disaster.
>>
>
> Yep, a serious problem.
> I tried several things that came to my mind but it was all useless;
> I had to drop the database and manually rm ib_datalog0/1 (?).
>
> Did you already get the funny error message about raw partitions?
>
> I must admit that we made several tests before using InnoDB and we
> never had such a problem; actually we are happy with it, but this
> kind of problem cost me three days of backup replay.

We also ended up dropping the database and restoring from dumps.
However all recent dumps ended up having a similar corruption and we
were still getting the same errors. We had to go back to an October
dump before it would come up cleanly. And our db is fairly large, and
it takes around 4 hours to load a dump. We were working on this Friday
from 8:30 AM until 4 AM Saturday before we got it going. And now we're
trying to recall all the alters we did since then, and reload all the
data since then, most of which is in files we can import. The only
thing we don't have is the manual updates done. All in all a total
disaster and something that will make us rethink our procedures.
Perhaps we'll look at replication, although I don't know if that would
have helped in this case.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: log sequence number InnoDB: is in the future!?

2013-02-01 Thread Larry Martell
On Mon, Jan 28, 2013 at 5:01 AM, walter harms  wrote:
> hi list,
>
> I am using mysql 5.1.53.
> After a crash I have the following error in my log:
>
> 130128 10:45:25  InnoDB: Error: page 61 log sequence number 0 2871649158
> InnoDB: is in the future! Current system log sequence number 0 2494349480.
> InnoDB: Your database may be corrupt or you may have copied the InnoDB
> InnoDB: tablespace but not the InnoDB log files. See
> InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
> InnoDB: for more information.
>
> according to the docs at
> http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html
> I need to restore the database from scratch (short version). What I do not
> understand is what exactly is broken? The whole DBMS? One instance?
> (mysqlcheck says all tables are ok.)
>
> Not all tables are InnoDB. Is it possible to restore only the InnoDB
> tables? (Having fun with foreign keys.)
>
> Or is there a better way to handle this ?


We had the same thing happen to us today. We had a power hit and when
the server came back we got the log sequences numbers in the future
message. We were able to dump the
affected tables, but when we tried to restore them we were not able to
drop the old tables. When we tried the server crashed with:

InnoDB: Failing assertion not_full_n_used >= descr_n_used

We did try booting with innodb_force_recovery at all levels from 1 to
6 with the same results.

We still have not figured out what to do. Pretty big disaster.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: adding column breaks query

2013-01-28 Thread Larry Martell
Not sure what you mean.

On Mon, Jan 28, 2013 at 8:49 AM, Douglas Nelson
 wrote:
> two different fields??
>
> Thanks Doug
>
> SELECT data_ppl.name as 'P/P/L', data_recipe.name as Recipe,
> data_cstmeta.date_time as 'Last Run'
> FROM data_recipe, data_ppl, data_cstmeta
>INNER JOIN (SELECT id,recipe_id, MAX(date_time) as MaxDateTime
>FROM data_cstmeta
>GROUP BY recipe_id) grouped
>
> ON data_cstmeta.id = grouped.id
>AND data_cstmeta.date_time = grouped.MaxDateTime
> WHERE data_cstmeta.ppl_id = data_ppl.id
> AND data_cstmeta.recipe_id = data_recipe.id
> ORDER BY data_cstmeta.date_time desc;
>
>
> Now I need to add one more column to it, data_tool.name, so I did this:
>
>
> SELECT data_ppl.name as 'P/P/L', data_tool.name as Tool,
> data_recipe.name as Recipe, data_cstmeta.date_time as 'Last Run'
> FROM data_recipe, data_ppl, data_cstmeta, data_tool
>INNER JOIN (SELECT id,tool_id,recipe_id, MAX(date_time) as MaxDateTime
>FROM data_cstmeta
>GROUP BY recipe_id) grouped
>ON data_cstmeta.id = grouped.id
>AND data_cstmeta.date_time = grouped.MaxDateTime
> WHERE data_cstmeta.ppl_id = data_ppl.id
> AND data_cstmeta.recipe_id = data_recipe.id
> AND data_cstmeta.tool_id = data_tool.id
> ORDER BY data_cstmeta.date_time desc;
>
> And now it fails with:
>
> ERROR 1054 (42S22): Unknown column 'data_cstmeta.id' in 'on clause'
>
> Larry Martell wrote:
>
> SELECT data_ppl.name as 'P/P/L', data_recipe.name as Recipe,
> data_cstmeta.date_time as 'Last Run'
> FROM data_recipe, data_ppl, data_cstmeta
>INNER JOIN (SELECT id,recipe_id, MAX(date_time) as MaxDateTime
>FROM data_cstmeta
>GROUP BY recipe_id) grouped
>ON data_cstmeta.id = grouped.id
>AND data_cstmeta.date_time = grouped.MaxDateTime
> WHERE data_cstmeta.ppl_id = data_ppl.id
> AND data_cstmeta.recipe_id = data_recipe.id
> ORDER BY data_cstmeta.date_time desc;
>
>
> Now I need to add one more column to it, data_tool.name, so I did this:
>
>
> SELECT data_ppl.name as 'P/P/L', data_tool.name as Tool,
> data_recipe.name as Recipe, data_cstmeta.date_time as 'Last Run'
> FROM data_recipe, data_ppl, data_cstmeta, data_tool
>INNER JOIN (SELECT id,tool_id,recipe_id, MAX(date_time) as MaxDateTime
>FROM data_cstmeta
>GROUP BY recipe_id) grouped
>ON data_cstmeta.id = grouped.id
>AND data_cstmeta.date_time = grouped.MaxDateTime
> WHERE data_cstmeta.ppl_id = data_ppl.id
> AND data_cstmeta.recipe_id = data_recipe.id
> AND data_cstmeta.tool_id = data_tool.id
> ORDER BY data_cstmeta.date_time desc;
>
> And now it fails with:
>
> ERROR 1054 (42S22): Unknown column 'data_cstmeta.id' in 'on clause'
>
>
> --
> Oracle
> Douglas C. Nelson
> Advanced Customer Services Principal Advanced Support Systems Analyst 5
> Mobile - 919 760 9221
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: adding column breaks query

2013-01-28 Thread Larry Martell
On Mon, Jan 28, 2013 at 8:56 AM, Shawn Green  wrote:
> Hello Larry,
>
>
> On 1/28/2013 10:42 AM, Larry Martell wrote:
>>
>> I have a query that works fine:
>>
>>
>> SELECT data_ppl.name as 'P/P/L', data_recipe.name as Recipe,
>> data_cstmeta.date_time as 'Last Run'
>> FROM data_recipe, data_ppl, data_cstmeta
>> INNER JOIN (SELECT id,recipe_id, MAX(date_time) as MaxDateTime
>> FROM data_cstmeta
>> GROUP BY recipe_id) grouped
>> ON data_cstmeta.id = grouped.id
>> AND data_cstmeta.date_time = grouped.MaxDateTime
>> WHERE data_cstmeta.ppl_id = data_ppl.id
>> AND data_cstmeta.recipe_id = data_recipe.id
>> ORDER BY data_cstmeta.date_time desc;
>>
>>
>> Now I need to add one more column to it, data_tool.name, so I did this:
>>
>>
>> SELECT data_ppl.name as 'P/P/L', data_tool.name as Tool,
>> data_recipe.name as Recipe, data_cstmeta.date_time as 'Last Run'
>> FROM data_recipe, data_ppl, data_cstmeta, data_tool
>> INNER JOIN (SELECT id,tool_id,recipe_id, MAX(date_time) as MaxDateTime
>> FROM data_cstmeta
>> GROUP BY recipe_id) grouped
>> ON data_cstmeta.id = grouped.id
>> AND data_cstmeta.date_time = grouped.MaxDateTime
>> WHERE data_cstmeta.ppl_id = data_ppl.id
>> AND data_cstmeta.recipe_id = data_recipe.id
>> AND data_cstmeta.tool_id = data_tool.id
>> ORDER BY data_cstmeta.date_time desc;
>>
>> And now it fails with:
>>
>> ERROR 1054 (42S22): Unknown column 'data_cstmeta.id' in 'on clause'
>>
>> I've messed around with this for a day and I can't figure out what
>> stupid mistake I'm making.
>>
>
> You didn't add just one column, you added an entire table to the FROM
> clause.  Also, you used a comma-join instead of an ANSI JOIN clause.
>
> Because of changes we made in 5.0.12 to make MySQL behave more like the SQL
> standard requires, the priority of the comma operator in join operations was
> demoted. This means that when the explicit JOIN declaration is being
> evaluated (which has a higher order of precedence) only the derived table
> (grouped) and the base table `data_tool` are visible to the ON clause. The
> `data_cstmeta` implicit join will be evaluated later as it is based on a
> comma-join.
>
> I suggest you rewrite your queries to only use the explicit "JOIN ... ON
> ..." syntax for all of your joins to avoid these problems in the future.
> http://dev.mysql.com/doc/refman/5.5/en/join.html
>
> Regards,
> --
> Shawn Green
> MySQL Principal Technical Support Engineer
> Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
> Office: Blountville, TN
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/mysql
>

Thanks. I know the JOIN clause is preferred over the comma join, just
surprised this query worked until I added a new table. In any case I
changed it as shown below, and it works now. Thanks!


SELECT data_ppl.name as 'P/P/L',
data_tool.name as Tool,
data_recipe.name as Recipe,
data_cstmeta.date_time as 'Last Run'
FROM data_cstmeta
INNER JOIN (SELECT id,tool_id,recipe_id, MAX(date_time) as MaxDateTime
   FROM data_cstmeta
   GROUP BY recipe_id) grouped
ON data_cstmeta.id = grouped.id AND data_cstmeta.date_time =
grouped.MaxDateTime
INNER JOIN data_ppl ON data_cstmeta.ppl_id = data_ppl.id
INNER JOIN data_tool ON data_cstmeta.tool_id = data_tool.id
INNER JOIN data_recipe ON data_cstmeta.recipe_id = data_recipe.id
ORDER BY data_cstmeta.date_time desc;

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: blank line when column changes

2012-12-18 Thread Larry Martell
Yeah, it's easy enough to do in python, but my client wanted a SQL query.
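
(For the archives, a minimal sketch of the Python side, with made-up rows
standing in for the ordered result set of the query below:)

# print a blank line whenever the (t, dt) grouping pair changes
rows = [
    ("toolA", "msgA1", "2012-12-01 10:00:00", "2012-12-01 09:58:55"),
    ("toolA", "msgA2", "2012-12-01 10:00:00", "2012-12-01 09:59:30"),
    ("toolB", "msgB1", "2012-12-02 11:00:00", "2012-12-02 10:59:10"),
]
prev_key = None
for t, message, dt, date_time in rows:
    if prev_key is not None and (t, dt) != prev_key:
        print()                      # group separator
    print(t, message, dt, date_time)
    prev_key = (t, dt)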

Thanks


On Tue, Dec 18, 2012 at 5:48 PM, Rick James  wrote:
> GROUP BY .. WITH ROLLUP
> could get you something like that.
>
> Otherwise, I would advise that you use your application language (PHP, Java, 
> etc) to do "presentation" type stuff.
>
> Yes, it could be done in SQL, but my brain might explode if I try to conjure
> up such.
>
>> -Original Message-
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Monday, December 17, 2012 5:18 AM
>> To: mysql mailing list
>> Subject: blank line when column changes
>>
>> I have this query:
>>
>> SELECT q1.t, data_eventlog.message, q1.dt, data_eventlog.date_time FROM
>> (SELECT cdsem_event_message_idx.message m,
>>  cdsem_event_message_idx.date_time dt,
>>  data_tool.name t,
>>  cdsem_event_message_idx.tool_id tid
>>   FROM data_tool, cdsem_event_message_idx
>>   WHERE data_tool.id = cdsem_event_message_idx.tool_id
>>   AND cdsem_event_message_idx.message_idx = 'ExitingToOn') q1,
>> data_eventlog WHERE data_eventlog.date_time <= q1.dt AND
>> TIMESTAMPDIFF(SECOND, data_eventlog.date_time, q1.dt) < 120 AND
>> data_eventlog.tool_id = q1.tid ORDER BY q1.t, q1.dt,
>> data_eventlog.date_time
>>
>> What I want to do is output a blank line whenever (q1.t, q1.dt)
>> changes. Is there any way to do that in SQL?
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Help with left outer join

2012-12-12 Thread Larry Martell
On Tue, Dec 11, 2012 at 7:22 PM,   wrote:
>>>>> 2012/12/11 16:19 -0500, Larry Martell >>>>
> I have this query:
>
> SELECT data_target.name, ep, wafer_id, lot_id,
>date_time, data_file_id, data_cstimage.name,
>bottom, wf_file_path_id, data_measparams.name,
>vacc, data_category.name
> FROM data_cst, data_target, data_cstimage, data_measparams,
>  data_category, data_tool
> WHERE data_cst.target_name_id IN (38018, 29947, 28330)
> AND data_cst.date_time BETWEEN '2012-09-01 00:00:00' AND '2012-09-07 00:00:00'
> AND data_target.id = data_cst.target_name_id
> AND data_cstimage.id = data_cst.image_measurer_id
> AND data_measparams.id = data_cst.meas_params_name_id
> AND data_category.id = data_tool.category_id
> AND data_tool.id = data_cst.tool_id
> ORDER BY target_name_id, ep, wafer_id, lot_id, date_time
>
> My problem is that when data_cst.image_measurer_id is NULL I don't get
> that data_cst row even though all the other part of the where clause
> are TRUE. I understand why that is, but in that case I want the row,
> but with NULL in the data_cstimage.name column. I think I need a left
> outer join, but I've been messing with this for hours, and I can't get
> the syntax right. I've googled it, but all the examples are simple
> with just 2 tables. Can someone help me with this?
> <<<<<<<<
> Modern join syntax does not give a left join if one uses a WHERE clause to
> reduce a full cross join to an inner join. It is better to start with
> something like this,
>
> FROM data_cst JOIN data_target
> ON data_target.id = data_cst.target_name_id JOIN data_cstimage
> ON data_cstimage.id = data_cst.image_measurer_id JOIN data_measparams
> ON data_measparams.id = data_cst.meas_params_name_id JOIN
>  (data_category JOIN data_tool
> ON data_category.id = data_tool.category_id)
> ON data_tool.id = data_cst.tool_id
>
> but I am not too sure where to bracket "data_tool". When you have put it into 
> a 'FROM'-clause with 'JOIN', not comma, separating the tables, with the same 
> outcome as now, then you have to decide before which 'JOIN' to put the 
> 'LEFT'. Maybe you want it between "data_cstimage" and "data_measparams".
>
> (Are all the distinct "id"s really needed? When one joins on a field with the 
> same name in both tables, one may use 'USING', and only the common field, 
> with neither NULL, shows up in the output.)

I'm not familiar with the USING clause. I'll have to look into that. Thanks!

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Help with left outer join

2012-12-12 Thread Larry Martell
On Tue, Dec 11, 2012 at 8:48 PM, Peter Brawley
 wrote:
>>ERROR 1054 (42S22): Unknown column 'data_tool.category_id' in 'on clause'
>>But category_id is a column in data_tool.
>
> Then a bit of reordering is required ...
>
> SELECT data_target.name, ep, wafer_id, lot_id,
>date_time, data_file_id, data_cstimage.name,
>bottom, wf_file_path_id, data_measparams.name,
>vacc, data_category.name
> FROM data_cst
> RIGHT JOIN data_cstimage   ON data_cstimage.id = data_cst.image_measurer_id
> INNER JOIN data_tool   ON data_tool.id = data_cst.tool_id
> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
> INNER JOIN data_measparams ON data_measparams.id =
> data_cst.meas_params_name_id
> INNER JOIN data_category   ON data_category.id = data_tool.category_id
> WHERE data_cst.target_name_id IN (38018, 29947, 28330)
>   AND data_cst.date_time BETWEEN '2012-09-01 00:00:00' AND '2012-09-07
> 00:00:00'
> ORDER BY target_name_id, ep, wafer_id, lot_id, date_time;

Your query worked exactly like my original one - i.e. when
data_cst.image_measurer_id is NULL I don't get that data_cst row. But
I changed the RIGHT JOIN to a LEFT JOIN and then it was doing exactly
what I wanted. Thanks for all the help!
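
(For the record, the final working form, i.e. the query below with its
RIGHT JOIN changed to LEFT JOIN:)

SELECT data_target.name, ep, wafer_id, lot_id,
       date_time, data_file_id, data_cstimage.name,
       bottom, wf_file_path_id, data_measparams.name,
       vacc, data_category.name
FROM data_cst
LEFT JOIN  data_cstimage   ON data_cstimage.id = data_cst.image_measurer_id
INNER JOIN data_tool       ON data_tool.id = data_cst.tool_id
INNER JOIN data_target     ON data_target.id = data_cst.target_name_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_category   ON data_category.id = data_tool.category_id
WHERE data_cst.target_name_id IN (38018, 29947, 28330)
  AND data_cst.date_time BETWEEN '2012-09-01 00:00:00' AND '2012-09-07 00:00:00'
ORDER BY target_name_id, ep, wafer_id, lot_id, date_time;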

> On 2012-12-11 5:11 PM, Larry Martell wrote:
>
> SELECT data_target.name, ep, wafer_id, lot_id,
>>
>>date_time, data_file_id, data_cstimage.name,
>>
>>bottom, wf_file_path_id, data_measparams.name,
>>
>>vacc, data_category.name
>>
>> FROM data_cst
>>
>> RIGHT JOIN data_cstimage   ON data_cstimage.id =
>> data_cst.image_measurer_id
>>
>> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
>>
>> INNER JOIN data_measparams ON data_measparams.id =
>> data_cst.meas_params_name_id
>>
>> INNER JOIN data_category   ON data_category.id = data_tool.category_id
>>
>> INNER JOIN data_tool   ON data_tool.id = data_cst.tool_id
>>
>> WHERE data_cst.target_name_id IN (38018, 29947, 28330)
>>
>>AND data_cst.date_time BETWEEN '2012-09-01 00:00:00' AND '2012-09-07
>> 00:00:00'
>>
>> ORDER BY target_name_id, ep, wafer_id, lot_id, date_time;
>
> Thanks very much for the reply. This is giving me:
>
> ERROR 1054 (42S22): Unknown column 'data_tool.category_id' in 'on clause'
>
> But category_id is a column in data_tool.
>
>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Help with left outer join

2012-12-12 Thread Larry Martell
On Wed, Dec 12, 2012 at 8:25 AM, Shawn Green  wrote:
> On 12/11/2012 7:22 PM, h...@tbbs.net wrote:
>>
>> ... (Are all the distinct "id"s really needed? When one joins on a
>>
>> field with the same name in both tables, one may use 'USING', and
>> only the common field, with neither NULL, shows up in the output.)
>>
>
> This is a perfectly acceptable naming convention to use. For example if you
> have a field on the `art` table that references the ID column of the
> `person` table to indicate the owner and another field to indicate the
> person who created the art, you might want to use the names
> `owner_person_id` and `artist_person_id` to keep them separate from the `id`
> column used to uniquely identify the work of art itself.
>
> In this design pattern, each table has a numeric ID column (string-based
> primary keys are perfectly legal but have their drawbacks and should be used
> with care) and to reference it from another table you can use the pattern
> <table>_id.  It keeps your naming conventions clean and predictable.
>
> If I were to try to use a USING operator in my opening example, I would be
> trying to match the PK fields of two separate types of data.
>
> (the USING example)
> SELECT ... FROM art INNER JOIN person USING(id)...
>
> Let's say that I renamed the id fields to art_id and person_id to make them
> table-specific. This still fails because a person's identifier as an owner
> is not the same as a work of art's creator. It also does not allow me to use
> the `person` table more than once in a single query.
>
> (FAIL: a renamed USING example)
> SELECT ...
> FROM art
> INNER JOIN person USING(person_id) <--- does this refer to the owner or the
> creator of the art?
>
> (the name template example)
> SELECT ...
> FROM art
> INNER JOIN person owner
>   on art.owner_person_id = owner.id
> INNER JOIN person artist
>   on art.artist_person_id = artist.id
> ...
>
> I admit I am a control freak when it comes to the accuracy of my queries,
> the integrity of my data, and the ease of maintenance for my SQL statements.
> Because of this, I much prefer the regular predictability of the ANSI JOIN
> syntax (demonstrated above) for any JOIN except for an intentional Cartesian
> product.  For that case alone, I prefer a comma join
>
> (Cartesian product example)
> SELECT ...
> FROM table1, table2
> ...
>
> Not only is the ANSI syntax the only way to specify an OUTER join (such as
> LEFT JOIN or RIGHT JOIN) it forces the author of the statement to recognize
> that they do or do not have an ON clause for this table relationship.
> Separating the same conditions into a WHERE clause makes it very easy to
> overlook a table relationship and accidentally create a Cartesian product
> which can often devastate query performance. It is perfectly legal to use
> the comma-join syntax with MySQL but I strongly recommend against it just
> because it can only be used for INNER joins or CROSS joins and because it
> forces you to put your relationship conditions in the WHERE clause.
>
> Another strike (to me, anyway) against the comma join is that in order to
> process joins more like the specifications in the SQL standards, we demoted
> the precedence of the comma operator with 5.0.12. These changes also
> affected the behavior of the USING and NATURAL JOIN operators.
> http://dev.mysql.com/doc/refman/5.5/en/join.html

Thanks very much Shawn for this very informative post. I learned SQL
in the early 1980's and I was not taught the JOIN syntax, only the
comma join and WHERE. It's really just force of habit that I write
queries that way. I will try very hard in the future to break that
habit and use the JOIN syntax.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Help with left outer join

2012-12-11 Thread Larry Martell
On Tue, Dec 11, 2012 at 5:12 PM, Peter Brawley
 wrote:
>
> On 2012-12-11 3:19 PM, Larry Martell wrote:
>
> I have this query:
>
> SELECT data_target.name, ep, wafer_id, lot_id,
>date_time, data_file_id, data_cstimage.name,
>bottom, wf_file_path_id, data_measparams.name,
>vacc, data_category.name
> FROM data_cst, data_target, data_cstimage, data_measparams,
>  data_category, data_tool
> WHERE data_cst.target_name_id IN (38018, 29947, 28330)
> AND data_cst.date_time BETWEEN '2012-09-01 00:00:00' AND '2012-09-07
> 00:00:00'
> AND data_target.id = data_cst.target_name_id
> AND data_cstimage.id = data_cst.image_measurer_id
> AND data_measparams.id = data_cst.meas_params_name_id
> AND data_category.id = data_tool.category_id
> AND data_tool.id = data_cst.tool_id
> ORDER BY target_name_id, ep, wafer_id, lot_id, date_time
>
> My problem is that when data_cst.image_measurer_id is NULL I don't get
> that data_cst row even though all the other part of the where clause
> are TRUE. I understand why that is, but in that case I want the row,
> but with NULL in the data_cstimage.name column.
>
> Explicit join syntax makes such queries easier to read, write, develop &
> debug. Is this what you mean?
>
>
> SELECT data_target.name, ep, wafer_id, lot_id,
>
>date_time, data_file_id, data_cstimage.name,
>
>bottom, wf_file_path_id, data_measparams.name,
>
>vacc, data_category.name
>
> FROM data_cst
>
> RIGHT JOIN data_cstimage   ON data_cstimage.id = data_cst.image_measurer_id
>
> INNER JOIN data_target ON data_target.id = data_cst.target_name_id
>
> INNER JOIN data_measparams ON data_measparams.id =
> data_cst.meas_params_name_id
>
> INNER JOIN data_category   ON data_category.id = data_tool.category_id
>
> INNER JOIN data_tool   ON data_tool.id = data_cst.tool_id
>
> WHERE data_cst.target_name_id IN (38018, 29947, 28330)
>
>AND data_cst.date_time BETWEEN '2012-09-01 00:00:00' AND '2012-09-07
> 00:00:00'
>
> ORDER BY target_name_id, ep, wafer_id, lot_id, date_time;

Thanks very much for the reply. This is giving me:

ERROR 1054 (42S22): Unknown column 'data_tool.category_id' in 'on clause'

But category_id is a column in data_tool.


> I think I need a left
> outer join, but I've been messing with this for hours, and I can't get
> the syntax right. I've googled it, but all the examples are simple
> with just 2 tables. Can someone help me with this?
>
> TIA!
> -larry
>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Issue regarding the import of the date from csv file to the table in the database in mysql

2012-11-13 Thread Larry Martell
On Tue, Nov 13, 2012 at 9:56 PM, sagar bs  wrote:
> Hi,
>
> I have the data with some 25 variables in a csv file and I need to import
> it to mysql.
> The issue is that the date format in the csv file is dd/mm/yyyy and mysql
> takes the date format like yyyy/mm/dd.
> The number of variables in the csv file is the same as in the table in
> the mysql database.
> Please help me out.

Convert the data in your CSV file into the format you need. There are many
ways to do that, e.g. bring it into Excel and change the column's
format, use sed, use python, use vi, etc.
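
Alternatively, the conversion can be done on the MySQL side at load time.
A sketch (table name, column names, and file path are all made up):

-- convert dd/mm/yyyy at import time instead of editing the CSV
LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(col1, col2, @raw_date)
SET date_col = STR_TO_DATE(@raw_date, '%d/%m/%Y');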

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: mysqld cannot start.

2012-11-12 Thread Larry Martell
On Mon, Nov 12, 2012 at 2:11 PM, Tianyin Xu  wrote:
> Hi, guys,
>
> I'm new to MySQL. I installed MySQL step-by-step according to the manual
> from source code. The version is mysql-5.5.28.
>
> When I start the server using:
> #bin/mysqld_safe --user=mysql
>
> The server failed to start with the following message in the log (attached
> below).
>
> There're two problems according to the log messages:
>
> 1.
>
> 121112 13:00:59 [ERROR] Can't read from messagefile
> '/usr/local/mysql/data/errmsg.sys
>
> I'm confused because I do have this file which is owned by user "mysql":
>
> #ll /usr/local/mysql/data/errmsg.sys
> -rw-r--r-- 1 mysql mysql 0 Nov 12 12:35 /usr/local/mysql/data/errmsg.sys

Check the permissions of all the dirs in that path: /usr, /usr/local,
/usr/local/mysql, /usr/local/mysql/data. The mysql user will need r-x
access to all of them.
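
One quick way to eyeball the whole path (assuming util-linux's namei is
available):

# show owner/permissions for every component of the path
namei -l /usr/local/mysql/data/errmsg.sys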

> 2.
>
> 121112 13:00:59 [ERROR] An old style --language value with language
> specific part detected: /usr/local/mysql/data/
>
> I commented out the "lc-messages-dir" configuration entry in the my.cnf, but
> the error still exists...

Don't know what platform you're on, but on my Mac I had this issue,
and I got around it by starting the server with:

--lc-messages-dir="/usr/local/mysql/share/"

On CentOS and RHEL I did not have this issue.

HTH-
larry

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: checking progress of alter table on an InnoDB table (Was: Re: checking progress of alter table on a MyISAM table)

2012-09-28 Thread Larry Martell
On Thu, Sep 27, 2012 at 3:02 PM, Akshay Suryavanshi
 wrote:
> Hi,
>
> 22 indexes are simply too many, assuming they are not composite, which means
> you already have a very large table. Secondly, the most important bottleneck
> is shown by the inserts/sec; only 405 inserts/sec is very, very slow. This
> could take ages to complete. And the 405 inserts/sec are averages calculated
> over some small period of time, mostly under a minute, not from the uptime;
> you can see that at the top (initial lines) of the Show engine innodb status \G
> output.
>
> Depending upon the machine footprint, inserts/sec should at least be more
> than 1 inserts/sec even on a busy server.
>
> Indexes are slowing this down. Your calculation of 79 hours should be
> correct only if there are no unique indexes; otherwise this will slow down
> more as the data increases.

It finally finished after 55 hours.


> On Fri, Sep 28, 2012 at 2:22 AM, Larry Martell 
> wrote:
>>
>> On Thu, Sep 27, 2012 at 2:37 PM, Akshay Suryavanshi
>>  wrote:
>> > Hi,
>> >
>> > The alter taking such a long time could be due to composite indexes on
>> > the
>> > table.
>>
>> There are 22 indexes on the table, but none are composites.
>>
>> > we understand the table is big, but not so big as to take such a long
>> > time. Also we can get a hold of the process by looking at the disk
>> > space consumed. Usually a tmp table created in the data directory
>> > would also give a good understanding of the process; remember sizes
>> > need not be exact since there might be some defragmentation at the
>> > file level.
>>
>> Ok, I will look for a temp file.
>>
>> > Next you can check inserts/sec in Show engine innodb status \G and
>> > calculate
>> > the time it should take for the number of rows in the table.
>>
>> The ROW OPERATIONS section has this:
>>
>> 1 queries inside InnoDB, 0 queries in queue
>> 1 read views open inside InnoDB
>> Main thread process no. 7913, id 14020684432, state: sleeping
>> Number of rows inserted 75910241, updated 15602, deleted 70, read
>> 9130481311
>> 405.80 inserts/s, 0.00 updates/s, 0.00 deletes/s, 405.80 reads/s
>>
>> This appears to be statistics over the life of the invocation of the
>> server, correct? But using 405.80 inserts/s suggests the alter will
>> take almost 79 hours.
>>
>> >
>> > Usually, you carry out this operation by adding the secondary indexes after
>> > the
>> > data import or such alters are complete.
>> >
>> > Regards,
>> > Akshay Suryavanshi
>> >
>> >
>> > On Fri, Sep 28, 2012 at 1:56 AM, Rick James 
>> > wrote:
>> >>
>> >> Isn't ALTER a DDL, not DML?  So I don't think you would find anything
>> >> in
>> >> undo logs.
>> >>
>> >> > -Original Message-
>> >> > From: Larry Martell [mailto:larry.mart...@gmail.com]
>> >> > Sent: Thursday, September 27, 2012 1:20 PM
>> >> > To: Rick James
>> >> > Cc: mysql mailing list
>> >> > Subject: checking progress of alter table on an InnoDB table (Was:
>> >> > Re:
>> >> > checking progress of alter table on a MyISAM table)
>> >> >
>> >> > So we changed the table from MyISAM to InnoDB. I read that the 'undo
>> >> > log entries' shown in 'show engine innodb status' would correspond to
>> >> > the number of rows that have been operated on throughout the process
>> >> > of
>> >> > the ALTER. The table we're altering has 115,096,205 rows, and the
>> >> > alter's been running for 28 hours, and the undo log entries is 9309.
>> >> > Also that number seems to go up and down. So clearly, it's not what I
>> >> > think.
>> >> >
>> >> > So anyone know a way to monitor the status of the alter now that it's
>> >> > an InnoDB table?
>> >> >
>> >> >
>> >> > On Wed, Sep 26, 2012 at 10:31 AM, Rick James 
>> >> > wrote:
>> >> > > Not really.
>> >> > > You could look at the .TYD and .TYI file sizes and compare to the
>> >> > .MYD and .MYI, but that can be deceptive.  If the table is really
>> >> > big,
>> >> > and has lots of indexes, the generation of the indexes might go
>> >> > slower
>> >> > and slower -- 

Re: checking progress of alter table on an InnoDB table (Was: Re: checking progress of alter table on a MyISAM table)

2012-09-27 Thread Larry Martell
On Thu, Sep 27, 2012 at 3:02 PM, Akshay Suryavanshi
 wrote:
> Hi,
>
> 22 indexes are simply too many, assuming they are not composite, which means
> you already have a very large table.

Yeah, I agree the table has too many columns and too many indexes. I
didn't design it, and I have no control over it. There's a huge app
that's built around it, and changing the table would require too many
code changes.

> Secondly, the most important bottleneck
> is shown by the inserts/sec; only 405 inserts/sec is very, very slow. This could
> take ages to complete. And the 405 inserts/sec are averages calculated over
> some small period of time, mostly under a minute, not from the uptime; you
> can see that at the top (initial lines) of the Show engine innodb status \G
> output.
>
> Depending upon the machine footprint, inserts/sec should at least be more
> than 1 inserts/sec even on a busy server.

I don't know anything about the machine. I don't have direct access to
it (it's at my client's client's site) so I'm doing everything by
proxy.

> Indexes are slowing this down. Your calculation of 79 hours should be
> correct only if there are no unique indexes; otherwise this will slow down
> more as the data increases.

It is what it is - there's no free lunch - dropping the indexes, doing
the alter, and re-adding the indexes would probably take just as long.

It would be very nice to speed up the alter, but what I was really
looking for here was a way to monitor its progress.


> On Fri, Sep 28, 2012 at 2:22 AM, Larry Martell 
> wrote:
>>
>> On Thu, Sep 27, 2012 at 2:37 PM, Akshay Suryavanshi
>>  wrote:
>> > Hi,
>> >
>> > The alter taking such a long time could be due to composite indexes on
>> > the
>> > table.
>>
>> There are 22 indexes on the table, but none are composites.
>>
>> > we understand the table is big, but not so big as to take such a long
>> > time. Also we can get a hold of the process by looking at the disk
>> > space consumed. Usually a tmp table created in the data directory
>> > would also give a good understanding of the process; remember sizes
>> > need not be exact since there might be some defragmentation at the
>> > file level.
>>
>> Ok, I will look for a temp file.
>>
>> > Next you can check inserts/sec in Show engine innodb status \G and
>> > calculate
>> > the time it should take for the number of rows in the table.
>>
>> The ROW OPERATIONS section has this:
>>
>> 1 queries inside InnoDB, 0 queries in queue
>> 1 read views open inside InnoDB
>> Main thread process no. 7913, id 14020684432, state: sleeping
>> Number of rows inserted 75910241, updated 15602, deleted 70, read
>> 9130481311
>> 405.80 inserts/s, 0.00 updates/s, 0.00 deletes/s, 405.80 reads/s
>>
>> This appears to be statistics over the life of the invocation of the
>> server, correct? But using 405.80 inserts/s suggests the alter will
>> take almost 79 hours.
>>
>> >
>> > Usually, you carry out this operation by adding the secondary indexes after
>> > the
>> > data import or such alters are complete.
>> >
>> > Regards,
>> > Akshay Suryavanshi
>> >
>> >
>> > On Fri, Sep 28, 2012 at 1:56 AM, Rick James 
>> > wrote:
>> >>
>> >> Isn't ALTER a DDL, not DML?  So I don't think you would find anything
>> >> in
>> >> undo logs.
>> >>
>> >> > -Original Message-
>> >> > From: Larry Martell [mailto:larry.mart...@gmail.com]
>> >> > Sent: Thursday, September 27, 2012 1:20 PM
>> >> > To: Rick James
>> >> > Cc: mysql mailing list
>> >> > Subject: checking progress of alter table on an InnoDB table (Was:
>> >> > Re:
>> >> > checking progress of alter table on a MyISAM table)
>> >> >
>> >> > So we changed the table from MyISAM to InnoDB. I read that the 'undo
>> >> > log entries' shown in 'show engine innodb status' would correspond to
>> >> > the number of rows that have been operated on throughout the process
>> >> > of
>> >> > the ALTER. The table we're altering has 115,096,205 rows, and the
>> >> > alter's been running for 28 hours, and the undo log entries is 9309.
>> >> > Also that number seems to go up and down. So clearly, it's not what I
>> >> > think.
>> >> >
>> >> > So anyone know a

Re: checking progress of alter table on an InnoDB table (Was: Re: checking progress of alter table on a MyISAM table)

2012-09-27 Thread Larry Martell
On Thu, Sep 27, 2012 at 2:37 PM, Akshay Suryavanshi
 wrote:
> Hi,
>
> The alter taking such a long time could be due to composite indexes on the
> table.

There are 22 indexes on the table, but none are composites.

> we understand the table is big, but not so big as to take such a long
> time. Also we can get a hold of the process by looking at the disk space
> consumed. Usually a tmp table created in the data directory would also give a
> good understanding of the process; remember sizes need not be exact since
> there might be some defragmentation at the file level.

Ok, I will look for a temp file.
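
(A sketch of one way to do that; the datadir path and refresh interval
are illustrative:)

# watch the in-progress table copy (#sql-*) grow in the datadir
watch -n 60 'ls -lh /var/lib/mysql/mydb/ | grep "#sql"'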

> Next you can check inserts/sec in Show engine innodb status \G and calculate
> the time it should take for the number of rows in the table.

The ROW OPERATIONS section has this:

1 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 7913, id 14020684432, state: sleeping
Number of rows inserted 75910241, updated 15602, deleted 70, read 9130481311
405.80 inserts/s, 0.00 updates/s, 0.00 deletes/s, 405.80 reads/s

This appears to be statistics over the life of the invocation of the
server, correct? But using 405.80 inserts/s suggests the alter will
take almost 79 hours.
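
(The arithmetic: 115,096,205 rows / 405.8 inserts/s = ~283,600 seconds,
or about 78.8 hours.)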

>
> Usually, you carry out this operation by adding the secondary indexes after the
> data import or such alters are complete.
>
> Regards,
> Akshay Suryavanshi
>
>
> On Fri, Sep 28, 2012 at 1:56 AM, Rick James  wrote:
>>
>> Isn't ALTER a DDL, not DML?  So I don't think you would find anything in
>> undo logs.
>>
>> > -Original Message-
>> > From: Larry Martell [mailto:larry.mart...@gmail.com]
>> > Sent: Thursday, September 27, 2012 1:20 PM
>> > To: Rick James
>> > Cc: mysql mailing list
>> > Subject: checking progress of alter table on an InnoDB table (Was: Re:
>> > checking progress of alter table on a MyISAM table)
>> >
>> > So we changed the table from MyISAM to InnoDB. I read that the 'undo
>> > log entries' shown in 'show engine innodb status' would correspond to
>> > the number of rows that have been operated on throughout the process of
>> > the ALTER. The table we're altering has 115,096,205 rows, and the
>> > alter's been running for 28 hours, and the undo log entries is 9309.
>> > Also that number seems to go up and down. So clearly, it's not what I
>> > think.
>> >
>> > So anyone know a way to monitor the status of the alter now that it's
>> > an InnoDB table?
>> >
>> >
>> > On Wed, Sep 26, 2012 at 10:31 AM, Rick James 
>> > wrote:
>> > > Not really.
>> > > You could look at the .TYD and .TYI file sizes and compare to the
>> > .MYD and .MYI, but that can be deceptive.  If the table is really big,
>> > and has lots of indexes, the generation of the indexes might go slower
>> > and slower -- hence any math on the sizes would be optimistic.
>> > >
>> > >> -Original Message-
>> > >> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> > >> Sent: Wednesday, September 26, 2012 8:52 AM
>> > >> To: mysql mailing list
>> > >> Subject: checking progress of alter table on a MyISAM table
>> > >>
>> > >> Is there any way to check on the progress of a long running alter
>> > >> table on a MyISAM table? I know it can be done with an InnoDB table,
>> > >> but I haven't found a way to do it on with a MyISAM table.
>> >
>> > --
>> > MySQL General Mailing List
>> > For list archives: http://lists.mysql.com/mysql
>> > To unsubscribe:http://lists.mysql.com/mysql
>>
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: checking progress of alter table on an InnoDB table (Was: Re: checking progress of alter table on a MyISAM table)

2012-09-27 Thread Larry Martell
I read that here:

http://gabrielcain.com/blog/2009/08/05/mysql-alter-table-and-how-to-observe-progress/comment-page-1/

On Thu, Sep 27, 2012 at 2:26 PM, Rick James  wrote:
> Isn't ALTER a DDL, not DML?  So I don't think you would find anything in undo 
> logs.
>
>> -----Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Thursday, September 27, 2012 1:20 PM
>> To: Rick James
>> Cc: mysql mailing list
>> Subject: checking progress of alter table on an InnoDB table (Was: Re:
>> checking progress of alter table on a MyISAM table)
>>
>> So we changed the table from MyISAM to InnoDB. I read that the 'undo
>> log entries' shown in 'show engine innodb status' would correspond to
>> the number of rows that have been operated on throughout the process of
>> the ALTER. The table we're altering has 115,096,205 rows, and the
>> alter's been running for 28 hours, and the undo log entries is 9309.
>> Also that number seems to go up and down. So clearly, it's not what I
>> think.
>>
>> So anyone know a way to monitor the status of the alter now that it's
>> an InnoDB table?
>>
>>
>> On Wed, Sep 26, 2012 at 10:31 AM, Rick James 
>> wrote:
>> > Not really.
>> > You could look at the .TYD and .TYI file sizes and compare to the
>> .MYD and .MYI, but that can be deceptive.  If the table is really big,
>> and has lots of indexes, the generation of the indexes might go slower
>> and slower -- hence any math on the sizes would be optimistic.
>> >
>> >> -----Original Message-----
>> >> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> >> Sent: Wednesday, September 26, 2012 8:52 AM
>> >> To: mysql mailing list
>> >> Subject: checking progress of alter table on a MyISAM table
>> >>
>> >> Is there any way to check on the progress of a long running alter
>> >> table on a MyISAM table? I know it can be done with an InnoDB table,
>> >> but I haven't found a way to do it with a MyISAM table.
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



checking progress of alter table on an InnoDB table (Was: Re: checking progress of alter table on a MyISAM table)

2012-09-27 Thread Larry Martell
So we changed the table from MyISAM to InnoDB. I read that the 'undo
log entries' shown in 'show engine innodb status' would correspond to
the number of rows that have been operated on throughout the process
of the ALTER. The table we're altering has 115,096,205 rows, and the
alter's been running for 28 hours, and the undo log entries is 9309.
Also that number seems to go up and down. So clearly, it's not what I
think.

So anyone know a way to monitor the status of the alter now that it's
an InnoDB table?
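
About the only other visibility I have found from plain SQL is the
session state, e.g.:

SHOW PROCESSLIST;  -- the ALTER shows State: 'copy to tmp table'

but that just confirms the copy is running; it gives no row counter.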


On Wed, Sep 26, 2012 at 10:31 AM, Rick James  wrote:
> Not really.
> You could look at the .TYD and .TYI file sizes and compare to the .MYD and 
> .MYI, but that can be deceptive.  If the table is really big, and has lots of 
> indexes, the generation of the indexes might go slower and slower -- hence 
> any math on the sizes would be optimistic.
>
>> -----Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Wednesday, September 26, 2012 8:52 AM
>> To: mysql mailing list
>> Subject: checking progress of alter table on a MyISAM table
>>
>> Is there any way to check on the progress of a long running alter table
>> on a MyISAM table? I know it can be done with an InnoDB table, but I
>> haven't found a way to do it with a MyISAM table.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



checking progress of alter table on a MyISAM table

2012-09-26 Thread Larry Martell
Is there any way to check on the progress of a long running alter
table on a MyISAM table? I know it can be done with an InnoDB table,
but I haven't found a way to do it with a MyISAM table.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: getting certain rows from a group by

2012-09-20 Thread Larry Martell
On Wed, Sep 19, 2012 at 6:56 PM, Rick James  wrote:
>> my counts are 3 times too much.
> Without studying the code, I would guess that there is a JOIN between the data 
> that needs COUNTing and the GROUP BY for the COUNT.  That is, it collects 
> more 'joined' rows before counting.  Fixing it will probably make the query 
> even messier.

The issue was with the last run join. I had to change the where clause
in the join from:

lr.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117,
118, 119, 120, 121)

to:

lr.target_name_id = q1.target_name_id AND lr.ep = q1.ep

In fact I had to do this for all the joins.
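
So the last-run join, for example, ended up shaped roughly like this
(sketch only; the lr subquery is the one in the query quoted below):

JOIN ( ...same lr subquery as below... ) lr
  ON lr.date_time = q1.LastRun
 AND lr.target_name_id = q1.target_name_id
 AND lr.ep = q1.ep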

Thanks again for all your help!

>
>> -----Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Wednesday, September 19, 2012 3:22 PM
>> To: Rick James
>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com
>> Subject: Re: getting certain rows from a group by
>>
>> Well, I'm getting the proper rows from the 4 joins I added, but now I'm
>> seeing a weird side effect - my counts are 3 times too much.
>>
>> The query is really huge and nasty now, but I'm going to paste it
>> below.
>>
>> In the outermost select, Wafers, Rerun, Runs, and Count are 3 times
>> what they should be. If I take off the joins, I get the correct counts.
>> I can't see why that's happening.
>>
>>
>> SELECT data_target.name as Target,
>>q1.ep as EP,
>>COUNT(*) as Wafers,
>>Lots,
>>FORMAT(SUM(numonep)/(COUNT(*)+(SUM(GREATEST(q1.reruns,
>> 0)))), 1) as 'Sites/Wafer',
>>DATE_FORMAT(MAX(LastRun), '%m/%d/%y') as "Last Run",
>>SUM(GREATEST(q1.reruns, 0)) as Rerun,
>>COUNT(*)+SUM(GREATEST(q1.reruns, 0)) as Runs,
>>FORMAT(avgbottom, 1) as "Avg Bottom",
>>FORMAT(3*stdbottom, 2) as "3 Sig",
>>FORMAT(maxbottom, 1) as Max,
>>FORMAT(minbottom, 1) as Min,
>>SUM(numonep) as Count,
>>SUM(numonep) - SUM(numbottoms) as NAs,
>>CONCAT(FORMAT(100*SUM(numbottoms)/SUM(numonep), 1), '
>> %') as "% Success",
>>FORMAT(3*stdbottom/avgbottom, 2) as "3Sig/Avg",
>>FORMAT(AVG(avgbottom), 1) as 'Wafer Avg',
>>FORMAT(AVG(Wafer3Sigma), 2) as 'Wafer 3 Sigma',
>>FORMAT(AVG(Ranges), 1) as 'Avg Range',
>>FORMAT(3*STD(Ranges), 2) as '3Sig of Ranges',
>>FORMAT(MAX(Ranges), 1) as 'Max Range',
>>FORMAT(MIN(Ranges), 1) as 'Min Range',
>>FORMAT((SUM(numonep) - SUM(numbottoms))/COUNT(*), 1) as
>> 'NAs/Wafer',
>>mx.maxImage as maxImage,
>>mn.minImage as minImage,
>>lr.lrImage as lrlrImage,
>>cm.cmImage as cmcmImage
>>FROM (SELECT target_name_id,
>> ep,
>> wafer_id,
>> COUNT(bottom) as numbottoms,
>> AVG(bottom) as avgbottom,
>> STD(bottom) as stdbottom,
>> MAX(bottom) as maxbottom,
>> MIN(bottom) as minbottom,
>> MAX(date_time) as "LastRun",
>> COUNT(*) as numonep,
>> COUNT(DISTINCT target_name_id, ep, lot_id,
>> data_file_id)-1 as reruns,
>> COUNT(DISTINCT(lot_id)) as Lots,
>> 3*STD(bottom) as Wafer3Sigma,
>> MAX(bottom) - MIN(bottom) as Ranges
>>FROM data_cst
>>WHERE TRUE
>>  AND data_cst.target_name_id IN (775, 776, 777, 778, 779, 780,
>> 45, 44, 116, 117, 118, 119, 120, 121)
>>  AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
>> '2010-03-04 23:59:59'
>>  GROUP BY target_name_id, ep, wafer_id
>>  HAVING count(*) < 999) q1
>>JOIN (SELECT data_cstimage.name as minImage,
>> data_cst.bottom,
>> data_cst.target_name_id,
>> data_cst.date_time,
>> data_cst.lot_id,
>> data_cst.tool_id,
>

Re: getting certain rows from a group by

2012-09-19 Thread Larry Martell
   data_cst.tool_id,
data_cst.roiname,
data_cst.recipe_id,
data_cst.ep
 FROM data_cstimage, data_cst
 WHERE data_cstimage.id = data_cst.image_measurer_id) lr
 ON lr.date_time = q1.LastRun
AND lr.target_name_id IN (775, 776, 777, 778, 779,
780, 45, 44, 116, 117, 118, 119, 120, 121)
AND lr.date_time BETWEEN '2010-03-04 00:00:00' AND
'2010-03-04 23:59:59'
   JOIN (SELECT data_cstimage.name as cmImage,
data_cst.bottom,
data_cst.target_name_id,
data_cst.date_time,
data_cst.lot_id,
data_cst.tool_id,
data_cst.roiname,
data_cst.recipe_id,
data_cst.ep
   FROM data_cstimage, data_cst
   WHERE data_cstimage.id = data_cst.image_measurer_id
AND data_cst.target_name_id IN (775, 776, 777, 778,
779, 780, 45, 44, 116, 117, 118, 119, 120, 121)
AND data_cst.date_time BETWEEN '2010-03-04 00:00:00'
AND '2010-03-04 23:59:59'
   GROUP BY target_name_id, ep
   HAVING count(*) < 999
   ORDER BY ABS(bottom - AVG(bottom))) cm
   ON cm.target_name_id = q1.target_name_id
   AND cm.ep = q1.ep
AND cm.target_name_id IN (775, 776, 777, 778, 779,
780, 45, 44, 116, 117, 118, 119, 120, 121)
AND cm.date_time BETWEEN '2010-03-04 00:00:00' AND
'2010-03-04 23:59:59' ,
data_target
WHERE data_target.id = q1.target_name_id
GROUP BY q1.target_name_id, q1.ep


On Wed, Sep 19, 2012 at 3:27 PM, Larry Martell  wrote:
> OK, I think I have this working. The last join was this:
>
> JOIN (select id, target_name_id, ep,date_time from data_cst WHERE
> target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117,
> 118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04
> 00:00:00' AND '2010-03-04 23:59:59' GROUP BY  target_name_id, ep
> HAVING count(*) < 999 ORDER BY  ABS(data_cst.bottom - AVG(bottom))) cm
> on cm.target_name_id = q1.target_name_id and cm.ep = q1.ep and
> cm.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59'
>
> I have do to more testing to ensure I'm getting the correct rows back.
>
> Thanks for all the help!
>
> On Wed, Sep 19, 2012 at 10:07 AM, Larry Martell  
> wrote:
>> On Tue, Sep 18, 2012 at 7:41 PM, Rick James  wrote:
>>> SELECT  ((the appropriate id))  -- <--
>>> FROM  data_cst, rollup, data_target
>>> WHERE  data_target.name = rollup.Target
>>>   AND  data_cst.ep = rollup.EP
>>>   AND  data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 
>>> 23:59:59'
>>>   AND  data_target.id = data_cst.target_name_id
>>> GROUP BY  rollup.Target, rollup.EP
>>> ORDER BY  ABS(data_cst.bottom - rollup.`Avg Bottom`)
>>> LIMIT  1   -- <--
>>
>> rollup is the name of the temp table that I was testing with. It
>> contains the output of the original query. I don't have that in the
>> real world. That's my issue - I can't figure out how to join with the
>> result set and apply the order by to the join condition. What would I
>> use in place of rollup to get the 'Avg Bottom'?
>>
>> Here's what my query looks like now:
>>
>> SELECT data_target.name as Target,
>>q1.ep as EP,
>>COUNT(*) as Wafers,
>>Lots,
>>SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
>> ELSE 0 END)) as 'Sites/Wafer',
>>MAX(LastRun) as "Last Run",
>>SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
>>COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
>>avgbottom as "Avg Bottom",
>>3*stdbottom as "3 Sig",
>>maxbottom as Max,
>>minbottom as Min,
>>SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs,
>>100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% 
>> Success",
>>3*stdbottom/avgbottom as "3Sig/Avg",
>>AVG(avgbottom) as 'Wafer Avg',
>>AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>>AVG(Ranges) as 'Avg Range',
>>3*STD(Ranges) as '3Sig of R

Re: getting certain rows from a group by

2012-09-19 Thread Larry Martell
OK, I think I have this working. The last join was this:

JOIN (select id, target_name_id, ep,date_time from data_cst WHERE
target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117,
118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04
00:00:00' AND '2010-03-04 23:59:59' GROUP BY  target_name_id, ep
HAVING count(*) < 999 ORDER BY  ABS(data_cst.bottom - AVG(bottom))) cm
on cm.target_name_id = q1.target_name_id and cm.ep = q1.ep and
cm.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59'

I have do to more testing to ensure I'm getting the correct rows back.

Thanks for all the help!

On Wed, Sep 19, 2012 at 10:07 AM, Larry Martell  wrote:
> On Tue, Sep 18, 2012 at 7:41 PM, Rick James  wrote:
>> SELECT  ((the appropriate id))  -- <--
>> FROM  data_cst, rollup, data_target
>> WHERE  data_target.name = rollup.Target
>>   AND  data_cst.ep = rollup.EP
>>   AND  data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 
>> 23:59:59'
>>   AND  data_target.id = data_cst.target_name_id
>> GROUP BY  rollup.Target, rollup.EP
>> ORDER BY  ABS(data_cst.bottom - rollup.`Avg Bottom`)
>> LIMIT  1   -- <--
>
> rollup is the name of the temp table that I was testing with. It
> contains the output of the original query. I don't have that in the
> real world. That's my issue - I can't figure out how to join with the
> result set and apply the order by to the join condition. What would I
> use in place of rollup to get the 'Avg Bottom'?
>
> Here's what my query looks like now:
>
> SELECT data_target.name as Target,
>q1.ep as EP,
>COUNT(*) as Wafers,
>Lots,
>SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
> ELSE 0 END)) as 'Sites/Wafer',
>MAX(LastRun) as "Last Run",
>SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
>COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
>avgbottom as "Avg Bottom",
>3*stdbottom as "3 Sig",
>maxbottom as Max,
>minbottom as Min,
>SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs,
>100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% Success",
>3*stdbottom/avgbottom as "3Sig/Avg",
>AVG(avgbottom) as 'Wafer Avg',
>AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>AVG(Ranges) as 'Avg Range',
>3*STD(Ranges) as '3Sig of Ranges',
>MAX(Ranges) as 'Max Range',
>MIN(Ranges) as 'Min Range', (SUM(numonep) -
> SUM(numbottoms))/COUNT(*) as 'NAs/Wafer',
>mx.id as maxID,
>mn.id as minID,
>lr.id as lrID
> FROM (SELECT target_name_id,
>  ep,
>  wafer_id,
>  COUNT(bottom) as numbottoms,
>  AVG(bottom) as avgbottom,
>  STD(bottom) as stdbottom,
>  MAX(bottom) as maxbottom,
>  MIN(bottom) as minbottom,
>  MAX(date_time) as "LastRun",
>  COUNT(*) as numonep,
>  COUNT(DISTINCT target_name_id, ep, lot_id,
> data_file_id)-1 as reruns,
>  COUNT(DISTINCT(lot_id)) as Lots,
>  3*STD(bottom) as Wafer3Sigma,
>  MAX(bottom) - MIN(bottom) as Ranges
>   FROM data_cst
>   WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
> 116, 117, 118, 119, 120, 121)
>   AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
> '2010-03-04 23:59:59'
>   GROUP BY target_name_id, ep, wafer_id
>   HAVING count(*) < 999) q1
> join data_cst mn ON mn.bottom = q1.minbottom
> join data_cst mx on mx.bottom = q1.maxbottom
> join data_cst lr on lr.date_time = q1.LastRun,
> data_target
> WHERE data_target.id = q1.target_name_id
> GROUP BY q1.target_name_id, q1.ep;
>
>>
>>> -----Original Message-----
>>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>>> Sent: Tuesday, September 18, 2012 2:57 PM
>>> To: Rick James
>>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com
>>> Subject: Re: getting certain rows from a group by
>>>
>>> On Tue, Sep 18, 2012 at 4:01 PM, Rick James 
>>> wrote:
>>> > SELECT mn.*
>>> >mx.*
>>> >> >FROM ( SELECT @min = MIN(bottom),
>>> >> >  @max = MAX(bottom), ... )
>>> >> >JOIN data_cst mn  ON bo

Re: getting certain rows from a group by

2012-09-19 Thread Larry Martell
On Wed, Sep 19, 2012 at 10:26 AM, Larry Martell  wrote:
> On Wed, Sep 19, 2012 at 12:04 AM,   wrote:
>>>>>> 2012/09/18 06:53 -0400, Larry Martell >>>>
>> This works fine. But now I need to get a certain column
>> (image_measurer_id) with each row returned that corresponds to the row
>> from the group that has bottom = Min(bottom), bottom = Max(bottom),
>> bottom closest to Avg(bottom), and bottom from the row where date_time
>> = Max(date_time).
>> <<<<<<<<
>> There is a standard and ugly way of getting such a thing, but it entails 
>> repeating the table reference (supposing you want any, not every, to match):
>>
>> SELECT data_cst.target_name_id, image_measurer_id, nb, xb, vb, xt, bottom
>> FROM (SELECT Min(bottom) AS nb, Max(bottom) AS xb, MIN(ABS(Avg(bottom) - 
>> bottom)) AS vb, Max(date_time) AS xt
>> FROM data_cst
>>   WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 
>> 117, 118, 119, 120, 121)
>>   AND DATE(data_cst.date_time) = '2010-03-04'
>>   GROUP BY target_name_id, ep, wafer_id
>>   HAVING count(*) < 999) AS st
>> JOIN data_cst ON st.target_name_id = data_cst.target_name_id AND (nb = 
>> bottom OR xb = bottom OR vb = bottom OR date_time = xt)
>>
>> One record will be returned for every row that holds a relevant extremum, 
>> not guaranteed to be unique.
>>
>> This query pertains only to your original subquery, not the whole query. To 
>> get the result to which you refer, join this to your original query, to 
>> which you have added something like vb:
>>
>> SELECT *
>> FROM (original query) JOIN (this query) ON nb = minbottom AND xb = maxbottom 
>> AND xt = "Last Run" AND vb =  AND (original query).target_name_id = 
>> (this_query).target_name_id
>
> Thanks. I will look into doing it like this.
>
> I got the min, max, and date conditions to work by adding the joins
> that Rick suggested:
>
> join data_cst mn ON mn.bottom = q1.minbottom
> join data_cst mx ON mx.bottom = q1.maxbottom
> join data_cst lr ON lr.date_time = q1.LastRun
>
> But now you have me wondering - if this just matching on that one
> field, then it can be returning a row from data_cst that is not part
> of the original result set? Do I need to add all the same select
> criteria to each join?

I verified that this is the case. I was not getting the correct data
back. I had to add:

and mn.target_name_id = q1.target_name_id and mn.ep = q1.ep and
mn.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59'

to each join so I got the correct rows back. So now my only issue is
still how to work the 'bottom closest to avg(bottom)' join into this.

>
>> --but I am not confident in the result. There are problems in the original 
>> query, the biggest that of avgbottom, stdbottom, maxbottom, minbottom none 
>> are aggregated over wafer_id.
>
> The inner query aggregates over target_name_id, ep, wafer_id, and
> the outer query aggregates that result set over target_name_id, ep.
> The avg and std are what the client wants - average of averages and
> standard deviations. As are the min and max - the outer query returns
> the min and max of all the wafers for each target, ep.
>
>> Therefore, it is not certain from which record from q1 they are returned. 
>> MySQL tends to pick the first that fits, but not only is nothing guaranteed, 
>> it is explicitly written that if such not aggregated fields appear, the 
>> output is not determinate unless all pertinent are equal, the possibility 
>> whereof is the reason for allowing it.
>
> Yes, the case with dups probably had not been thought of by them, and
> will have to be addressed at some point.
>
>>
>> When that has been handled, it is needful to change the foregoing query to 
>> match that one in two levels, because averaging is not associative (there is 
>> also the problem of equality-testing on generated floating-point numbers). 
>> If it were only MAX and MIN, one level of not GROUPing BY wafer_id would be 
>> all right.
>>
>> By the way, in the original query, I suggest instead of
>> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
>> '2010-03-04 23:59:59'
>> this,
>> DATE(data_cst.date_time) = '2010-03-04'
>
> The date_time was just an example. Depending on user input the query
> could have times in it.
>
>> Instead of
>>
>>  ... q1,
>>  data_target
>> WHERE data_target.id = target_name_id
>>
>> this would be better:
>>
>>  ... q1 JOIN data_target ON data_target.id = target_name_id
>
> Why is one better than the other?
>
>> I believe that
>>  100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100)
>>= 100*SUM(numbottoms)/SUM(numonep)
>> and
>>  SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END)
>>= SUM(GREATEST(reruns, 0))
>
> Yes, thanks - yours is clearer and I've made those changes.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: getting certain rows from a group by

2012-09-19 Thread Larry Martell
On Wed, Sep 19, 2012 at 12:04 AM,   wrote:
>>>>> 2012/09/18 06:53 -0400, Larry Martell >>>>
> This works fine. But now I need to get a certain column
> (image_measurer_id) with each row returned that corresponds to the row
> from the group that has bottom = Min(bottom), bottom = Max(bottom),
> bottom closest to Avg(bottom), and bottom from the row where date_time
> = Max(date_time).
> <<<<<<<<
> There is a standard and ugly way of getting such a thing, but it entails 
> repeating the table reference (supposing you want any, not every, to match):
>
> SELECT data_cst.target_name_id, image_measurer_id, nb, xb, vb, xt, bottom
> FROM (SELECT Min(bottom) AS nb, Max(bottom) AS xb, MIN(ABS(Avg(bottom) - 
> bottom)) AS vb, Max(date_time) AS xt
> FROM data_cst
>   WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 
> 117, 118, 119, 120, 121)
>   AND DATE(data_cst.date_time) = '2010-03-04'
>   GROUP BY target_name_id, ep, wafer_id
>   HAVING count(*) < 999) AS st
> JOIN data_cst ON st.target_name_id = data_cst.target_name_id AND (nb = bottom 
> OR xb = bottom OR vb = bottom OR date_time = xt)

This:

MIN(ABS(Avg(bottom) - bottom))

Is not valid. It gives:

ERROR  (HY000): Invalid use of group function

Which is why I was doing it with an order by. But I can't figure out how
to work that into a join.

>
> One record will be returned for every row that holds a relevant extremum, not 
> guaranteed to be unique.
>
> This query pertains only to your original subquery, not the whole query. To 
> get the result to which you refer, join this to your original query, to which 
> you have added something like vb:
>
> SELECT *
> FROM (original query) JOIN (this query) ON nb = minbottom AND xb = maxbottom 
> AND xt = "Last Run" AND vb =  AND (original query).target_name_id = 
> (this_query).target_name_id
>
> --but I am not confident in the result. There are problems in the original 
> query, the biggest that of avgbottom, stdbottom, maxbottom, minbottom none 
> are aggregated over wafer_id. Therefore, it is not certain from which record 
> from q1 they are returned. MySQL tends to pick the first that fits, but not 
> only is nothing guaranteed, it is explicitly written that if such not 
> aggregated fields appear, the output is not determinate unless all pertinent 
> are equal, the possibility whereof is the reason for allowing it.
>
> When that has been handled, it is needful to change the foregoing query to 
> match that one in two levels, because averaging is not associative (there is 
> also the problem of equality-testing on generated floating-point numbers). If 
> it were only MAX and MIN, one level of not GROUPing BY wafer_id would be all 
> right.
>
> By the way, in the original query, I suggest instead of
> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
> '2010-03-04 23:59:59'
> this,
> DATE(data_cst.date_time) = '2010-03-04'
>
> Instead of
>
>  ... q1,
>  data_target
> WHERE data_target.id = target_name_id
>
> this would be better:
>
>  ... q1 JOIN data_target ON data_target.id = target_name_id
>
> I believe that
>  100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100)
>= 100*SUM(numbottoms)/SUM(numonep)
> and
>  SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END)
>= SUM(GREATEST(reruns, 0))
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/mysql
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: getting certain rows from a group by

2012-09-19 Thread Larry Martell
On Wed, Sep 19, 2012 at 12:04 AM,   wrote:
>>>>> 2012/09/18 06:53 -0400, Larry Martell >>>>
> This works fine. But now I need to get a certain column
> (image_measurer_id) with each row returned that corresponds to the row
> from the group that has bottom = Min(bottom), bottom = Max(bottom),
> bottom closest to Avg(bottom), and bottom from the row where date_time
> = Max(date_time).
> <<<<<<<<
> There is a standard and ugly way of getting such a thing, but it entails 
> repeating the table reference (supposing you want any, not every, to match):
>
> SELECT data_cst.target_name_id, image_measurer_id, nb, xb, vb, xt, bottom
> FROM (SELECT Min(bottom) AS nb, Max(bottom) AS xb, MIN(ABS(Avg(bottom) - 
> bottom)) AS vb, Max(date_time) AS xt
> FROM data_cst
>   WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 
> 117, 118, 119, 120, 121)
>   AND DATE(data_cst.date_time) = '2010-03-04'
>   GROUP BY target_name_id, ep, wafer_id
>   HAVING count(*) < 999) AS st
> JOIN data_cst ON st.target_name_id = data_cst.target_name_id AND (nb = bottom 
> OR xb = bottom OR vb = bottom OR date_time = xt)
>
> One record will be returned for every row that holds a relevant extremum, not 
> guaranteed to be unique.
>
> This query pertains only to your original subquery, not the whole query. To 
> get the result to which you refer, join this to your original query, to which 
> you have added something like vb:
>
> SELECT *
> FROM (original query) JOIN (this query) ON nb = minbottom AND xb = maxbottom 
> AND xt = "Last Run" AND vb =  AND (original query).target_name_id = 
> (this_query).target_name_id

Thanks. I will look into doing it like this.

I got the min, max, and date conditions to work by adding the joins
that Rick suggested:

join data_cst mn ON mn.bottom = q1.minbottom
join data_cst mx ON mx.bottom = q1.maxbottom
join data_cst lr ON lr.date_time = q1.LastRun

But now you have me wondering - if this just matching on that one
field, then it can be returning a row from data_cst that is not part
of the original result set? Do I need to add all the same select
criteria to each join?

> --but I am not confident in the result. There are problems in the original 
> query, the biggest that of avgbottom, stdbottom, maxbottom, minbottom none 
> are aggregated over wafer_id.

The inner query aggregates over target_name_id, ep, wafer_id, and
the outer query aggregates that result set over target_name_id, ep.
The avg and std are what the client wants - average of averages and
standard deviations. As are the min and max - the outer query returns
the min and max of all the wafers for each target, ep.

> Therefore, it is not certain from which record from q1 they are returned. 
> MySQL tends to pick the first that fits, but not only is nothing guaranteed, 
> it is explicitly written that if such not aggregated fields appear, the 
> output is not determinate unless all pertinent are equal, the possibility 
> whereof is the reason for allowing it.

Yes, the case with dups probably had not been thought of by them, and
will have to be addressed at some point.

>
> When that has been handled, it is needful to change the foregoing query to 
> match that one in two levels, because averaging is not associative (there is 
> also the problem of equality-testing on generated floating-point numbers). If 
> it were only MAX and MIN, one level of not GROUPing BY wafer_id would be all 
> right.
>
> By the way, in the original query, I suggest instead of
> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
> '2010-03-04 23:59:59'
> this,
> DATE(data_cst.date_time) = '2010-03-04'

The date_time was just an example. Depending on user input the query
could have times in it.

> Instead of
>
>  ... q1,
>  data_target
> WHERE data_target.id = target_name_id
>
> this would be better:
>
>  ... q1 JOIN data_target ON data_target.id = target_name_id

Why is one better than the other?

> I believe that
>  100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100)
>= 100*SUM(numbottoms)/SUM(numonep)
> and
>  SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END)
>= SUM(GREATEST(reruns, 0))

Yes, thanks - yours is clearer and I've made those changes.
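
A quick illustration of why the GREATEST() form is equivalent -- it
simply clamps negatives to zero, which is all the CASE was doing:

select greatest(-3, 0) as clamped, greatest(4, 0) as unchanged;  -- 0, 4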

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: getting certain rows from a group by

2012-09-19 Thread Larry Martell
On Tue, Sep 18, 2012 at 7:41 PM, Rick James  wrote:
> SELECT  ((the appropriate id))  -- <--
> FROM  data_cst, rollup, data_target
> WHERE  data_target.name = rollup.Target
>   AND  data_cst.ep = rollup.EP
>   AND  data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 
> 23:59:59'
>   AND  data_target.id = data_cst.target_name_id
> GROUP BY  rollup.Target, rollup.EP
> ORDER BY  ABS(data_cst.bottom - rollup.`Avg Bottom`)
> LIMIT  1   -- <--

rollup is the name of the temp table that I was testing with. It
contains the output of the original query. I don't have that in the
real world. That's my issue - I can't figure out how to join with the
result set and apply the order by to the join condition. What would I
use in place of rollup to get the 'Avg Bottom'?

Here's what my query looks like now:

SELECT data_target.name as Target,
   q1.ep as EP,
   COUNT(*) as Wafers,
   Lots,
   SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
ELSE 0 END)) as 'Sites/Wafer',
   MAX(LastRun) as "Last Run",
   SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
   COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
   avgbottom as "Avg Bottom",
   3*stdbottom as "3 Sig",
   maxbottom as Max,
   minbottom as Min,
   SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs,
   100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% Success",
   3*stdbottom/avgbottom as "3Sig/Avg",
   AVG(avgbottom) as 'Wafer Avg',
   AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
   AVG(Ranges) as 'Avg Range',
   3*STD(Ranges) as '3Sig of Ranges',
   MAX(Ranges) as 'Max Range',
   MIN(Ranges) as 'Min Range', (SUM(numonep) -
SUM(numbottoms))/COUNT(*) as 'NAs/Wafer',
   mx.id as maxID,
   mn.id as minID,
   lr.id as lrID
FROM (SELECT target_name_id,
 ep,
 wafer_id,
 COUNT(bottom) as numbottoms,
 AVG(bottom) as avgbottom,
 STD(bottom) as stdbottom,
 MAX(bottom) as maxbottom,
 MIN(bottom) as minbottom,
 MAX(date_time) as "LastRun",
 COUNT(*) as numonep,
 COUNT(DISTINCT target_name_id, ep, lot_id,
data_file_id)-1 as reruns,
 COUNT(DISTINCT(lot_id)) as Lots,
 3*STD(bottom) as Wafer3Sigma,
 MAX(bottom) - MIN(bottom) as Ranges
  FROM data_cst
  WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
116, 117, 118, 119, 120, 121)
  AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
'2010-03-04 23:59:59'
  GROUP BY target_name_id, ep, wafer_id
  HAVING count(*) < 999) q1
join data_cst mn ON mn.bottom = q1.minbottom
join data_cst mx on mx.bottom = q1.maxbottom
join data_cst lr on lr.date_time = q1.LastRun,
data_target
WHERE data_target.id = q1.target_name_id
GROUP BY q1.target_name_id, q1.ep;

>
>> -----Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Tuesday, September 18, 2012 2:57 PM
>> To: Rick James
>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com
>> Subject: Re: getting certain rows from a group by
>>
>> On Tue, Sep 18, 2012 at 4:01 PM, Rick James 
>> wrote:
>> > SELECT mn.*
>> >mx.*
>> >> >FROM ( SELECT @min = MIN(bottom),
>> >> >  @max = MAX(bottom), ... )
>> >> >JOIN data_cst mn  ON bottom = @min
>> >> >JOIN data_cst mx  ON bottom = @max
>> >
>> > That is, the FROM finds the value for the desired row.
>> > The JOIN then gets to the rest of the fields.
>> > Caveat:  If two rows have bottom = @min, it returns both.
>>
>>
>> Thanks a lot Rick! This was super helpful. I've got it working for 3 of
>> the 4 cases - min(bottom), max(bottom), and max(date_time). But I can't
>> figure out how to work in the last case - where bottom is closest to
>> avg(bottom). In an individual query I can get it with an order by, like
>> this:
>>
>> SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id,
>> data_cst.bottom FROM data_cst, rollup, data_target WHERE
>> data_target.name = rollup.Target AND data_cst.ep = rollup.EP AND
>> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04
>> 23:59:59'
>> AND data_target.id = data_cst.target_name_id GROUP BY rollup.Target,
>> rollup.EP ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`);
>>
>> Any 

Re: getting certain rows from a group by

2012-09-18 Thread Larry Martell
On Tue, Sep 18, 2012 at 4:01 PM, Rick James  wrote:
> SELECT mn.*
>mx.*
>> >FROM ( SELECT @min = MIN(bottom),
>> >  @max = MAX(bottom), ... )
>> >JOIN data_cst mn  ON bottom = @min
>> >JOIN data_cst mx  ON bottom = @max
>
> That is, the FROM finds the value for the desired row.
> The JOIN then gets to the rest of the fields.
> Caveat:  If two rows have bottom = @min, it returns both.


Thanks a lot Rick! This was super helpful. I've got it working for 3
of the 4 cases - min(bottom), max(bottom), and max(date_time). But I
can't figure out how to work in the last case - where bottom is
closest to avg(bottom). In an individual query I can get it with an
order by, like this:

SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id,
data_cst.bottom
FROM data_cst, rollup, data_target
WHERE data_target.name = rollup.Target
AND data_cst.ep = rollup.EP
AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59'
AND data_target.id = data_cst.target_name_id
GROUP BY rollup.Target, rollup.EP
ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`);

Any way to work that into another join?

Thanks!

>
>> -----Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Tuesday, September 18, 2012 12:54 PM
>> To: Rick James
>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com
>> Subject: Re: getting certain rows from a group by
>>
>> On Tue, Sep 18, 2012 at 2:05 PM, Rick James 
>> wrote:
>> > For single query...
>> >
>> > Plan A:
>> > Repeat the SELECT as once for each (MIN, AVG, etc) as a subquery in
>> >WHERE bottom = ( SELECT MIN(bottom) ... )
>> >
>> > Plan B:
>> >FROM ( SELECT @min = MIN(bottom),
>> >  @max = MAX(bottom), ... )
>> >JOIN data_cst mn  ON bottom = @min
>> >JOIN data_cst mx  ON bottom = @max
>>
>>
>> Can you elaborate on this? I don't see how this will give me what I
>> need. I'm not looking for the min or max bottom - I already have that
>> - I'm looking for the row from each group that has the min and max
>> bottom.
>>
>> > Plan C:
>> > Get rid of 3rd party packages that eventually get in your way instead
>> of 'helping'.
>>
>> It's not django that is getting in the way. django is a fantastic
>> framework for web development. It's just changing requirements. The
>> original developer (who no longer works here) created a few different
>> classes, and then based on the original requirements, chose to use the
>> one that supports a single query. That worked then, but may not work
>> now. There are other classes that allow multiple queries, but then you
>> have to parse the data and stuff it into the context that will be
>> picked up by the browser yourself. The single query class does that for
>> you. I may yet go that way, but I'm trying to avoid a big rewrite if I
>> can.
>>
>> >> -----Original Message-----
>> >> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> >> Sent: Tuesday, September 18, 2012 4:57 AM
>> >> To: peter.braw...@earthlink.net
>> >> Cc: mysql@lists.mysql.com
>> >> Subject: Re: getting certain rows from a group by
>> >>
>> >> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley
>> >>  wrote:
>> >> > On 2012-09-18 5:53 AM, Larry Martell wrote:
>> >> >>
>> >> >> I have this query:
>> >> >>
>> >> >> SELECT data_target.name as Target,
>> >> >> q1.ep as EP,
>> >> >> COUNT(*) as Wafers,
>> >> >> Lots,
>> >> >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN
>> >> >> reruns ELSE 0 END)) as 'Sites/Wafer',
>> >> >> MAX(LastRun) as "Last Run",
>> >> >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as
>> Rerun,
>> >> >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END)
>> >> >> as
>> >> Runs,
>> >> >> avgbottom as "Avg Bottom",
>> >> >> 3*stdbottom as "3 Sig",
>> >> >> maxbottom as Max,
>> >> >> minbottom as Min,
>> >> >> SUM(numonep) as Count,
>> >> >> SUM(numonep) - SUM(numbottoms) as NAs,
>> >

Re: getting certain rows from a group by

2012-09-18 Thread Larry Martell
On Tue, Sep 18, 2012 at 2:05 PM, Rick James  wrote:
> For single query...
>
> Plan A:
> Repeat the SELECT as once for each (MIN, AVG, etc) as a subquery in
>WHERE bottom = ( SELECT MIN(bottom) ... )
>
> Plan B:
>FROM ( SELECT @min = MIN(bottom),
>  @max = MAX(bottom), ... )
>JOIN data_cst mn  ON bottom = @min
>JOIN data_cst mx  ON bottom = @max


Can you elaborate on this? I don't see how this will give me what I
need. I'm not looking for the min or max bottom - I already have that
- I'm looking for the row from each group that has the min and max
bottom.

> Plan C:
> Get rid of 3rd party packages that eventually get in your way instead of 
> 'helping'.

It's not django that is getting in the way. django is a fantastic
framework for web development. It's just changing requirements. The
original developer (who no longer works here) created a few different
classes, and then based on the original requirements, chose to use the
one that supports a single query. That worked then, but may not work
now. There are other classes that allow multiple queries, but then you
have to parse the data and stuff it into the context that will be
picked up by the browser yourself. The single query class does that
for you. I may yet go that way, but I'm trying to avoid a big rewrite
if I can.

>> -----Original Message-----
>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>> Sent: Tuesday, September 18, 2012 4:57 AM
>> To: peter.braw...@earthlink.net
>> Cc: mysql@lists.mysql.com
>> Subject: Re: getting certain rows from a group by
>>
>> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley
>>  wrote:
>> > On 2012-09-18 5:53 AM, Larry Martell wrote:
>> >>
>> >> I have this query:
>> >>
>> >> SELECT data_target.name as Target,
>> >> q1.ep as EP,
>> >> COUNT(*) as Wafers,
>> >> Lots,
>> >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
>> >> ELSE 0 END)) as 'Sites/Wafer',
>> >> MAX(LastRun) as "Last Run",
>> >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
>> >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as
>> Runs,
>> >> avgbottom as "Avg Bottom",
>> >> 3*stdbottom as "3 Sig",
>> >> maxbottom as Max,
>> >> minbottom as Min,
>> >> SUM(numonep) as Count,
>> >> SUM(numonep) - SUM(numbottoms) as NAs,
>> >> 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as
>> "%
>> >> Success",
>> >> 3*stdbottom/avgbottom as "3Sig/Avg",
>> >> AVG(avgbottom) as 'Wafer Avg',
>> >> AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>> >> AVG(Ranges) as 'Avg Range',
>> >> 3*STD(Ranges) as '3Sig of Ranges',
>> >> MAX(Ranges) as 'Max Range',
>> >> MIN(Ranges) as 'Min Range',
>> >> (SUM(numonep) - SUM(numbottoms))/COUNT(*) as 'NAs/Wafer'
>> >> FROM (SELECT target_name_id,
>> >>   ep,
>> >>   wafer_id,
>> >>   COUNT(bottom) as numbottoms,
>> >>   AVG(bottom) as avgbottom,
>> >>   STD(bottom) as stdbottom,
>> >>   MAX(bottom) as maxbottom,
>> >>   MIN(bottom) as minbottom,
>> >>   MAX(date_time) as "LastRun",
>> >>   COUNT(*) as numonep,
>> >>   COUNT(DISTINCT target_name_id, ep, lot_id,
>> >> data_file_id)-1 as reruns,
>> >>   COUNT(DISTINCT(lot_id)) as Lots,
>> >>   3*STD(bottom) as Wafer3Sigma,
>> >>   MAX(bottom) - MIN(bottom) as Ranges
>> >>FROM data_cst
>> >>WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45,
>> 44,
>> >> 116, 117, 118, 119, 120, 121)
>> >>AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
>> >> '2010-03-04 23:59:59'
>> >>GROUP BY target_name_id, ep, wafer_id
>> >>HAVING count(*) < 999) q1,
>> >>   data_target
>> >> WHERE data_target.id = target_name_id GROUP BY q1.target_name_id,
>> >> q1.ep;
>> >>
>> >>
>>

Re: getting certain rows from a group by

2012-09-18 Thread Larry Martell
On Tue, Sep 18, 2012 at 7:56 AM, Larry Martell  wrote:
> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley
>  wrote:
>> On 2012-09-18 5:53 AM, Larry Martell wrote:
>>>
>>> I have this query:
>>>
>>> SELECT data_target.name as Target,
>>> q1.ep as EP,
>>> COUNT(*) as Wafers,
>>> Lots,
>>> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
>>> ELSE 0 END)) as 'Sites/Wafer',
>>> MAX(LastRun) as "Last Run",
>>> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
>>> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
>>> avgbottom as "Avg Bottom",
>>> 3*stdbottom as "3 Sig",
>>> maxbottom as Max,
>>> minbottom as Min,
>>> SUM(numonep) as Count,
>>> SUM(numonep) - SUM(numbottoms) as NAs,
>>> 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "%
>>> Success",
>>> 3*stdbottom/avgbottom as "3Sig/Avg",
>>> AVG(avgbottom) as 'Wafer Avg',
>>> AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>>> AVG(Ranges) as 'Avg Range',
>>> 3*STD(Ranges) as '3Sig of Ranges',
>>> MAX(Ranges) as 'Max Range',
>>> MIN(Ranges) as 'Min Range',
>>> (SUM(numonep) - SUM(numbottoms))/COUNT(*) as 'NAs/Wafer'
>>> FROM (SELECT target_name_id,
>>>   ep,
>>>   wafer_id,
>>>   COUNT(bottom) as numbottoms,
>>>   AVG(bottom) as avgbottom,
>>>   STD(bottom) as stdbottom,
>>>   MAX(bottom) as maxbottom,
>>>   MIN(bottom) as minbottom,
>>>   MAX(date_time) as "LastRun",
>>>   COUNT(*) as numonep,
>>>   COUNT(DISTINCT target_name_id, ep, lot_id,
>>> data_file_id)-1 as reruns,
>>>   COUNT(DISTINCT(lot_id)) as Lots,
>>>   3*STD(bottom) as Wafer3Sigma,
>>>   MAX(bottom) - MIN(bottom) as Ranges
>>>FROM data_cst
>>>WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
>>> 116, 117, 118, 119, 120, 121)
>>>AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
>>> '2010-03-04 23:59:59'
>>>GROUP BY target_name_id, ep, wafer_id
>>>HAVING count(*) < 999) q1,
>>>   data_target
>>> WHERE data_target.id = target_name_id
>>> GROUP BY q1.target_name_id, q1.ep;
>>>
>>>
>>> This works fine. But now I need to get a certain column
>>> (image_measurer_id) with each row returned that corresponds to the row
>>> from the group that has bottom = Min(bottom), bottom = Max(bottom),
>>> bottom closest to Avg(bottom), and bottom from the row where date_time
>>> = Max(date_time).
>>>
>>> Is this even possible from one query?
>>
>>
>> Might be, but what's the importance of doing it as one query?
>
> Because it's part of a django-based web app, and the class that this
> is part of only supports having a single query. To use multiple
> queries will require a fairly major rewrite of the server side of that
> app.
>
>> I'd start by
>> saving this result to a temp table and developing the new query. When that's
>> running, see if you can optimise a query built by replacing the reference
>> to the temp table with the original query.
>
> Thanks, I'll look into this.

So by creating a temp table, and then using that in subsequent
queries, I can get what I need. But trying to incorporate that into the
original query seems to require a bunch of hairy subqueries (and
there is already one in the original query).
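
(For completeness, the temp table was created with something like:

create temporary table rollup
select /* the big rollup query quoted above */ ... ;

i.e., rollup is just a materialized copy of that result set.)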

After running the above query into a temp table, rollup, here are the
4 queries I came up with to get what I need:

SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id,
data_cst.bottom
FROM data_cst, rollup, data_target
WHERE data_target.name = rollup.Target
AND data_cst.ep = rollup.EP
AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59'
AND data_target.id = data_cst.target_name_id
GROUP BY rollup.Target, rollup.EP
ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`);

SELECT rollup.Target, rollup.EP, rollup.`Last Run`, data_cst.id,
data_cst.date_time
FROM data_cst, rollup, data_target
WHERE data_targe

Re: getting certain rows from a group by

2012-09-18 Thread Larry Martell
On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley
 wrote:
> On 2012-09-18 5:53 AM, Larry Martell wrote:
>>
>> I have this query:
>>
>> SELECT data_target.name as Target,
>> q1.ep as EP,
>> COUNT(*) as Wafers,
>> Lots,
>> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
>> ELSE 0 END)) as 'Sites/Wafer',
>> MAX(LastRun) as "Last Run",
>> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
>> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
>> avgbottom as "Avg Bottom",
>> 3*stdbottom as "3 Sig",
>> maxbottom as Max,
>> minbottom as Min,
>> SUM(numonep) as Count,
>> SUM(numonep) - SUM(numbottoms) as NAs,
>> 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "%
>> Success",
>> 3*stdbottom/avgbottom as "3Sig/Avg",
>> AVG(avgbottom) as 'Wafer Avg',
>> AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>> AVG(Ranges) as 'Avg Range',
>> 3*STD(Ranges) as '3Sig of Ranges',
>> MAX(Ranges) as 'Max Range',
>> MIN(Ranges) as 'Min Range',
>> (SUM(numonep) - SUM(numbottoms))/COUNT(*) as 'NAs/Wafer'
>> FROM (SELECT target_name_id,
>>   ep,
>>   wafer_id,
>>   COUNT(bottom) as numbottoms,
>>   AVG(bottom) as avgbottom,
>>   STD(bottom) as stdbottom,
>>   MAX(bottom) as maxbottom,
>>   MIN(bottom) as minbottom,
>>   MAX(date_time) as "LastRun",
>>   COUNT(*) as numonep,
>>   COUNT(DISTINCT target_name_id, ep, lot_id,
>> data_file_id)-1 as reruns,
>>   COUNT(DISTINCT(lot_id)) as Lots,
>>   3*STD(bottom) as Wafer3Sigma,
>>   MAX(bottom) - MIN(bottom) as Ranges
>>FROM data_cst
>>WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
>> 116, 117, 118, 119, 120, 121)
>>AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
>> '2010-03-04 23:59:59'
>>GROUP BY target_name_id, ep, wafer_id
>>HAVING count(*) < 999) q1,
>>   data_target
>> WHERE data_target.id = target_name_id
>> GROUP BY q1.target_name_id, q1.ep;
>>
>>
>> This works fine. But now I need to get a certain column
>> (image_measurer_id) with each row returned that corresponds to the row
>> from the group that has bottom = Min(bottom), bottom = Max(bottom),
>> bottom closest to Avg(bottom), and bottom from the row where date_time
>> = Max(date_time).
>>
>> Is this even possible from one query?
>
>
> Might be, but what's the importance of doing it as one query?

Because it's part of a django-based web app, and the class that this
is part of only supports having a single query. To use multiple
queries will require a fairly major rewrite of the server side of that
app.

> I'd start by
> saving this result to a temp table and developing the new query. When that's
> running, see if you can optimise a query built by replacing the reference
> to the temp table with the original query.

Thanks, I'll look into this.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



getting certain rows from a group by

2012-09-18 Thread Larry Martell
I have this query:

SELECT data_target.name as Target,
   q1.ep as EP,
   COUNT(*) as Wafers,
   Lots,
   SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
ELSE 0 END)) as 'Sites/Wafer',
   MAX(LastRun) as "Last Run",
   SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
   COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
   avgbottom as "Avg Bottom",
   3*stdbottom as "3 Sig",
   maxbottom as Max,
   minbottom as Min,
   SUM(numonep) as Count,
   SUM(numonep) - SUM(numbottoms) as NAs,
   100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% Success",
   3*stdbottom/avgbottom as "3Sig/Avg",
   AVG(avgbottom) as 'Wafer Avg',
   AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
   AVG(Ranges) as 'Avg Range',
   3*STD(Ranges) as '3Sig of Ranges',
   MAX(Ranges) as 'Max Range',
   MIN(Ranges) as 'Min Range',
   (SUM(numonep) - SUM(numbottoms))/COUNT(*) as 'NAs/Wafer'
FROM (SELECT target_name_id,
 ep,
 wafer_id,
 COUNT(bottom) as numbottoms,
 AVG(bottom) as avgbottom,
 STD(bottom) as stdbottom,
 MAX(bottom) as maxbottom,
 MIN(bottom) as minbottom,
 MAX(date_time) as "LastRun",
 COUNT(*) as numonep,
 COUNT(DISTINCT target_name_id, ep, lot_id,
data_file_id)-1 as reruns,
 COUNT(DISTINCT(lot_id)) as Lots,
 3*STD(bottom) as Wafer3Sigma,
 MAX(bottom) - MIN(bottom) as Ranges
  FROM data_cst
  WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
116, 117, 118, 119, 120, 121)
  AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
'2010-03-04 23:59:59'
  GROUP BY target_name_id, ep, wafer_id
  HAVING count(*) < 999) q1,
 data_target
WHERE data_target.id = target_name_id
GROUP BY q1.target_name_id, q1.ep;


This works fine. But now I need to get a certain column
(image_measurer_id) with each row returned that corresponds to the row
from the group that has bottom = Min(bottom), bottom = Max(bottom),
bottom closest to Avg(bottom), and bottom from the row where date_time
= Max(date_time).

Is this even possible from one query?

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with correlated subquery

2012-08-23 Thread Larry Martell
On Thu, Aug 23, 2012 at 8:08 AM, Larry Martell  wrote:
> On Tue, Aug 21, 2012 at 8:26 PM, Larry Martell  
> wrote:
>> On Tue, Aug 21, 2012 at 8:07 PM,   wrote:
>>>>>>> 2012/08/21 16:35 -0600, Larry Martell >>>>
>>> I am trying to write a query that selects from both a correlated
>>> subquery and a table in the main query, and I'm having a lot of
>>> trouble getting the proper row count. I'm sure this is very simple,
>>> and I'm just missing it. I'll try and present a simple example. For
>>> this example, there are 27 rows, organized like this:
>>>
>>>
>>> mysql> select count(*), target_name_id, ep, wafer_id from data_cst
>>> where target_name_id = 44 group by target_name_id, ep, wafer_id;
>>> +--++--+--+
>>> | count(*) | target_name_id | ep   | wafer_id |
>>> +--++--+--+
>>> |6 | 44 | 1,1  |   16 |
>>> |3 | 44 | 1,1  |   17 |
>>> |6 | 44 | 1,2  |   16 |
>>> |3 | 44 | 1,2  |   17 |
>>> |6 | 44 | 1,3  |   16 |
>>> |3 | 44 | 1,3  |   17 |
>>> +--++--+--+
>>> 6 rows in set (0.00 sec)
>>>
>>> I need to get an average of a column grouped by target_name_id, ep as
>>> well as the average of the averages grouped by target_name_id, ep,
>>> wafer_id, and I also need the count of the rows in the target_name_id,
>>> ep group. My query is getting the correct averages, but incorrect row
>>> counts:
>>>
>>> mysql> select count(*), target_name_id, ep, avg(bottom), avg(averages)
>>> from (select avg(bottom) as averages, target_name_id as t, ep as e
>>> from data_cst where target_name_id = 44 group by target_name_id, ep,
>>> wafer_id) x, data_cst where target_name_id = t and ep = e group by
>>> target_name_id, ep;
>>> +--++--+-+-+
>>> | count(*) | target_name_id | ep   | avg(bottom) | avg(averages)   |
>>> +--++--+-+-+
>>> |   18 | 44 | 1,1  |  21.8056667 |  21.8545833 |
>>> |   18 | 44 | 1,2  | 121.798 | 121.83983335000 |
>>> |   18 | 44 | 1,3  | 349.763 | 349.75016665000 |
>>> +--++--+-+-+
>>> 3 rows in set (0.01 sec)
>>>
>>> The count for each row should be 9. What do I need in my count() to be
>>> counting the right thing?
>>> <<<<<<<<
>>> Your trouble lies in the joining; in effect, you are joining a row with 
>>> wafer_id 16 with a row with wafer_id 17, and also a row with wafer_id 17 
>>> with a row with wafer_id 16.
>>>
>>> A further advantage to using the now standard form of joining, as Rick 
>>> James bids you do, is that one can add further conditions to it:
>>>
>>> select count(*), target_name_id, ep, avg(bottom), avg(averages)
>>> from (select avg(bottom) as averages, target_name_id, ep
>>> from data_cst
>>> where target_name_id = 44
>>> group by target_name_id, ep, wafer_id) x
>>> JOIN data_cst
>>> ON target_name_id = x.target_name_id and ep = x.ep and wafer_id < x.wafer_id
>>> group by target_name_id, ep
>>>
>>> The inequality, maybe, will give you what you want.
>>
>> Not, wafer_id < x.wafer_id, but wafer_id = x.wafer_id - adding that
>> makes it work the way I want. Thanks!!
>
> So now that I have this working, they have added another requirement.
> They also want a count of rows aggregated by a different set of
> columns. So I need to add another subquery, but I can't figure out how
> to get the row count. In this example query:
>
> mysql> select count(*) from data_cst where target_name_id=208082 and
> wafer_id=425845 group by target_name_id,wafer_id,lot_id,data_file_id;
> +--+
> | count(*) |
> +--+
> |   12 |
> |   12 |
> |   12 |
> |   12 |
> +--+
> 4 rows in set (0.00 sec)
>
>
> The result I need is 4 (i.e there were 4 distinct groups of
> target_name_id,wafer_id,lot_id,data_file_id). How can I get that?
>
> This will be a subquery, so I can't use mysql_num_rows() or
> FOUND_ROWS() after the fact. I need the result returned from the
> query.

I got this working:

select count(distinct lot_id,data_file_id)
  from data_cst
 where target_name_id=208082
   and wafer_id=425845;

COUNT(DISTINCT lot_id, data_file_id) counts distinct combinations of the
two columns, which is exactly the number of groups the GROUP BY produced
(4 here).

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with correlated subquery

2012-08-23 Thread Larry Martell
On Tue, Aug 21, 2012 at 8:26 PM, Larry Martell  wrote:
> On Tue, Aug 21, 2012 at 8:07 PM,   wrote:
>>>>>> 2012/08/21 16:35 -0600, Larry Martell >>>>
>> I am trying to write a query that selects from both a correlated
>> subquery and a table in the main query, and I'm having a lot of
>> trouble getting the proper row count. I'm sure this is very simple,
>> and I'm just missing it. I'll try and present a simple example. For
>> this example, there are 27 rows, organized like this:
>>
>>
>> mysql> select count(*), target_name_id, ep, wafer_id from data_cst
>> where target_name_id = 44 group by target_name_id, ep, wafer_id;
>> +--++--+--+
>> | count(*) | target_name_id | ep   | wafer_id |
>> +--++--+--+
>> |6 | 44 | 1,1  |   16 |
>> |3 | 44 | 1,1  |   17 |
>> |6 | 44 | 1,2  |   16 |
>> |3 | 44 | 1,2  |   17 |
>> |6 | 44 | 1,3  |   16 |
>> |3 | 44 | 1,3  |   17 |
>> +--++--+--+
>> 6 rows in set (0.00 sec)
>>
>> I need to get an average of a column grouped by target_name_id, ep as
>> well as the average of the averages grouped by target_name_id, ep,
>> wafer_id, and I also need the count of the rows in the target_name_id,
>> ep group. My query is getting the correct averages, but incorrect row
>> counts:
>>
>> mysql> select count(*), target_name_id, ep, avg(bottom), avg(averages)
>> from (select avg(bottom) as averages, target_name_id as t, ep as e
>> from data_cst where target_name_id = 44 group by target_name_id, ep,
>> wafer_id) x, data_cst where target_name_id = t and ep = e group by
>> target_name_id, ep;
>> +--++--+-+-+
>> | count(*) | target_name_id | ep   | avg(bottom) | avg(averages)   |
>> +--++--+-+-+
>> |   18 | 44 | 1,1  |  21.8056667 |  21.8545833 |
>> |   18 | 44 | 1,2  | 121.798 | 121.83983335000 |
>> |   18 | 44 | 1,3  | 349.763 | 349.75016665000 |
>> +--++--+-+-+
>> 3 rows in set (0.01 sec)
>>
>> The count for each row should be 9. What do I need in my count() to be
>> counting the right thing?
>> <<<<<<<<
>> Your trouble lies in the joining; in effect, you are joining a row with 
>> wafer_id 16 with a row with wafer_id 17, and also a row with wafer_id 17 
>> with a row with wafer_id 16.
>>
>> A further advantage to using the now standard form of joining, as Rick James 
>> bids you do, is that one can add further conditions to it:
>>
>> select count(*), target_name_id, ep, avg(bottom), avg(averages)
>> from (select avg(bottom) as averages, target_name_id, ep
>> from data_cst
>> where target_name_id = 44
>> group by target_name_id, ep, wafer_id) x
>> JOIN data_cst
>> ON target_name_id = x.target_name_id and ep = x.ep and wafer_id < x.wafer_id
>> group by target_name_id, ep
>>
>> The inequality, maybe, will give you what you want.
>
> Not, wafer_id < x.wafer_id, but wafer_id = x.wafer_id - adding that
> makes it work the way I want. Thanks!!

So now that I have this working, they have added another requirement.
They also want a count of rows aggregated by a different set of
columns. So I need to add another subquery, but I can't figure out how
to get the row count. In this example query:

mysql> select count(*) from data_cst where target_name_id=208082 and
wafer_id=425845 group by target_name_id,wafer_id,lot_id,data_file_id;
+--+
| count(*) |
+--+
|   12 |
|   12 |
|   12 |
|   12 |
+--+
4 rows in set (0.00 sec)


The result I need is 4 (i.e there were 4 distinct groups of
target_name_id,wafer_id,lot_id,data_file_id). How can I get that?

This will be a subquery, so I can't use mysql_num_rows() or
FOUND_ROWS() after the fact. I need the result returned from the
query.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with correlated subquery

2012-08-21 Thread Larry Martell
On Tue, Aug 21, 2012 at 8:07 PM,   wrote:
>>>>> 2012/08/21 16:35 -0600, Larry Martell >>>>
> I am trying to write a query that selects from both a correlated
> subquery and a table in the main query, and I'm having a lot of
> trouble getting the proper row count. I'm sure this is very simple,
> and I'm just missing it. I'll try and present a simple example. For
> this example, there are 27 rows, organized like this:
>
>
> mysql> select count(*), target_name_id, ep, wafer_id from data_cst
> where target_name_id = 44 group by target_name_id, ep, wafer_id;
> +--++--+--+
> | count(*) | target_name_id | ep   | wafer_id |
> +--++--+--+
> |6 | 44 | 1,1  |   16 |
> |3 | 44 | 1,1  |   17 |
> |6 | 44 | 1,2  |   16 |
> |3 | 44 | 1,2  |   17 |
> |6 | 44 | 1,3  |   16 |
> |3 | 44 | 1,3  |   17 |
> +--++--+--+
> 6 rows in set (0.00 sec)
>
> I need to get an average of a column grouped by target_name_id, ep as
> well as the average of the averages grouped by target_name_id, ep,
> wafer_id, and I also need the count of the rows in the target_name_id,
> ep group. My query is getting the correct averages, but incorrect row
> counts:
>
> mysql> select count(*), target_name_id, ep, avg(bottom), avg(averages)
> from (select avg(bottom) as averages, target_name_id as t, ep as e
> from data_cst where target_name_id = 44 group by target_name_id, ep,
> wafer_id) x, data_cst where target_name_id = t and ep = e group by
> target_name_id, ep;
> +--++--+-+-+
> | count(*) | target_name_id | ep   | avg(bottom) | avg(averages)   |
> +--++--+-+-+
> |   18 | 44 | 1,1  |  21.8056667 |  21.8545833 |
> |   18 | 44 | 1,2  | 121.798 | 121.83983335000 |
> |   18 | 44 | 1,3  | 349.763 | 349.75016665000 |
> +--++--+-+-+
> 3 rows in set (0.01 sec)
>
> The count for each row should be 9. What do I need in my count() to be
> counting the right thing?
> <<<<<<<<
> Your trouble lies in the joining; in effect, you are joining a row with 
> wafer_id 16 with a row with wafer_id 17, and also a row with wafer_id 17 with 
> a row with wafer_id 16.
>
> A further advantage to using the now standard form of joining, as Rick James 
> bids you do, is that one can add further conditions to it:
>
> select count(*), target_name_id, ep, avg(bottom), avg(averages)
> from (select avg(bottom) as averages, target_name_id, ep
> from data_cst
> where target_name_id = 44
> group by target_name_id, ep, wafer_id) x
> JOIN data_cst
> ON target_name_id = x.target_name_id and ep = x.ep and wafer_id < x.wafer_id
> group by target_name_id, ep
>
> The inequality, maybe, will give you what you want.

Not, wafer_id < x.wafer_id, but wafer_id = x.wafer_id - adding that
makes it work the way I want. Thanks!!
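
For the archives, the corrected query then looks roughly like this (a
sketch using the same example values as above; the alias d is mine):

select count(*), d.target_name_id, d.ep, avg(d.bottom), avg(x.averages)
from (select avg(bottom) as averages, target_name_id, ep, wafer_id
      from data_cst
      where target_name_id = 44
      group by target_name_id, ep, wafer_id) x
join data_cst d
  on d.target_name_id = x.target_name_id
 and d.ep = x.ep
 and d.wafer_id = x.wafer_id
group by d.target_name_id, d.ep;

Each data row now joins only its own wafer's average row, so count(*)
comes out to 9 per (target_name_id, ep) group, as wanted.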

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql


