[U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Jeffrey Butera
While I'm on a roll...  I often look at how to make queries run faster.  
In short, we index all the commonly used data fields we can and (of 
course) it makes a world of difference.  However, I have some questions
about optimal ways to query data using a mix of indexed data, 
non-indexed data and i-descriptors.


Let's say that in a table FOO I want to do:

SELECT FOO WITH A='foo' AND B='bar' AND C='bang'

where:

A = indexed data field
B = non-indexed data field
C = I-descriptor (assume it's time-consuming: 2 seconds per record)

Which is the optimal way to attack?

1) I could just go for it with:

SELECT FOO WITH A='foo' AND B='bar' AND C='bang'

2) I could do the following:

SELECT FOO WITH A='foo' AND B='bar'
SELECT FOO WITH A='foo' AND B='bar' AND C='bang' REQUIRE.SELECT

3) I could do the following:

SELECT FOO WITH A='foo'
SELECT FOO WITH A='foo' AND B='bar' AND C='bang' REQUIRE.SELECT


I've done benchmarks, but I'm really curious about the innards of Unidata
and how/when it does short-circuit evaluation of AND clauses, etc.


My gut tells me that (3) should be good because it first weeds out bad 
records based solely on an indexed data field, thereby reducing the 
number of records for which C needs to be evaluated.  Conversely, if it's
doing a good job with short-circuit evaluation then (3) and (1) 
shouldn't be terribly different because failure of A='foo' would imply 
that C never gets evaluated.
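The trade-off can be made concrete with a toy cost model. This is a minimal Python sketch, not a description of UniData's query engine, and it assumes: the index on A hands back candidate keys without testing B or C; the terms of a WITH clause are evaluated left to right and stop at the first failure; and the second SELECT in options (2) and (3) re-tests every term against the active list. The record data, the counts and all helper names (`build_index`, `term`, `select`, `strategy_1` and so on) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical file FOO: 1000 records with deterministic field values.
RECORDS = {
    i: {"A": "foo" if i % 10 == 0 else "x",
        "B": "bar" if i % 20 == 0 else "y",
        "C": "bang" if i % 40 == 0 else "z"}
    for i in range(1000)
}

def build_index(records, field):
    """Model of a secondary index: value -> keys, built once, outside the query."""
    index = defaultdict(list)
    for key, rec in records.items():
        index[rec[field]].append(key)
    return index

def term(field, value):
    """One WITH term; every evaluation is tallied in `counts`."""
    def test(rec, counts):
        counts[field] += 1
        return rec[field] == value
    return test

A, B, C = term("A", "foo"), term("B", "bar"), term("C", "bang")
INDEX_A = build_index(RECORDS, "A")

def select(candidates, terms, counts):
    # all() stops at the first False: short-circuit AND, left to right
    return [k for k in candidates if all(t(RECORDS[k], counts) for t in terms)]

def strategy_1(counts):
    # One SELECT: A resolved by the index, then B and C per candidate
    return select(INDEX_A["foo"], [B, C], counts)

def strategy_2(counts):
    first = select(INDEX_A["foo"], [B], counts)
    return select(first, [A, B, C], counts)   # second SELECT ... REQUIRE.SELECT

def strategy_3(counts):
    first = INDEX_A["foo"]                     # index-only first SELECT
    return select(first, [A, B, C], counts)   # second SELECT ... REQUIRE.SELECT

for name, strategy in [("(1)", strategy_1), ("(2)", strategy_2), ("(3)", strategy_3)]:
    counts = Counter()
    hits = strategy(counts)
    print(name, len(hits), "hits; tests run:", dict(counts))
```

Under these assumptions all three options return the same 25 keys and evaluate C exactly 50 times; options (2) and (3) only add repeated A and B tests, so the extra first pass buys nothing in this model.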



--
Jeffrey Butera, PhD
Associate Director for Applications and Web Services
Information Technology
Hampshire College
413-559-5556

http://www.hampshire.edu
http://www.facebook.com/hampshirecollegeit

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Wols Lists
On 08/03/13 21:03, Jeffrey Butera wrote:
While I don't know UniData, I'm guessing that it's the same as UniVerse
in this sense, and IMHO the second select of both (2) and (3) is broken
(as in, it makes the first select a waste of time ...)

I notice you're using REQUIRE.SELECT. So...

SELECT FOO WITH A='foo' ;* will use the index
SELECT FOO WITH B='bar' REQUIRE.SELECT ;* will now find the records with A equal to foo and B equal to bar
SELECT FOO WITH C='bang' REQUIRE.SELECT ;* now finishes off the select.

Whether you want to combine the second two selects as
SELECT FOO WITH B='bar' AND C='bang' REQUIRE.SELECT
depends on what C does.

If, in order to evaluate C, you need to read the contents of FOO, then
you should combine the two. If UniData reads @RECORD regardless of
whether it's required when evaluating an i-desc, then likewise.

If, however, evaluating C can be done without reading @RECORD, then you
may be better doing two selects.

Whatever happens, there is no point (indeed, it could easily be
positively harmful) in repeating an earlier select. The REQUIRE.SELECT
keyword guarantees that if the previous select fails to find any
records, the subsequent select will also fail rather than starting again
from scratch.
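The chained sequence above can be modelled the same way. The Python sketch below is a hypothetical stand-in for a TCL session with a single active select list. Following the description above, it assumes that a SELECT with an active list only examines the keys in that list, that a SELECT which finds nothing leaves no active list, and that REQUIRE.SELECT makes the command fail instead of falling back to a full-file scan. It also assumes each data-scanning pass reads every record it tests once, while an index-only SELECT reads none. The `Session` class and its `reads` counter are illustrative inventions, not a UniData API.

```python
class Session:
    """Toy TCL session holding one active select list."""

    def __init__(self, records, indexes):
        self.records = records
        self.indexes = indexes   # field -> {value: [keys]}
        self.active = None       # None: no active select list
        self.reads = 0           # records read by data-scanning passes

    def select(self, field, value, require_select=False):
        if require_select and self.active is None:
            # Modelled REQUIRE.SELECT: fail rather than start again from scratch
            raise RuntimeError("REQUIRE.SELECT: no active select list")
        if self.active is None and field in self.indexes:
            # Index-only selection: no records are read
            keys = list(self.indexes[field].get(value, []))
        else:
            source = self.active if self.active is not None else list(self.records)
            keys = []
            for key in source:
                self.reads += 1
                if self.records[key][field] == value:
                    keys.append(key)
        self.active = keys or None   # an empty result leaves no active list
        return len(keys)

# Hypothetical data for FOO and an index on A
RECORDS = {i: {"A": "foo" if i % 10 == 0 else "x",
               "B": "bar" if i % 20 == 0 else "y",
               "C": "bang" if i % 40 == 0 else "z"} for i in range(1000)}
INDEXES = {"A": {}}
for key, rec in RECORDS.items():
    INDEXES["A"].setdefault(rec["A"], []).append(key)

session = Session(RECORDS, INDEXES)
print(session.select("A", "foo"))                        # 100, index only
print(session.select("B", "bar", require_select=True))   # 50, scans the A list
print(session.select("C", "bang", require_select=True))  # 25, scans the B list
print("records read:", session.reads)                    # 150
```

In this model the two chained passes read 150 records, whereas testing B and C together in one pass over the 100 index candidates would read each of them once (100 reads), which is the situation where combining the last two selects pays off. With an empty first list, the REQUIRE.SELECT step raises instead of scanning all 1000 records.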

Cheers,
Wol


Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Wjhonson
If your file is small enough, and your system idle enough, that the file
remains *in memory* for all the scenarios in question, then you may not
notice speed issues.

However, the monster in the kitchen is the number of DISK READS you are doing.
If your earlier reads get cycled out of memory before those records are read
again, then you should run a single combined select, which does all the
accesses to each record at the same time.
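The concern about reads being cycled out can be illustrated with a toy least-recently-used buffer cache. This Python sketch is an assumption-laden model, not how UniData or the operating system actually caches: it counts whole records, treats every cache miss as a disk read, and uses hypothetical key lists (100 index candidates for A='foo', of which 50 pass B='bar'). The capacities 1000 and 10 are arbitrary illustration values.

```python
from collections import OrderedDict

class LRUCache:
    """Toy buffer cache holding at most `capacity` records."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()
        self.disk_reads = 0

    def read(self, key):
        if key in self.slots:
            self.slots.move_to_end(key)        # hit: mark as most recently used
            return
        self.disk_reads += 1                   # miss: fetch from disk
        self.slots[key] = True
        if len(self.slots) > self.capacity:
            self.slots.popitem(last=False)     # evict the least recently used

def disk_reads(passes, capacity):
    """Replay each pass's record reads through one shared cache."""
    cache = LRUCache(capacity)
    for keys in passes:
        for key in keys:
            cache.read(key)
    return cache.disk_reads

candidates = list(range(0, 1000, 10))                 # keys the index on A returns
survivors = [k for k in candidates if k % 20 == 0]    # keys that pass B='bar'

separate = [candidates, survivors]   # B pass, then a separate C pass re-reads survivors
combined = [candidates]              # B and C tested while each record is in hand

for capacity in (1000, 10):
    print(capacity, disk_reads(separate, capacity), disk_reads(combined, capacity))
```

When everything fits in the cache, the separate pass costs no extra disk reads (100 either way); once the cache is smaller than the working set, the separate pass pays for every survivor again (150 versus 100).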

-Original Message-
From: Wols Lists antli...@youngman.org.uk
To: u2-users u2-users@listserver.u2ug.org
Sent: Fri, Mar 8, 2013 1:17 pm
Subject: Re: [U2] Unidata index and short-circuit evaluation


Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Rutherford, Marc
Jeffrey,

I would say #1 does the trick.  

Any U2 TCL query evaluates from left to right.  When one term eliminates a
record, the query stops evaluating that record and moves on to the next
one.  So #1 is your most effective option, leaving the complex i-descriptor
to be evaluated against only the smallest subset of records.
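If terms really are tested left to right, with evaluation stopping at the first failing term, then the order of terms inside one WITH clause decides how often the expensive I-descriptor runs. The minimal Python sketch below assumes exactly that and nothing about how UniData parses or optimizes a query; the data and the `matches`/`run` helpers are hypothetical, and the 2-seconds-per-record cost is the figure from the original question.

```python
# Hypothetical FOO records; CANDIDATES stands in for the keys the index on A returns
RECORDS = {i: {"A": "foo" if i % 10 == 0 else "x",
               "B": "bar" if i % 20 == 0 else "y",
               "C": "bang" if i % 40 == 0 else "z"} for i in range(1000)}
CANDIDATES = [k for k, rec in RECORDS.items() if rec["A"] == "foo"]
C_SECONDS = 2   # per-record I-descriptor cost assumed in the original question

def matches(rec, terms, calls):
    # Left-to-right AND: stop testing this record at the first failing term
    for field, value in terms:
        calls[field] = calls.get(field, 0) + 1
        if rec[field] != value:
            return False
    return True

def run(terms):
    calls = {}
    hits = [k for k in CANDIDATES if matches(RECORDS[k], terms, calls)]
    return hits, calls

for label, terms in [("B then C", [("B", "bar"), ("C", "bang")]),
                     ("C then B", [("C", "bang"), ("B", "bar")])]:
    hits, calls = run(terms)
    print(label, len(hits), "hits,", calls["C"], "C evaluations,",
          calls["C"] * C_SECONDS, "seconds in C")
```

Both orders return the same 25 records, but putting C first doubles its evaluations (100 versus 50) under this model, which is why the costly i-descriptor belongs at the end of the clause.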

#2 and #3 don't save you anything: the first pass only does work that the
second pass repeats with the same select.

I typically only use multiple passes when I have complex nested 'and' and
'or' clauses.  Unidata is somewhat obscure in the way it wants parentheses
formatted (I am an old PICK hand), so I will break up the query into separate
passes that a human can easily understand.  Since I normally start my query
with an indexed field, I don't sweat the overhead of stacked queries.

Marc Rutherford
Principal Programmer Analyst
Advanced Bionics LLC
(661) 362 1754

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Jeffrey Butera
Sent: Friday, March 08, 2013 1:04 PM
To: U2 Users List
Subject: [U2] Unidata index and short-circuit evaluation


Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Wols Lists
On 08/03/13 22:07, Wjhonson wrote:
You missed the fact that the first select is based on an index. That
should not go anywhere near the data anyway. So doing it before or at
the same time as the other selects is irrelevant.

But yes. I based my recommendations on minimizing the number of disk
accesses ...

Cheers,
Wol


Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Wjhonson
I didn't miss it.
The point of the request was the whole query, from beginning to end.
Of course the first *portion* will be quick and use few disk reads.
I was discussing the full example.

-Original Message-
From: Wols Lists antli...@youngman.org.uk
To: u2-users u2-users@listserver.u2ug.org
Sent: Fri, Mar 8, 2013 2:43 pm
Subject: Re: [U2] Unidata index and short-circuit evaluation


Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Wjhonson
To be clearer: although there may be some slight differences, if all the
pieces stay in memory for the duration of the select or selects, those
differences in timing will be so tiny that they won't make any difference in
real life.

The most critical improvement in speed is simply adding more memory. Then go
home.

-Original Message-
From: Wjhonson wjhon...@aol.com
To: u2-users u2-users@listserver.u2ug.org
Sent: Fri, Mar 8, 2013 2:48 pm
Subject: Re: [U2] Unidata index and short-circuit evaluation

Re: [U2] Unidata index and short-circuit evaluation

2013-03-08 Thread Tony Gravagno
Side comment:

I understand what Will is saying and I think he has a valid point. But
I believe the value of the point is now insignificant. The tiny bit of
contention that Will brings up here is about just how much disk access
is done by any given process. Eliminate disk reads and the process
speeds up - or so it used to be.

I confess that over the last decade my certainty about such matters
has continued to dwindle, as well as my concern. It used to be easy
when there was a single fixed drum with some number of platters and
heads, sectors were 512 bytes, rotation rate was 5400rpm, and you
could measure memory + L2 cache and get some idea of where your data
was and what the latency to data was going to be like.

These days sites are running many different kinds of drives. SSDs (and
hybrid HHD and now SSHD drives) now have as much capacity as disk-based
drives, making disk almost as fast as cache (depending on what kind) - and
the price has come down to nearly that of hard drives. These
tiny disks spinning in a box are almost as obsolete as 9-track tape.
Now couple that with virtualization where you have no idea which part
of the virtual machine is in cache. Couple that with RAID where
multiple drives and controllers add some latency, but also reduce some
disk hits with striping. That's just in the hardware, and I didn't
even mention caching controllers.

When working with traditional MV blobs we can also map processes to
memory, allocate more frames to processes to eliminate hits to
overflow and file space, and thus reduce the number of actual file reads.
With U2 and other systems the OS caches for us, whether on disk swap
space or in memory.

So what IS a disk read anymore? A read from disk is completely
unrelated to actual disk activity these days. As I said above, we're
not even really talking about disk anymore. Sure, at a higher level
we just want to reduce the number of READ statements, regardless of
where the data comes from in the universe (oh, I didn't mention
virtualized data in the cloud either...) but these days, a READ
statement is more like a virtual read, just telling the system to
get the data from wherever it is now - it's not a directive to go to
disk.

I've lost touch with all of the places where data can be. But I also
realized a while back that it's futile to beat my head against a wall
trying to chase disk reads around for better performance this week,
because I'm never really going to have a good answer, and it's just
going to change next week anyway.

Well, that's how I see it. YMMV
Comments?
T