RE: [U2] [UD] Indexing 'Best Practices'
From: David Wolverton Would we be better off to create a Virtual field definition to index that carries the 'real' code into the index, unless it is COMPLETE, in which case we would make the virtual field pass a NULL and then index this new virtual field using NoNulls? Assuming COMPLETE can't be changed to null in the app, yes. One consideration you haven't discussed is the amount of impact on the existing app, and this strategy would minimize that impact. I wouldn't go writing a subroutine to do the calculation, but with a simple IF the overhead should be minimal. But it does not appear to me that the 'size' of the index created has significant operational cost ... The size of the index - that is, number of records indexed - does in fact have an operational cost. Minimize the size of the index - particularly if a large percent of the records don't have to be in there, and performance will positively reflect it. If you're only dropping a small percentage of the records from the index with NO.NULLS you might not notice a difference, but drop a large percentage and I bet you'd notice the difference. possibly not as much the expense of doing all those Virtual Field computations along the way If it's a subroutine, then I would expect the computations to have some expense. A simple IF should have minimal impact. Plus, if we ever DO need it, COMPLETE is indexed for selection. If COMPLETE is needed, you might consider a separate indexed field that uses COMPLETE with some other criteria that may be useful for reporting. At least then you're starting a new index tree independent from the other one. Not sure if this qualifies as best practice, but that would be the direction I would lean. -Kevin [EMAIL PROTECTED] http://www.PrecisOnline.com ** Check out scheduled Connect! training courses at http://www.PrecisOnline.com/train.html. --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
Re: [U2] [UD] Indexing 'Best Practices'
I would go for the simplest solution to operate and maintain, almost to hell with machine efficiency. You (or anyone else trying to follow you) are probably the most critical and expensive resource. - Original Message - From: David Wolverton [EMAIL PROTECTED] To: u2-users@listserver.u2ug.org Sent: Wednesday, August 02, 2006 1:59 PM Subject: [U2] [UD] Indexing 'Best Practices' I have a scenario to describe and would like comments as to what others see as a 'Best Practice' for Indexing. We're on UniData, currentish Windows version, and have indexes in place on files that are starting to grow at larger client sites and are heavily modified. Here is a common index issue we face on some files, and I'd like thoughts and comments... A file contains about 500,000 records, growing 100,000 records a year, (but I need to think about 5,000,000 growing 1,000,000 a year as well) and we have indexed the record 'status' so that we can quickly grab the records that are in varying stages of completion. This status field occupies a 'real' attribute and contains entries like OK, PEND, MAIL, WAIT, ENTRY, and so on. As well, every record at some point and time is finally COMPLETE. These statuses change ofen - so a given record may travel through 5 to 15 status changes before it becomes COMPLETE. So we have tens, and hundreds of records (and have to think about thousands) in the different status 'buckets' and can grab those instantly via a SELECT - and that is great and works like a champ -- except that we are also carrying an index for COMPLETE which is just enormous (and essentially useless) - every record in the file except for the currently active is stamped COMPLETE. We would never have a reason to 'look' at all complete records via this index, as to do so would be almost the entire file. So if we did not have this index, I don't see that it would be a problem. SO... The choice: Considering Time/Resources/Better Living... Would we be better off to create a Virtual field definition to index that carries the 'real' code into the index, unless it is COMPLETE, in which case we would make the virtual field pass a NULL and then index this new virtual field using NoNulls? Doing so would remove the COMPLETEs from the index - Would the computation and resources be better served by getting rid of the index for COMPLETE? Or would be time spent 'deriving' the Indexes be more 'expensive' in the long run considering these statuses change a lot, which means the computation would have to be run each time with the only 'net' saving is the write to the COMPLETE index? From my view, the 'cost' to remove COMPLETE via a Virtual Field would only save a disk operation in the life of the record - that is, the actual 'write' of the COMPLETE index record. That and some space on the disk. But it does not appear to me that the 'size' of the index created has significant operational cost - possibly not as much the expense of doing all those Virtual Field computations along the way . The way B+Tree works (according to WikiPedia!), the system would just keep adding layers, and an insert that triggers a B+Tree 'split' would cause at worst 3 or so extra disk writes it appears - not a complete cascade. Plus, if we ever DO need it, COMPLETE is indexed for selection. Or are disk writes evil incarnate and the Virtual Field removal of that operations is good riddance? I see arguments both ways -- Thoughts? I'm sure others have a similar issue, and I'm wondering what has been found to be the 'best practice' here. David Wolverton --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/ --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
RE: [U2] [UD] Indexing 'Best Practices'
OK - sold! Virtual field to move COMPLETE to NULL is the answer... As I was driving back to the hotel after doing that post last night, I got to thinking that the Virtual Field computation would strictly be CPU/Memory, which means I can do A LOT of that for the cost of a single disk access. And with the B+Tree design, we are doomed to at least TWO disk reads in every case (unless you happen to hit the Root record!) - the first to the 'lead' block, and then a read either 'left' or 'right' - and fresh from sleep, I can see that as the index grew, that could turn into 3, 4, or 50 traversals (aka Disk reads) left or right to position the key on write. The leaf size is 4K - PERIOD. So every 100 or so records in a leaf, there would be some sort of shift of level push that would involve even more write traffic. More than likely I'll create a new dictionary that is a Virtual Field, and index that NO.NULLS and change the routines that need expedited selects that we were using the index for anyway - any report that would allow COMPLETE as an option will probably have to go through the whole file, so we may have to create 'versions' of the report - a faster Select Only From non-COMPLETE Version that can use Indexes, and a slow, but inclusive Can Include COMPLETE version which will use the unindexed attribute. Thanks for all who responded on and off list! If anyone else knows magic that goes on behind the scenes on indexes, or has recommendations, please continue to chime in! --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
Re: [U2] [UD] Indexing 'Best Practices'
At 20:59 -0500 2006/08/01, David Wolverton wrote: So we have tens, and hundreds of records (and have to think about thousands) in the different status 'buckets' and can grab those instantly via a SELECT - and that is great and works like a champ -- except that we are also carrying an index for COMPLETE which is just enormous (and essentially useless) - every record in the file except for the currently active is stamped COMPLETE. We would never have a reason to 'look' at all complete records via this index, as to do so would be almost the entire file. So if we did not have this index, I don't see that it would be a problem. How often is the information in the COMPLETE records needed? Will a COMPLETE record ever be changed to another status? How often will you need to mix data from COMPLETE records and records with other statuses? If a COMPLETE record is essentially useless (except for historical purposes), I'd consider moving records out of POTENTIALLY_HUGE_FILE and into POTENTIALLY_HUGE_FILE_ARCHIVE as soon as they're COMPLETE. That solves the index problem and keeps the working file smaller and/or less full. Ray -- .=. | =-=-=-=-=-=-= Eagle Rock Information Systems Corp =-=-=-=-=-=-= | | -=-=-=-=-=-=- web and database business solutions -=-=-=-=-=-=- | | http://www.eriscorp.commailto:[EMAIL PROTECTED] | |Midwest Regional Office: 815-547-0662 (voice) 815-547-0353 (Fax)| .=. --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
Re: [U2] [UD] Indexing 'Best Practices'
Kate Stanton [EMAIL PROTECTED] wrote on 2 Aug 2006 18:11:11 +1200 I would go for the simplest solution to operate and maintain, almost to hell with machine efficiency. You (or anyone else trying to follow you) are probably the most critical and expensive resource. I must disagree with your reasoning. System response time is critical in a customer service situation. If a customer has to wait a noticeable time because the programmer didn't make an effort to provide efficient code, it's a waste of the customer's time - and I would suggest that a customer's time is a company's most critical resource, and wasting it is your most expensive mistake. --Tom Pellitieri Century Equipment Toledo, Ohio --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
RE: [U2] [UD] Indexing 'Best Practices'
Hi David, Don't know if this is possible or not but couldn't you write your COMPLETE records to a .HIST file, keeping the rest of the data in a .CURR file. Then have a PART file definition that binds the two parts into what is the current filename? You could then index and select off the .CURR file and be able to process off either the PART filename or the .CURR filename. I'm not real familiar with PART files but it seems like it should work. I'm not even sure if PART files are available in UD but thought the suggestion was worth mentioning. Good luck, Bob Woodward Senior Programmer/Analyst Harbor Wholesale Grocery -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Wolverton Sent: Wednesday, August 02, 2006 6:53 AM To: u2-users@listserver.u2ug.org Subject: RE: [U2] [UD] Indexing 'Best Practices' OK - sold! Virtual field to move COMPLETE to NULL is the answer... As I was driving back to the hotel after doing that post last night, I got to thinking that the Virtual Field computation would strictly be CPU/Memory, which means I can do A LOT of that for the cost of a single disk access. And with the B+Tree design, we are doomed to at least TWO disk reads in every case (unless you happen to hit the Root record!) - the first to the 'lead' block, and then a read either 'left' or 'right' - and fresh from sleep, I can see that as the index grew, that could turn into 3, 4, or 50 traversals (aka Disk reads) left or right to position the key on write. The leaf size is 4K - PERIOD. So every 100 or so records in a leaf, there would be some sort of shift of level push that would involve even more write traffic. More than likely I'll create a new dictionary that is a Virtual Field, and index that NO.NULLS and change the routines that need expedited selects that we were using the index for anyway - any report that would allow COMPLETE as an option will probably have to go through the whole file, so we may have to create 'versions' of the report - a faster Select Only From non-COMPLETE Version that can use Indexes, and a slow, but inclusive Can Include COMPLETE version which will use the unindexed attribute. Thanks for all who responded on and off list! If anyone else knows magic that goes on behind the scenes on indexes, or has recommendations, please continue to chime in! --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/ --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
RE: [U2] [UD] Indexing 'Best Practices'
UniData does not have the 'part' files like UniVerse - it would be cool - but not a choice for us without 'doing it manually'. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bob Woodward Sent: Wednesday, August 02, 2006 11:11 AM To: u2-users@listserver.u2ug.org Subject: RE: [U2] [UD] Indexing 'Best Practices' Hi David, Don't know if this is possible or not but couldn't you write your COMPLETE records to a .HIST file, keeping the rest of the data in a .CURR file. Then have a PART file definition that binds the two parts into what is the current filename? snip --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/