Thanks for spending your time :)
Indexing is something that needs to be well thought out.
Please do not tell me that. In the current jBASE implementation (4.1.5.x) it should already be very well thought out. Some programs (not always "ours" in terms of source code) create and use temporary indexes, and we have no control over that. Some of those indexes seem a little "funny".
2) An index that reflects a large subset of the primary keys may not be the index you want. Perhaps you want the inverse index? For instance, say that out of 5 million records, there are 4,829,345 where the field BADDEBT is equal to "Y" and the rest are equal to "N". If you create an index on BADDEBT, then will you ever want the 4,829,345 or do you just want the ones with "N"? Is the N on its own important? And so on. Design an index that can pare down the result set as much as possible in one go.
I have learned that, and we have already removed many indexes. I have also learned that update time can be dramatically higher when an index is set.
3) Split your processing into small chunks. To be honest, why anyone with large batch jobs is not using some form of message queuing I will never understand. Especially as for fairly simple needs, all you need do is make a CALLJ to place an ITEM ID on a freeware implementation of JMS (Java Message Services). If you are a bank, you probably already have MQ Series (WebSphere MQ) or Rendezvous anyway.
Yes, we do have MQ Series and it is used broadly, but only for communication purposes - never by batch jobs. I think I did not understand you. Wouldn't there be a problem with transaction boundaries? MQ integration with jBASE is not entirely clear to me in terms of transactionality.
Say I have jBASIC transactional software that gets information from the (MQ) queue. If something bad happens (I mean a true disaster, like an unexpected power loss) I cannot recover fully, because the information was already taken from the queue. Am I right?
I think I already owe you a beer, so please send me your postal address and expect delivery in the next year :)
Kind regards
Pawel
On 15-12-2008 at 18:39, Jim Idle wrote:
On Sun, 2008-12-14 at 20:18 +0100, Pawel (privately) wrote:

Hi,

Can anyone explain the following to me in jBASE 4.1.5.XX / AIX 5.3.0.0:

a) CREATE-INDEX <large file> <index name> BY 2
b) SELECT <large file> SAVING UNIQUE <field>

are crashing (segmentation violation) on large files as long as "export MALLOCTYPE=buckets" is not set. The commands work for some time, but they cannot complete their work. There are no limits on the processes. We did not solve this "problem" ourselves; Temenos advised us to set the above .profile parameter. According to my knowledge (http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/index.jsp?topic=/com.ibm.IBMDS.doc/tuning07.htm) this is an AIX tuning parameter. So why does it miraculously help? Nothing else is changed, so perhaps this should be treated as a jBASE bug? What do you think? Any light on the root cause is highly welcome. I know C, but have not yet compared the number of mallocs done by a test jBASIC program. I just wonder whether this parameter increases or decreases jBASIC performance.

Kind regards and Merry Christmas to the members
Alright, you asked.... ;-)
A number of malloc strategies are available in AIX 5.3, which attempt to solve the issue ('discovered' by many an application) of memory churn. As memory is claimed from the process space, the system can claim address space either from the 'top down' or from the 'bottom up'. It can also implement various strategies for how much to allocate in one go (as in, even though you ask for a block that is 32 bytes, it will allocate 64 bytes, or 128 bytes, and so on).
The reason for this is that when you realloc() or free() and then {m}alloc() (which is what realloc() will do in the absence of any room to grow), you end up creating 'holes' in the free-space chain. If you are lucky, you can reuse them (the program asks for another piece of memory that can be satisfied from the free chain); if you are unlucky, you cannot (you keep asking for a piece of memory that is too big for any piece in the free chain - usually because you are growing some list or string, such as a SELECT list).
If you are growing a list and happen to be freeing, say, 1K, then asking for 2K, then freeing 2K and asking for 4K, and so on, then you will end up with all those freed pieces occupying memory that your application cannot use. Unless (and even despite the fact that) you are running a 64-bit memory model, you will eventually run out of memory, even though there is plenty of memory 'free' - the free-space chain has plenty of space to give out, but there is not enough contiguous space to satisfy your memory allocation request.
The MALLOCTYPE=buckets setting in AIX is one way to solve this issue - basically you are telling AIX that processes running with this set are likely to exhibit this contiguous growth, and in that case it uses a different malloc strategy than normal. IIRC it allocates memory from the top of the available address space downwards, rather than using the typical sbrk() calls to extend the address space upwards. It also buckets allocations according to their size, to prevent small allocation requests from wrecking the ability to realloc() contiguous blocks efficiently.
You can see the effect (simplistically) like this:
alloc 1024:
+---1024---+
alloc 32
+---1024---+--32--+
Now realloc that 1024 to 1M. You cannot reuse the 1024's space because the 32 is in the way, so the 1024 goes on the free-space chain, the 32 stays where it is, and the 1M comes next:
+---1024free--+--32--+---1M--+
And this keeps happening.
So, if you put smaller requests in one bucket and larger requests on their own, or in a different bucket, then you find that rather than adding the 1024 to the free chain, you can just extend it (which also means you don't have to copy the existing data to newly realloc()ed space).
The other (complementary) effect is that rather than allocating small addresses and increasing them, you allocate addresses at the high end and decrease them. For various reasons (you can walk it through as a thought experiment, as I am sick of typing +--- ;-), you are statistically more likely to be able to extend a pre-existing contiguous block.
Now, the reason this helps your jBASE process may be obvious by now, but in case it is not: you should see that the SELECTs (and other programs) are building up a long list of things, which causes the malloc system to be unable to satisfy a realloc request, even though there is plenty of space on the free-space chain. So turning on IBM's alternative malloc strategy solves this issue nicely, as that is exactly why IBM implemented it.
However, it points out a number of things:
1) There is code in the jQL engine, or the list processing engine, or perhaps indexes, which is not checking that an allocation request is successful. This is why you are getting the segmentation violations. TEMENOS should fix this because while UNIX is good at detecting it, it is still dangerous and leaves you with an uneasy feeling, even though <cough>everyone is using transaction boundaries for updates of course </cough>. You should be making a request that they fix the seg faults, not just mask the issue.
2) The lists should not be being dealt with like this internally, they should be chunked. This would mean that the standard memory allocation schemes would be happy anyway.
However, it also tells me that your application strategy is probably flawed, or at best inefficient. This could be just the way you are forced to interact with T24/jBASE, but within the following pieces of advice you may find some help:
1) Indexing is something that needs to be well thought out.
If you are creating an index whose keys will yield all of the primary keys, then maybe it isn't such a useful index. It can be useful for sorting, assuming it gives you the collation sequence you want of course, but usually there is no need to loop-process a file in a sort sequence other than natural key order. It is the results that you want ordered in some way.
2) An index that reflects a large subset of the primary keys may not be the index you want. Perhaps you want the inverse index? For instance, say that out of 5 million records, there are 4,829,345 where the field BADDEBT is equal to "Y" and the rest are equal to "N". If you create an index on BADDEBT, then will you ever want the 4,829,345 or do you just want the ones with "N"? Is the N on its own important? And so on. Design an index that can pare down the result set as much as possible in one go.
3) Split your processing into small chunks. To be honest, why anyone with large batch jobs is not using some form of message queuing I will never understand. Especially as for fairly simple needs, all you need do is make a CALLJ to place an ITEM ID on a freeware implementation of JMS (Java Message Services). If you are a bank, you probably already have MQ Series (WebSphere MQ) or Rendezvous anyway.
4) Doctor, doctor, it hurts whenever I keep doing this. Don't keep doing that;
5) Doctor, doctor, I broke my arm in 3 places. Don't go to those 3 places again;
jim
--~--~---------~--~----~------------~-------~--~----~
Please read the posting guidelines at: http://groups.google.com/group/jBASE/web/Posting%20Guidelines
IMPORTANT: Type T24: at the start of the subject line for questions specific to Globus/T24
To post, send email to [email protected]
To unsubscribe, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/jBASE?hl=en
-~----------~----~----~----~------~----~------~--~---
