Hi,
On 23-03-2009 at 0:57, Jim Idle wrote:
Pawel (privately) wrote:
Hi, I have reviewed all the distribution routines and they are really simple. I checked the data distribution for the most frequently used files, and the simple algorithms are fine. Data volumes on some days are "visibly" higher than on others, but this can be handled by sizing the partfiles appropriately.
Such a distribution brings us an extra benefit: we can easily find the information for a selected day without querying a large table.
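To make the idea concrete, here is a rough Python model of a day-based distribution routine. It is only a sketch: the key layout (`YYYYMMDD*sequence`), the function name `part_for_key` and the choice of one part per weekday are assumptions for illustration, not our actual jBC routine.

```python
from datetime import date

# Hypothetical key layout: "<YYYYMMDD>*<sequence>", e.g. "20090323*000123".
# Real keys and real jBASE distribution routines will differ.
def part_for_key(key: str, num_parts: int = 7) -> int:
    """Return a 1-based part number derived from the date embedded in the key."""
    day = date(int(key[0:4]), int(key[4:6]), int(key[6:8]))
    # isoweekday(): Monday=1 ... Sunday=7, so with 7 parts each weekday gets its own part.
    return (day.isoweekday() - 1) % num_parts + 1
```

With a routine of this kind, all records for one day land in a single part, so a query for that day only has to touch that part.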
The main difference is that jBASE runs the distribution routine for these "full scan" selects, and I cannot understand why it needs to. There is no need to trigger the distribution routine :) in full-scan selects on distributed files. I know that SELECT is written the way it is (SELECT / READNEXT on the distributed file + READ), but it could be written as SELECT / READNEXT on the partfiles + READ, without a call to the distribution routine (it would need to combine the output select lists too).
I guess that the SELECT / READNEXT operations of the jEDI driver implemented for distributed files handle the distribution virtually (so the SELECT program is not aware of the partfiles).
Are you referring to the variable JBASE_DISTRIB_FASTSCAN=1? I do not know what it is, but it is set in our .profile :) I did not check whether it affects performance positively or negatively.
Can somebody please explain to me what FastScan is for distributed files?
I still do not understand :( Why does the distribution routine need to be invoked when you run a SELECT against a distributed file? I know that SELECT is written that way, but it could be written differently. Instead of using SELECT / READNEXT on the distributed file, it could:
* check the names and number of the partfiles
* query each partfile (in sequence or in parallel) - no dist routine called
* combine and return the output select list
Then the distribution subroutine would not be called! We could write our own (workaround) SELECT wrapper that behaves this way, but I think this is an optimization for the jBASE team to make, not us.
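The three steps above can be sketched roughly as follows. This is a Python model rather than jBC, and everything in it is an assumption for illustration: each partfile is represented as a plain dict of record ID to record, and the names `select_part` and `select_distributed` are hypothetical, not jBASE APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Each partfile is modelled as a plain dict of {record_id: record}.
def select_part(part, predicate):
    """Full scan of one partfile; no distribution routine is consulted."""
    return [rid for rid, rec in part.items() if predicate(rec)]

def select_distributed(parts, predicate, parallel=True):
    """Scan every partfile and concatenate the per-part ID lists."""
    if parallel:
        # One worker per partfile; pool.map keeps the results in partfile order.
        with ThreadPoolExecutor(max_workers=len(parts) or 1) as pool:
            per_part = list(pool.map(lambda p: select_part(p, predicate), parts))
    else:
        per_part = [select_part(p, predicate) for p in parts]
    # Combine the per-part lists into one output select list.
    return [rid for ids in per_part for rid in ids]
```

For example, `select_distributed(parts, lambda rec: rec["amt"] > 10)` scans each partfile directly and never needs to map a key to a part number.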
A select will read the ID, then it will read the record - non-distributed files use some neat tricks for bulk reads, whereas distributed files probably cannot. Have you considered SELECTing each part file individually, then merging the results? I doubt that the distributed files can be generically optimized, but by changing the key on read and write (computationally of course) you can probably get much better performance.
I did not know that the key can be changed "on the fly". I think that it has only limited usage. Am I wrong?
Of course, once you can use my new file system, you won't need distributed files and won't have this problem :-)
Looking for beta-testers? :) We can try :)
Kind regards
Pawel
