Re: jBASE unefficient? - distributed files

Pawel (privately) Mon, 23 Mar 2009 07:48:16 -0700

Hi,

On 23-03-2009 at 0:57, Jim Idle wrote:

Pawel (privately) wrote:

Hi,

The problem isn't so much that distributed files are inefficient but
probably that the algorithm you are using is not optimal. What is the
SELECT statement you are using?

These are regular SELECT statements, e.g. SELECT <distributed_file> WITH 
FIELD1 LIKE sth...

I will focus on one specific file only, but the others are very similar. 
Our distribution algorithms are always uncomplicated. They distribute 
data very well in the case discussed here.

Or at least, you THINK they do ;-)

I have reviewed all the distribution routines and they are really simple. I checked the data distribution for the most frequently used files, and the simple algorithms work fine. Data volumes on some days are visibly higher than on others, but this can be handled by sizing the partfiles appropriately.

Such distribution brings an extra benefit for us: we can easily find information for a selected day without querying a large table.
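
To make the idea concrete, here is a minimal sketch of what a "really simple" day-based distribution rule might look like. It is written in Python purely for illustration (real jBASE distribution routines are jBASIC subroutines). The key layout (a YYYYMMDD prefix) and the choice of day-of-month as the part number are assumptions; the thread does not describe the actual routine.

```python
from datetime import datetime


def part_number_for_key(key: str) -> int:
    """Hypothetical distribution rule: route a record to a partfile by day of month.

    Assumes (not stated in the thread) that every key starts with a
    YYYYMMDD date, e.g. "20090323*0001".
    """
    # strptime validates the prefix, so malformed keys raise ValueError
    # instead of silently landing in the wrong partfile.
    day = datetime.strptime(key[:8], "%Y%m%d").day
    return day  # part numbers 1..31
```

With a rule like this, all records for one day live in a single partfile, which is why a query for a selected day can skip the other parts.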

The main difference is that jBASE runs the distribution routine for these 
"full scan" selects, and I cannot understand why it needs to do that.
  
How can it do otherwise? The list must be the list of record keys.

There is no need to trigger the distribution routine :) in full scan selects on distributed files. I know that SELECT is written the way it is (SELECT / READNEXT on the distributed file + READ), but it could be written as SELECT / READNEXT on the partfiles + READ, without a call to the distribution routine (it would also have to combine the output select lists).

I guess that the SELECT / READNEXT operations of the jEDI driver implemented for 
distributed files handle the distribution virtually (so the SELECT program 
is not aware of partfiles),

Yep. And because it is a calculated key, it probably isn't using the fastscan interface, so performance will be very low in comparison.

Are you referring to the variable JBASE_DISTRIB_FASTSCAN=1? I do not know what it does, but it is set in our .profile :) I did not check whether it affects performance positively or negatively.

Can somebody please explain to me what FastScan is for distributed files?


This is, however, an optimization for the jBASE team,

No - we optimized for the general case, but if you are going to take over the key (or rather partition selection), there is nothing to be done but ask you for it.

I still do not understand :( Why does the distribution routine need to be invoked when you run SELECT against a distributed file? I know that SELECT is written that way, but it could be written differently. Instead of using SELECT / READNEXT on the distributed file, it could:

* check names and numbers of partfiles

* query each partfile (in sequence or in parallel) - no dist routine called

* combine and return output select list

Then the distribution subroutine would not be called! We can write our own (workaround) SELECT wrapper that behaves this way, but I think this could be an optimization for the jBASE team, not for us.
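
The three steps listed above can be sketched as follows. This is a Python simulation only, not jBASE code: the partfiles are modelled as in-memory dictionaries, and the names `select_partfile` and `select_distributed` are hypothetical. It only illustrates that a full scan can visit each part directly and concatenate the key lists without ever consulting a distribution routine.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a partfile: record key -> record (a dict of fields).
PartFile = Dict[str, dict]


def select_partfile(partfile: PartFile,
                    predicate: Callable[[dict], bool]) -> List[str]:
    """Full scan of one partfile; returns the keys of matching records."""
    return [key for key, record in partfile.items() if predicate(record)]


def select_distributed(partfiles: Dict[str, PartFile],
                       predicate: Callable[[dict], bool]) -> List[str]:
    """Scan every partfile in name order and combine the key lists.

    No distribution routine is called: the parts are enumerated directly.
    """
    combined: List[str] = []
    for name in sorted(partfiles):          # step 1: enumerate the partfiles
        combined.extend(                     # step 3: combine the select lists
            select_partfile(partfiles[name], predicate)  # step 2: query each part
        )
    return combined


if __name__ == "__main__":
    parts = {
        "PART1": {"A1": {"FIELD1": "sth-1"}, "A2": {"FIELD1": "other"}},
        "PART2": {"B1": {"FIELD1": "sth-2"}},
    }
    print(select_distributed(parts, lambda r: r["FIELD1"].startswith("sth")))
```

The parts are scanned sequentially here; scanning them in parallel would only change how the per-part lists are gathered before they are combined.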

A select will read the ID, then it will read the record -
non-distributed files use some neat tricks for bulk reads, whereas
distributed files probably cannot. Have you considered SELECTing each
part file individually, then merging the results? I doubt that the
distributed files can be generically optimized, but by changing the key
on read and write (computationally of course) you can probably get much
better performance.

I did not know that the key can be changed "on the fly". I think that it has only limited usage. Am I wrong?

Of course, once you can use my new file system, you won't need
distributed files and won't have this problem :-)

Looking for beta-testers? :) We can try :)
I wonder if jBASE 5 would bring any benefit to distributed files.

Kind regards

Pawel

