Re: jBASE inefficient? - distributed files

Pawel (privately) Sun, 22 Mar 2009 16:13:09 -0700

Hi,
 
> The problem isn't so much that distributed files are inefficient but
> probably that the algorithm you are using is not optimal. What is the
> SELECT statement you are using?
These are regular SELECT statements, e.g. SELECT <distributed_file> WITH 
FIELD1 LIKE something...


I will focus on one specific file only, but the others are very similar. 
Our distribution algorithms are always simple, and in the case under 
discussion they distribute the data very well.

The distribution algorithm is very simple: it uses part of the date (the 
day) contained in the key to distribute records, so we have 32 part 
files. IDs that do not match a given pattern (say 1A5N) are put into part 
file 32. For IDs that do match, there is only one ICONV and one OCONV 
call: the day is obtained from the date and returned as the part file 
number. I don't think the routine could be much simpler (it is a few 
lines) :)
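To make the routine concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not the actual subroutine: it assumes a matching key is one letter followed by a 5-digit Pick internal date (day 0 = 31 Dec 1967), and the names `part_number`, `KEY_PATTERN` and `OVERFLOW_PART` are hypothetical. The day of month (1..31) becomes the part file number, and anything else goes to part file 32.

```python
import datetime
import re

# Assumption: Pick/jBASE internal dates count days from 31 Dec 1967 (day 0).
PICK_EPOCH = datetime.date(1967, 12, 31)
# Hypothetical stand-in for the 1A5N pattern: one letter, then five digits.
KEY_PATTERN = re.compile(r"[A-Za-z][0-9]{5}")
# IDs that do not match the pattern all land in this part file.
OVERFLOW_PART = 32


def part_number(key: str) -> int:
    """Return the part file (1..32) a record key is distributed to."""
    if not KEY_PATTERN.fullmatch(key):
        return OVERFLOW_PART
    internal_date = int(key[1:])  # plays the role of the ICONV step
    date = PICK_EPOCH + datetime.timedelta(days=internal_date)
    return date.day  # plays the role of OCONV "DD": day of month, 1..31
```

For example, `part_number("A00031")` gives 31 (31 Jan 1968) and `part_number("12345")` gives 32. The point is that each call is cheap; the cost in the thread comes from how often the routine is invoked, not from what it does.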

The performance problem arises when you query data with selection 
criteria: jBASE starts calling the distribution subroutine thousands of 
times, which introduces enormous overhead. We usually do not need to run 
queries like that, but for some (CSHD) investigations we are forced to.

> If you create a list then read through
> that list, you will read in the list order, which may not be optimal for
> that distribution. Note that many moons ago I modified the distributed
> files so that you could change the key on the fly to guarantee that the
> distribution is good (not that anyone has ever used except the people I
> wrote it for) Otherwise the key order you get is not necessarily the
> order that is best to read through the part files and you will create
> millions of random reads instead of lots of sequential reads.
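One way to read the suggestion above about list order is to regroup a saved key list by part file before reading it, so all reads against part file 1 happen together, then part file 2, and so on. The sketch below assumes you have a Python function `part_of` that mirrors the distribution routine; `order_for_reads` is a hypothetical helper. Grouping only keeps reads within one part file together; whether that becomes sequential disk access depends on the underlying file type.

```python
from collections import defaultdict
from typing import Callable, Iterable


def order_for_reads(keys: Iterable[str],
                    part_of: Callable[[str], int]) -> list[str]:
    """Reorder keys so that all keys of the lowest part file come first,
    then the next part file, etc. Within a part file, the original list
    order is preserved."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for key in keys:
        buckets[part_of(key)].append(key)
    ordered: list[str] = []
    for part in sorted(buckets):
        ordered.extend(buckets[part])
    return ordered
```

For instance, with `part_of = lambda k: int(k[1])`, the list `["b2", "a1", "c2", "d1"]` is reordered to `["a1", "d1", "b2", "c2"]`.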
I think SELECT takes keys "in natural order" from the part files, but I 
can confirm tomorrow. We are using jBASE 4.1.5.17.

The main difference is that jBASE runs the distribution routine for these 
"full scan" selects, and I cannot understand why it needs to. My guess is 
that the SELECT / READNEXT operations of the jEDI driver for distributed 
files handle the distribution transparently, so the SELECT program is not 
aware of the part files and simply performs SELECT / READNEXT + READ of 
each record.

This is inefficient, because each READ adds unnecessary overhead by 
calling the distribution routine. The results can be obtained much faster 
by running (direct) SELECTs on the part files and combining the output.
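The difference between the two approaches can be modelled in a few lines of Python. This is only a model of the guess above about how the driver behaves, not jBASE's actual implementation: part files are plain dicts, `CountingDistributor` wraps a stand-in distribution function to count calls, and all names are hypothetical. The first function resolves the part file on every READ, as guessed above; the second scans each part file directly and concatenates the hits, so the distribution routine never runs.

```python
from typing import Callable

# Hypothetical model: part number -> {record key: record}
PartFiles = dict[int, dict[str, str]]


class CountingDistributor:
    """Wraps a distribution function and counts how many times it runs."""

    def __init__(self, part_of: Callable[[str], int]) -> None:
        self.part_of = part_of
        self.calls = 0

    def __call__(self, key: str) -> int:
        self.calls += 1
        return self.part_of(key)


def select_via_distributed_file(parts: PartFiles,
                                distribute: Callable[[str], int],
                                predicate: Callable[[str], bool]) -> list[str]:
    """SELECT/READNEXT + READ through the distributed-file layer:
    every READ calls the distribution routine to find the part file."""
    hits: list[str] = []
    for part in sorted(parts):
        for key in list(parts[part]):
            record = parts[distribute(key)][key]  # one routine call per record
            if predicate(record):
                hits.append(key)
    return hits


def select_per_part_file(parts: PartFiles,
                         predicate: Callable[[str], bool]) -> list[str]:
    """SELECT each part file directly and merge the results;
    the distribution routine is never called."""
    hits: list[str] = []
    for part in sorted(parts):
        for key, record in parts[part].items():
            if predicate(record):
                hits.append(key)
    return hits
```

Both functions return the same keys, but the first makes one distribution call per record scanned while the second makes none, which is the overhead described in this thread.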

However, I believe this is an optimization for the jBASE team to make, 
not us. We have already raised it, but I noticed some "resistance" to 
accepting the ticket :(

> A select will read the ID, then it will read the record -
> non-distributed files use some neat tricks for bulk reads, whereas
> distributed files probably cannot. Have you considered SELECTing each
> part file individually, then merging the results? I doubt that the
> distributed files can be generically optimized, but by changing the key
> on read and write (computationally of course) you can probably get much
> better performance.
> 
> Of course, once you can use my new file system, you won't need
> distributed files and won't have this problem :-)
I need to read your earlier post. Do you know whether jBASE 5 would help 
us eliminate the problem described?

I think calling the distribution routine is not needed when you do a 
full table scan. I guess many people could benefit from such an 
optimization.

Kind regards
Pawel

--~--~---------~--~----~------------~-------~--~----~
Please read the posting guidelines at: 
http://groups.google.com/group/jBASE/web/Posting%20Guidelines

IMPORTANT: Type T24: at the start of the subject line for questions specific to 
Globus/T24

To post, send email to [email protected]
To unsubscribe, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/jBASE?hl=en
-~----------~----~----~----~------~----~------~--~---
