Re: jBASE unefficient? - distributed files

CLIF Fri, 20 Mar 2009 10:34:02 -0700

What's your new file system?

On Mar 20, 10:12 am, Jim Idle <[email protected]> wrote:
> Pawel (privately) wrote:
> > Hi,
>
> > I would like to ask group members about jBASE performance in relation to
> > distributed files.
>
> > We have noticed that SELECTs with criteria on distributed files may be
> > 4 times slower than on regular files.
> > I think the reason is quite clear: jBASE queries each part file and
> > then reads each record from it to either qualify it or discard it from
> > the select list.
>
> > Reading the record, however, is not a clever process. The distribution
> > routine is run unnecessarily for each record, which causes a lot of
> > overhead. Does jBASE need to confirm that a key taken from part file #1
> > is still in part file #1? ;)
>
> > We have raised it as a performance bug. I would expect jBASE to read
> > part files one by one without needing to invoke the distribution routine.
> > What do you think?
> > Our suggestion also relates to part file scanning: it could be done in
> > separate processes too (just to speed up the selection process).
>
> > PS. I know that it is not clever to run queries against distributed
> > files, but why not optimize jBASE? :)
>
> The problem isn't so much that distributed files are inefficient but
> probably that the algorithm you are using is not optimal. What is the
> SELECT statement you are using? If you create a list then read through
> that list, you will read in the list order, which may not be optimal for
> that distribution. Note that many moons ago I modified the distributed
> files so that you could change the key on the fly to guarantee that the
> distribution is good (not that anyone has ever used it except the people
> I wrote it for). Otherwise the key order you get is not necessarily the
> order that is best to read through the part files and you will create
> millions of random reads instead of lots of sequential reads.
>
> A select will read the ID, then it will read the record -
> non-distributed files use some neat tricks for bulk reads, whereas
> distributed files probably cannot. Have you considered SELECTing each
> part file individually, then merging the results? I doubt that the
> distributed files can be generically optimized, but by changing the key
> on read and write (computationally of course) you can probably get much
> better performance.
>
> Of course, once you can use my new file system, you won't need
> distributed files and won't have this problem :-)
>
> Jim
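
To illustrate the two ideas quoted above (SELECTing each part file
individually and merging the results, and reading keys grouped by part
rather than in raw list order), here is a minimal Python sketch. It is not
jBASE code and does not use any jBASE API: the in-memory "parts", the
part_for_key distribution routine and all function names are hypothetical
stand-ins chosen only to show the shape of the approach.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def part_for_key(key: str, num_parts: int) -> int:
    """Hypothetical distribution routine: maps a record key to a part number."""
    return sum(key.encode()) % num_parts


def build_parts(records: Dict[str, Record], num_parts: int) -> List[Dict[str, Record]]:
    """Simulate a distributed file as a list of in-memory part files."""
    parts: List[Dict[str, Record]] = [dict() for _ in range(num_parts)]
    for key, rec in records.items():
        parts[part_for_key(key, num_parts)][key] = rec
    return parts


def select_per_part(
    parts: List[Dict[str, Record]],
    predicate: Callable[[Record], bool],
) -> List[str]:
    """Select from each part on its own, then merge the key lists.

    Every key seen here is already known to live in the part being
    scanned, so the distribution routine is never called during the scan.
    """
    selected: List[str] = []
    for part in parts:
        selected.extend(key for key, rec in part.items() if predicate(rec))
    return selected


def group_keys_by_part(keys: Iterable[str], num_parts: int) -> Dict[int, List[str]]:
    """Regroup an existing select list so each part can be read in one pass."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        grouped[part_for_key(key, num_parts)].append(key)
    return dict(grouped)
```

With a list regrouped by group_keys_by_part, a reader can work through one
part at a time instead of hopping between parts on every key. Whether this
helps on a real system depends on how the part files are laid out on disk.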

