Pawel (privately) wrote:
> Hi,
>
> I would like to ask group members about jBASE performance in relation to
> distributed files.
>
> We have noticed that SELECTs with criteria on distributed files may be 4
> times slower than on regular files.
> I think the reason is quite clear: jBASE queries each part file and then
> reads each record from it to either qualify it or discard it from the
> select list.
>
> Reading the record is, however, not a clever process. The distribution
> routine is run unnecessarily for each record, which causes a lot of
> overhead. Does jBASE need to confirm that a key taken from part file #1
> is still in part file #1? ;)
>
> We have raised it as a performance bug. I would expect jBASE to read the
> part files one by one without needing to invoke the distribution routine.
> What do you think?
> Our suggestion also relates to part file scanning - it could be done in
> separate processes too (just to speed up the selection process).
>
> PS. I know that it is not clever to run queries against distributed
> files, but why not optimize jBASE? :)
>

The problem isn't so much that distributed files are inefficient but probably that the algorithm you are using is not optimal. What is the SELECT statement you are using? If you create a list and then read through that list, you will read in list order, which may not be optimal for that distribution.

Note that many moons ago I modified the distributed files so that you could change the key on the fly to guarantee that the distribution is good (not that anyone has ever used it except the people I wrote it for). Otherwise the key order you get is not necessarily the order that is best for reading through the part files, and you will create millions of random reads instead of lots of sequential reads.
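To illustrate the point about list order, here is a minimal Python sketch (not jBASE code) that reorders a select list so that keys belonging to the same part file sit next to each other. Everything in it is an assumption made for illustration: `NUM_PARTS`, the checksum-based `part_of` function, and the sample keys are hypothetical and do not reflect the distribution routine of any real jBASE file. Grouping by part file only reduces switching between part files; it does not by itself guarantee sequential reads inside a part.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTS = 4  # hypothetical number of part files


def part_of(key: str) -> int:
    """Hypothetical distribution function: map a key to a part file number."""
    return crc32(key.encode("utf-8")) % NUM_PARTS


def group_by_part(select_list):
    """Reorder a select list so keys from the same part file are adjacent.

    The relative order of keys inside each part is preserved.
    """
    buckets = defaultdict(list)
    for key in select_list:
        buckets[part_of(key)].append(key)
    # Visit parts in ascending order so each part is read in a single run.
    return [key for part in sorted(buckets) for key in buckets[part]]


def count_part_switches(keys):
    """Count how often two consecutive keys fall into different part files."""
    parts = [part_of(k) for k in keys]
    return sum(1 for a, b in zip(parts, parts[1:]) if a != b)


if __name__ == "__main__":
    keys = [f"CUST{i}" for i in range(1000)]
    print("switches in list order:   ", count_part_switches(keys))
    print("switches after regrouping:", count_part_switches(group_by_part(keys)))
```

After regrouping, the number of part file switches is at most one less than the number of part files, whatever order the original list was in.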
A select will read the ID, then it will read the record. Non-distributed files use some neat tricks for bulk reads, whereas distributed files probably cannot. Have you considered SELECTing each part file individually, then merging the results?

I doubt that distributed files can be generically optimized, but by changing the key on read and write (computationally, of course) you can probably get much better performance.

Of course, once you can use my new file system, you won't need distributed files and won't have this problem :-)

Jim
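The idea of selecting each part file individually and then merging the results can be sketched in Python as follows. This is a conceptual model only, not jBASE code: the part files are stood in for by plain dictionaries, and `select_part`, `select_distributed`, and the `balance` field are hypothetical names invented for the example. Each part is scanned directly, so no distribution routine is consulted per record, and the scans run concurrently, along the lines of the separate-processes suggestion in the quoted message.

```python
from concurrent.futures import ThreadPoolExecutor


def select_part(part, predicate):
    """Scan one part file and return the keys whose records match."""
    return [key for key, record in part.items() if predicate(record)]


def select_distributed(parts, predicate):
    """Select from every part file separately, then merge the key lists."""
    workers = max(1, len(parts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input parts.
        per_part = list(pool.map(lambda p: select_part(p, predicate), parts))
    merged = []
    for keys in per_part:
        merged.extend(keys)
    return merged


if __name__ == "__main__":
    # Three hypothetical part files, each mapping a key to a record.
    parts = [
        {"A1": {"balance": 50}, "A2": {"balance": 500}},
        {"B1": {"balance": 700}},
        {},
    ]
    print(select_distributed(parts, lambda rec: rec["balance"] > 100))
```

The merged list keeps the keys grouped by part file, so a later read pass over it works through one part at a time.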
