What's your new file system?

On Mar 20, 10:12 am, Jim Idle <[email protected]> wrote:
> Pawel (privately) wrote:
> > Hi,
> >
> > I would like to ask group members about jBASE performance in relation to
> > distributed files.
> >
> > We have noticed that SELECTs with criteria on distributed files may be
> > 4 times slower than on regular files.
> > I think the reason is quite clear: jBASE queries each part file and
> > then reads each record from it to qualify it or throw it away from the
> > select list.
> >
> > Reading the record is, however, not a clever process. The distribution
> > routine is run unnecessarily for each record, which causes a lot of
> > overhead. Does jBASE need to confirm that a key taken from part file #1
> > is still in part file #1? ;)
> >
> > We have raised it as a performance bug. I would expect jBASE to read
> > the part files one by one without needing to invoke the distribution
> > routine. What do you think?
> > Our suggestion also relates to part file scanning - it could be done in
> > separate processes too (just to speed up the selection process).
> >
> > PS. I know that it is not clever to run queries against distributed
> > files, but why not optimize jBASE? :)
>
> The problem isn't so much that distributed files are inefficient but
> probably that the algorithm you are using is not optimal. What is the
> SELECT statement you are using? If you create a list then read through
> that list, you will read in the list order, which may not be optimal for
> that distribution. Note that many moons ago I modified the distributed
> files so that you could change the key on the fly to guarantee that the
> distribution is good (not that anyone has ever used it except the people
> I wrote it for). Otherwise the key order you get is not necessarily the
> order that is best to read through the part files and you will create
> millions of random reads instead of lots of sequential reads.
>
> A select will read the ID, then it will read the record -
> non-distributed files use some neat tricks for bulk reads, whereas
> distributed files probably cannot. Have you considered SELECTing each
> part file individually, then merging the results? I doubt that the
> distributed files can be generically optimized, but by changing the key
> on read and write (computationally of course) you can probably get much
> better performance.
>
> Of course, once you can use my new file system, you won't need
> distributed files and won't have this problem :-)
>
> Jim
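
For anyone following along, Jim's suggestion of SELECTing each part file individually and then merging the results could be sketched roughly as below. This is Python rather than jBASE BASIC, and everything in it is a hypothetical stand-in for illustration only: the in-memory `files` dictionary, the `distribute` routine and the `balance` field are assumptions, not jBASE's actual API or Pawel's data. The point is only that the distribution routine is needed on write, not while scanning a part file directly.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a distributed file: part number -> {key: record}.
PartFiles = Dict[int, Dict[str, dict]]


def distribute(key: str, parts: int) -> int:
    """Hypothetical distribution routine: picks a part file from the key."""
    return sum(key.encode()) % parts


def write(files: PartFiles, key: str, record: dict) -> None:
    """Writes go through the distribution routine to find the part file."""
    files[distribute(key, len(files))][key] = record


def select_part(part: Dict[str, dict], wanted: Callable[[dict], bool]) -> List[str]:
    """Scan one part file directly; the distribution routine is never called."""
    return [key for key, record in part.items() if wanted(record)]


def select_by_part(files: PartFiles, wanted: Callable[[dict], bool]) -> List[str]:
    """Select each part file on its own, then merge the per-part lists."""
    merged: List[str] = []
    for part_no in sorted(files):
        merged.extend(select_part(files[part_no], wanted))
    return merged


# Build a small three-part file and select on a criterion.
files: PartFiles = {0: {}, 1: {}, 2: {}}
for n in range(12):
    write(files, f"CUST{n:03d}", {"balance": n * 100})

selected = select_by_part(files, lambda rec: rec["balance"] >= 800)
print(selected)
```

A side effect of merging this way is that the resulting list is grouped by part file, so a later pass over it touches one part at a time instead of hopping between parts in arbitrary key order.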
