RE: initial page allocation versus incremental allocation
Won't this still limit the setting to MAX_PRE_ALLOC_SIZE? Why have a maximum setting at all? Or, at least, why cap it at just 8 pages? If it is configurable, then applications that need the option have it available, and those that don't can keep using the default settings. I'm surprised there's an option to configure the initial allocation but not the increment size.

What I've been trying to do is use Derby in a client application that needs to download more data than will fit into heap. The objects get serialized into rows and are indexed by the row number and an additional java.util.UUID that identifies the object. The client GUI can analyze the data and display the results and the raw data as required. The loading works great up to the initial allocation of pages, then it crawls, which cascades through and disrupts the whole application. It looks like I could solve this by configuring the page-increment size just like the initial allocation setting.

I had hoped to use Derby as a kind of general-purpose heap management solution. It has been doing a great job of limiting the amount of heap it uses, and it integrates well with the rest of the application, but the slowdown of the table loading is killing me.

Brian

-----Original Message-----
From: knut.hat...@sun.com [mailto:knut.hat...@sun.com]
Sent: Tuesday, March 03, 2009 3:28 AM
To: Derby Discussion
Subject: Re: initial page allocation versus incremental allocation

Brian Peterson writes:

> I see that there's a property to allow configuring the number of pages
> to initially allocate to a table, derby.storage.initialPages, but
> there isn't a property to allow for setting the number of pages to
> allocate when incrementally expanding the file container. It looks
> like RawStoreFactory might've allowed for this with
>
>     public static final String PRE_ALLOCATE_PAGE =
>         "derby.storage.pagePerAllocation";
>
> but this isn't referenced by anything I can find.

I haven't tested that it actually works, but it appears to be referenced in FileContainer.createInfoFromProp():

    PreAllocSize = PropertyUtil.getServiceInt(tc, createArgs,
        RawStoreFactory.PRE_ALLOCATE_PAGE,
        MIN_PRE_ALLOC_SIZE, MAX_PRE_ALLOC_SIZE,
        DEFAULT_PRE_ALLOC_SIZE /* default */);

If it turns out that setting the property works, we should probably try to get it into the documentation, as it looks like it could be useful.

-- 
Knut Anders
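A minimal sketch of how one might experiment with the property Knut points at, assuming it is read the same way as the documented derby.storage.initialPages (both picked up at CREATE TABLE time). The database URL, table, and columns are invented for illustration, and whether pagePerAllocation actually takes effect is exactly what the thread leaves unverified:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PreAllocExperiment {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:derby:demoDB;create=true");
                 Statement s = conn.createStatement()) {
                // Documented property: pages allocated when a table is created.
                s.execute("CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY("
                        + "'derby.storage.initialPages', '1000')");
                // Undocumented property from RawStoreFactory; FileContainer
                // appears to read it at create time, but whether it actually
                // changes the expansion increment is unverified in this thread.
                s.execute("CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY("
                        + "'derby.storage.pagePerAllocation', '100')");
                // Both properties are read at table-creation time, so they
                // must be set before CREATE TABLE.
                s.execute("CREATE TABLE cached_objects ("
                        + "row_num BIGINT NOT NULL PRIMARY KEY, "
                        + "obj_id CHAR(36) NOT NULL, "
                        + "payload VARCHAR(32000) FOR BIT DATA)");
            }
        }
    }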
initial page allocation versus incremental allocation
I see that there's a property to allow configuring the number of pages to initially allocate to a table, derby.storage.initialPages, but there isn't a property for setting the number of pages to allocate when incrementally expanding the file container. It looks like RawStoreFactory might've allowed for this with

    public static final String PRE_ALLOCATE_PAGE =
        "derby.storage.pagePerAllocation";

but this isn't referenced by anything I can find. FileContainer fixes the incremental expansion to 8 pages with the DEFAULT_PRE_ALLOC_SIZE constant. What was the reason for not allowing the pre-allocation setting to be configurable? Were there adverse effects on FileContainer if it was increased to something like 100 pages?

Brian
RE: inserts slowing down after 2.5m rows
I thought I read in the documentation that 1000 was the maximum number of initial pages you could allocate, and that after that, Derby allocates a page at a time. Is there some other setting for getting it to allocate more at a time?

Brian

-----Original Message-----
From: Michael Segel [mailto:mse...@segel.com] On Behalf Of de...@segel.com
Sent: Friday, February 27, 2009 9:59 PM
To: 'Derby Discussion'
Subject: RE: inserts slowing down after 2.5m rows

Ok,

For testing, if you allocate 2000 pages, then if my thinking is right, you'll fly along until you get to about 2100 pages. It sounds like you're hitting a snag where, after your initial allocation of pages, Derby is only allocating a smaller number of pages at a time. I would hope that you could configure the number of pages to be allocated in blocks as the table grows.

________________________________

From: publicay...@verizon.net [mailto:publicay...@verizon.net]
Sent: Friday, February 27, 2009 8:48 PM
To: Derby Discussion
Subject: Re: inserts slowing down after 2.5m rows

I've increased the log size and the checkpoint interval, but it doesn't seem to help. It looks like the inserts begin to dramatically slow down once the table reaches the initial allocation of pages. Things just fly along until it gets to about 1100 pages (I've allocated an initial 1000 pages; pages are 32K). Any suggestions on how to keep the inserts moving quickly at this point?

Brian

On Fri, Feb 27, 2009 at 3:41 PM, publicay...@verizon.net wrote:

The application is running on a client machine. I'm not sure how to tell if there's a different disk available that I could log to. If checkpointing is causing this delay, how do I manage that? Can I turn checkpointing off? I already have durability set to test; I'm not concerned about recovering from a crashed db.

Brian

On Fri, Feb 27, 2009 at 9:34 AM, Peter Ondruška wrote:

> Could be checkpoint. BTW, to speed up bulk load you may want to use large
> log files located separately from the data disks.
>
> 2009/2/27, Brian Peterson <dianeay...@verizon.net>:
>> I have a big table that gets a lot of inserts. Rows are inserted 10k at a
>> time with a table function. At around 2.5 million rows, inserts slow down
>> from 2-7s to around 15-20s. The table's dat file is around 800-900M.
>>
>> I have durability set to "test", table-level locks, a primary key index
>> and another 2-column index on the table. Page size is at the max and the
>> page cache is set to 4500 pages. The table gets compressed (in place)
>> every 500,000 rows. I'm using Derby 10.4 with JDK 1.6.0_07, running on
>> Windows XP. I've ruled out anything from the rest of the application,
>> including GC (memory usage follows a consistent pattern during the whole
>> load). It is a local file system. The database has a fixed number of
>> tables (so there's a fixed number of dat files in the database directory
>> the whole time). The logs are getting cleaned up, so there's only a few
>> dat files in the log directory as well.
>>
>> Any ideas what might be causing the big slowdown after so many loads?
>>
>> Brian
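For reference, the knobs being traded off in this thread are all ordinary Derby properties that can go in derby.properties (or be set as system properties). A sketch of the combination under discussion, with illustrative values only, not recommendations:

    # derby.properties - bulk-load tuning discussed in this thread

    # Skip log syncs entirely; only safe when crash recovery doesn't matter.
    derby.system.durability=test

    # Larger log files and less frequent checkpoints (both in bytes).
    derby.storage.logSwitchInterval=33554432
    derby.storage.checkpointInterval=134217728

    # 32K pages, a 4500-page cache, 1000 pages allocated at table creation.
    derby.storage.pageSize=32768
    derby.storage.pageCacheSize=4500
    derby.storage.initialPages=1000

Note that derby.storage.pageSize and derby.storage.initialPages are only read when a table is created, so changing them later does not affect existing tables.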
inserts slowing down after 2.5m rows
I have a big table that gets a lot of inserts. Rows are inserted 10k at a time with a table function. At around 2.5 million rows, inserts slow down from 2-7s to around 15-20s. The table's dat file is around 800-900M.

I have durability set to "test", table-level locks, a primary key index and another 2-column index on the table. Page size is at the max and the page cache is set to 4500 pages. The table gets compressed (in place) every 500,000 rows. I'm using Derby 10.4 with JDK 1.6.0_07, running on Windows XP. I've ruled out anything from the rest of the application, including GC (memory usage follows a consistent pattern during the whole load). It is a local file system. The database has a fixed number of tables (so there's a fixed number of dat files in the database directory the whole time). The logs are getting cleaned up, so there's only a few dat files in the log directory as well.

Any ideas what might be causing the big slowdown after so many loads?

Brian
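A sketch of the periodic in-place compression described above, as it might look in the load loop; the schema and table names are placeholders:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    public final class CompressStep {
        // Periodic in-place compression during a bulk load; "APP" and
        // "BIG_TABLE" are placeholder names.
        static void compressInPlace(Connection conn) throws SQLException {
            try (CallableStatement cs = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)")) {
                cs.setString(1, "APP");       // schema
                cs.setString(2, "BIG_TABLE"); // table
                cs.setShort(3, (short) 1);    // purge deleted rows
                cs.setShort(4, (short) 1);    // defragment rows
                cs.setShort(5, (short) 1);    // truncate free space at end of file
                cs.execute();
            }
        }
    }

One thing worth checking in this setup: with the truncate-at-end flag set, the compress call gives back unused space at the end of the container file, which could plausibly release pre-allocated pages and force the container to grow again on the next batch.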
iterating over millions of rows
I have a big table, about 1 million rows, that I'm doing a simple "select *" over. The table is depressingly simple: basically a big VARCHAR FOR BIT DATA column that stores some serialized bytes.

When I profile using VisualVM, it seems that most of the time is spent in org.apache.derby.impl.store.raw.data.RAFContainer4.readFully. If I remember correctly, this gets invoked something like 3000 times.

Is there anything I can do to speed up iterating over this table? It is taking about 30s to iterate over the 1 million records, but I could have up to 25 million. It is an embedded db using 10.4, JDK 1.6.0_07, running on a Windows XP SP2 machine. I have the page size set to the max, 32K, and the page cache size set to 6000 pages.

Is there anything I can do, or have I just run up against how fast Windows can read off of the disk?

Brian
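A sketch of the scan in question with the two client-side settings usually worth trying first: a forward-only, read-only result set with a fetch-size hint, and streaming the binary column instead of materializing each value with getBytes(). Table and column names are invented:

    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ScanDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");
                 Statement s = conn.createStatement(
                         ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                s.setFetchSize(1000); // a hint; the embedded driver may ignore it
                try (ResultSet rs = s.executeQuery("SELECT payload FROM big_table")) {
                    byte[] buf = new byte[32 * 1024];
                    while (rs.next()) {
                        // Stream the column rather than calling getBytes(),
                        // avoiding one full-size byte[] allocation per row.
                        try (InputStream in = rs.getBinaryStream(1)) {
                            while (in.read(buf) != -1) {
                                // feed the bytes to the deserializer here
                            }
                        }
                    }
                }
            }
        }
    }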
RE: is there a lower-level (non-SQL) API for Derby?
Hi Rick,

I've tried following up on this because I'd be interested in using this lighter version. From what I've been able to find, it looks like you started to set out the goals for such an effort. Is this effort still moving forward?

My chief need would be speed: factoring out the overhead of the JDBC/SQL interface. I see someone else noted that this has been measured at 15-20% for lookups on simple tables. I would definitely use the subsystem to get a 20% improvement when using an embedded database.

Brian

-----Original Message-----
From: richard.hille...@sun.com [mailto:richard.hille...@sun.com]
Sent: Monday, January 05, 2009 9:49 AM
To: Derby Discussion
Cc: derby-...@db.apache.org
Subject: Re: is there a lower-level (non-SQL) API for Derby?

Hi Tim,

This question has come up before. For instance, you may find some interesting discussion on the following email thread:

http://www.nabble.com/simpler-api-to-the-Derby-store-td18137499.html#a18137499

The Derby storage layer is supposed to be an independent component. The api is described in the javadoc for the org.apache.derby.iapi.store.access package:

http://db.apache.org/derby/javadoc/engine/

What would you say are your chief needs? Are you looking for a version of Derby which is 1) smaller, 2) faster, or 3) easier to use?

Hope this helps,
-Rick

Tim Dugan wrote:
>
> I'm looking to see if Derby can be used similarly to Berkeley DB -- a
> lower-level API. Can anyone tell me?
>
> Maybe to the access area of the "Store Layer", which in some Derby
> documentation is described like this:
>
> "The Store layer is split into two main areas, access and raw. The
> access layer presents a conglomerate (table or index)/row based
> interface to the SQL layer. It handles table scans, index scans,
> index lookups, indexing, sorting, locking policies, transactions,
> isolation levels."
>
> Now that Derby is included in Java 1.6, I am having a really hard time
> finding Java documentation that talks about Derby.
procedure to upgrade or create an embedded database
I'm trying to figure out what connection parameters I should use for an embedded database when there might not be a database in place (so "create=true"), there might be an older version already in place (so "upgrade=true"), or there might already be a database of the right version in place (so "create" or nothing added to the URL). I read that "upgrade" and "create" cannot both be specified in the connection URL.

If I'm putting out a release of my application that uses the latest version of Derby (10.4) while a previous version used 10.2.2, what are the recommended steps for making the connection to the Derby database if one is already present? (Note that I have to handle this programmatically as part of the application startup.) Do I first try a URL with "create" and see if there's an error, and if so, follow up with a call with "upgrade"? Or do I have the procedure always use "upgrade" and follow up with a URL with "create" if it fails to make a connection?

Brian
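One way this is commonly handled (a sketch, not an endorsement of either ordering the question proposes): attempt the connection with upgrade=true first, and fall back to create=true only when Derby reports that the database does not exist, which the embedded driver signals with SQLState XJ004. The database name and URL are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ConnectOrCreate {
        private static final String DB_URL = "jdbc:derby:myDB"; // placeholder

        public static Connection open() throws SQLException {
            try {
                // Existing database: connects, hard-upgrading it if it was
                // created by an older Derby version; upgrade=true is harmless
                // when the database is already at the current version.
                return DriverManager.getConnection(DB_URL + ";upgrade=true");
            } catch (SQLException e) {
                // XJ004: "Database not found" - create it instead.
                if ("XJ004".equals(e.getSQLState())) {
                    return DriverManager.getConnection(DB_URL + ";create=true");
                }
                throw e;
            }
        }
    }

The upgrade-first ordering is sketched here because create=true against an existing database simply connects and ignores the create attribute (with a warning), so a create-first attempt would never trigger the hard upgrade.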