RE: initial page allocation versus incremental allocation

2009-03-03 Thread Brian Peterson
Won't this still cap the setting at MAX_PRE_ALLOC_SIZE? Why have a maximum 
setting at all? Or, at least, why fix the default at just 8 pages? If the 
increment is configurable, then applications that need the option have it 
available, and those that don't can continue using the default settings. I'm 
surprised there's an option to configure the initial allocation but not the 
increment size.

What I've been trying to do is use Derby in a client application that needs to 
download more data than will fit into heap. The objects get serialized into 
rows and are indexed by the row number and an additional java.util.UUID that 
identifies the object. The client gui can analyze the data and display the 
results and the raw data as required.

The loading works great up to the initial allocation of pages, then it crawls, 
which cascades through and disrupts the whole application. It looks like I 
could solve this by configuring the page-increment size just like the initial 
allocation setting. 
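
If the derby.storage.pagePerAllocation property Knut mentions below really is 
live, here is roughly what I would try, given an open Connection conn and 
assuming the property behaves like derby.storage.initialPages and is read when 
a table's container is created:

    CallableStatement cs = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)");
    cs.setString(1, "derby.storage.pagePerAllocation");
    cs.setString(2, "100");
    cs.execute();
    // tables created after this point should pick up the larger increment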

I had hoped to use Derby as a kind of general purpose heap management solution. 
It has been doing a great job of limiting the amount of heap it uses, and it 
integrates well with the rest of the application, but the slowdown of the table 
loading is killing me. 

Brian

-Original Message-
From: knut.hat...@sun.com [mailto:knut.hat...@sun.com] 
Sent: Tuesday, March 03, 2009 3:28 AM
To: Derby Discussion
Subject: Re: initial page allocation versus incremental allocation

Brian Peterson dianeay...@verizon.net writes:

 I see that there’s a property to allow configuring the number of pages
 to initially allocate to a table, derby.storage.initialPages, but
 there isn’t a property to allow for setting the number of pages to
 allocate when incrementally expanding the file container. It looks
 like RawStoreFactory might’ve allowed for this with

 public static final String PRE_ALLOCATE_PAGE =
 "derby.storage.pagePerAllocation";

 but this isn't referenced by anything I can find.

I haven't tested that it actually works, but it appears to be referenced
in FileContainer.createInfoFromProp():

PreAllocSize =
    PropertyUtil.getServiceInt(tc, createArgs,
                               RawStoreFactory.PRE_ALLOCATE_PAGE,
                               MIN_PRE_ALLOC_SIZE,
                               MAX_PRE_ALLOC_SIZE,
                               DEFAULT_PRE_ALLOC_SIZE /* default */);

If it turns out that setting the property works, we should probably try
to get it into the documentation, as it looks like it could be useful.
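
One way to check whether the setting takes effect would be to watch the 
allocated page count while inserting, for example with the SPACE_TABLE 
diagnostic (if your release exposes the SYSCS_DIAG table functions; the 
schema and table names below are just examples):

    ResultSet rs = conn.createStatement().executeQuery(
        "SELECT NUMALLOCATEDPAGES, NUMFREEPAGES " +
        "FROM TABLE(SYSCS_DIAG.SPACE_TABLE('APP', 'BIGTABLE')) T");
    while (rs.next()) {
        System.out.println("allocated=" + rs.getInt(1)
                           + " free=" + rs.getInt(2));
    }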

-- 
Knut Anders




initial page allocation versus incremental allocation

2009-03-02 Thread Brian Peterson
I see that there's a property to allow configuring the number of pages to
initially allocate to a table, derby.storage.initialPages, but there isn't a
property to allow for setting the number of pages to allocate when
incrementally expanding the file container. It looks like RawStoreFactory
might've allowed for this with 

 

public static final String PRE_ALLOCATE_PAGE =
    "derby.storage.pagePerAllocation";

 

but this isn't referenced by anything I can find. FileContainer fixes the
incremental expansion at 8 pages with the DEFAULT_PRE_ALLOC_SIZE constant.

 

What was the reason for not allowing the pre-allocation setting to be
configurable? Were there adverse effects on FileContainer if it was
increased to something like 100 pages?

 

Brian

 



inserts slowing down after 2.5m rows

2009-02-27 Thread Brian Peterson
I have a big table that gets a lot of inserts. Rows are inserted 10k at a
time with a table function. At around 2.5 million rows, inserts slow down
from 2-7s to around 15-20s. The table's dat file is around 800-900M.

 

I have durability set to test, table-level locks, a primary key index and
another 2-column index on the table. Page size is at the max and page cache
set to 4500 pages. The table gets compressed (inplace) every 500,000 rows.
I'm using Derby 10.4 with JDK 1.6.0_07, running on Windows XP. I've ruled
out anything from the rest of the application, including GC (memory usage
follows a consistent pattern during the whole load). It is a local file
system. The database has a fixed number of tables (so there's a fixed number
of dat files in the database directory the whole time). The logs are getting
cleaned up, so there are only a few dat files in the log directory as well.
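
For reference, here is roughly how that configuration maps onto documented 
Derby knobs, given an open Connection conn (the schema and table names are 
invented; this is a sketch of the setup described, not the application's 
actual code):

    // system-wide properties; these must be set before the engine boots
    System.setProperty("derby.system.durability", "test");
    System.setProperty("derby.storage.pageCacheSize", "4500");

    // database-wide property; applies to tables created afterwards
    CallableStatement cs = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)");
    cs.setString(1, "derby.storage.pageSize");
    cs.setString(2, "32768"); // the 32K maximum
    cs.execute();

    // periodic in-place compression, run every 500,000 rows
    CallableStatement comp = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)");
    comp.setString(1, "APP");
    comp.setString(2, "BIGTABLE");
    comp.setShort(3, (short) 1); // purge
    comp.setShort(4, (short) 1); // defragment
    comp.setShort(5, (short) 1); // truncate end
    comp.execute();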

 

Any ideas what might be causing the big slowdown after so many loads?

 

Brian

 



RE: inserts slowing down after 2.5m rows

2009-02-27 Thread Brian Peterson
I thought I read in the documentation that 1000 was the max initial pages
you could allocate, and after that, Derby allocates a page at a time. Is
there some other setting for getting it to allocate more at a time?

 

Brian

 

From: Michael Segel [mailto:mse...@segel.com] On Behalf Of de...@segel.com
Sent: Friday, February 27, 2009 9:59 PM
To: 'Derby Discussion'
Subject: RE: inserts slowing down after 2.5m rows

 

Ok, 

 

For testing: if you allocate 2000 pages then, if my thinking is right, you'll
fly along until you get to about 2100 pages.

 

It sounds like you're hitting a bit of a snag where after your initial
allocation of pages, Derby is only allocating a smaller number of pages at a
time.

 

I would hope that you could configure the number of pages to be allocated in
blocks as the table grows.
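
For the test, the initial allocation is the one knob that is documented today; 
something like this before (re)creating the table, given an open Connection 
conn (note that the documented maximum for initialPages is 1000, so 2000 may 
be out of reach):

    CallableStatement cs = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)");
    cs.setString(1, "derby.storage.initialPages");
    cs.setString(2, "1000"); // documented upper bound
    cs.execute();
    // drop and recreate the table so it picks up the new initial allocation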

 

 


From: publicay...@verizon.net [mailto:publicay...@verizon.net] 
Sent: Friday, February 27, 2009 8:48 PM
To: Derby Discussion
Subject: Re: inserts slowing down after 2.5m rows

 

 I've increased the log size and the checkpoint interval, but it doesn't
seem to help.

 

It looks like the inserts begin to dramatically slow down once the table
reaches the initial allocation of pages. Things just fly along until it gets
to about 1100 pages (I've allocated an initial 1000 pages, pages are 32k).

 

Any suggestions on how to keep the inserts moving quickly at this point?

 

Brian

 

On Fri, Feb 27, 2009 at  3:41 PM, publicay...@verizon.net wrote:

 

 The application is running on a client machine. I'm not sure how to tell if
there's a different disk available that I could log to. 

 

If checkpointing is causing this delay, how do I manage that? Can I turn
checkpointing off? I already have durability set to test; I'm not concerned
about recovering from a crashed db. 

 

Brian 

 

On Fri, Feb 27, 2009 at  9:34 AM, Peter Ondruška wrote: 

 

Could be checkpoint. BTW, to speed up bulk loads you may want to use large
log files located separately from the data disks.
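
A sketch of that tuning (the property names are from the Derby tuning guide; 
the path and sizes are illustrative only):

    // larger log files and less frequent checkpoints; system-wide
    // properties, so they must be set before the engine boots
    System.setProperty("derby.storage.logSwitchInterval",
                       String.valueOf(16 * 1024 * 1024));
    System.setProperty("derby.storage.checkpointInterval",
                       String.valueOf(100 * 1024 * 1024));

    // put the transaction log on a separate disk when the database is created
    Connection conn = DriverManager.getConnection(
        "jdbc:derby:bigdb;create=true;logDevice=e:/derbylog");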

 




RE: is there a lower-level (non-SQL) API for Derby?

2009-01-19 Thread Brian Peterson
Hi Rick,

I've tried following up with this because I'd be interested in using this
lighter version. From what I've been able to find, it looks like you started
to set up the goals for such an effort. Is this effort still moving forward?

My chief need would be speed -- factoring out the overhead of the JDBC/SQL
interface. I see someone else noted that this has been measured at 15-20%
for lookups on simple tables. I would definitely use the subsystem to get a
20% improvement when using an embedded database.

Brian 

-Original Message-
From: richard.hille...@sun.com [mailto:richard.hille...@sun.com] 
Sent: Monday, January 05, 2009 9:49 AM
To: Derby Discussion
Cc: derby-...@db.apache.org
Subject: Re: is there a lower-level (non-SQL) API for Derby?

Hi Tim,

This question has come up before. For instance, you may find some 
interesting discussion on the following email thread: 
http://www.nabble.com/simpler-api-to-the-Derby-store-td18137499.html#a18137499

The Derby storage layer is supposed to be an independent component. The 
api is described in the javadoc for the 
org.apache.derby.iapi.store.access package: 
http://db.apache.org/derby/javadoc/engine/

What would you say are your chief needs? Are you looking for a version 
of Derby which is

1) smaller
2) faster

  or

3) easier-to-use

Hope this helps,
-Rick

Tim Dugan wrote:
  
 I'm looking to see if Derby can be used similarly to Berkeley DB -- a 
 lower-level API.  Can anyone tell me?

  

 Maybe to the access area of the Store Layer which in some Derby 
 documentation is described like this:

 The Store layer is split into two main areas, access and raw. The
 access layer presents a conglomerate (table or index)/row based
 interface to the SQL layer. It handles table scans, index scans,
 index lookups, indexing, sorting, locking policies, transactions,
 isolation levels.

 Now that Derby is included in Java 6, I am having a really hard time 
 finding Java documentation that talks about Derby.
  





iterating over millions of rows

2009-01-19 Thread Brian Peterson
I have a big table, about 1 million rows, that I'm doing a simple select *
over. The table is depressingly simple: basically a big VARCHAR FOR BIT DATA
column that stores some serialized bytes. When I profile using VisualVM, it
seems to spend most of its time in 

org.apache.derby.impl.store.raw.data.RAFContainer4.readFully

If I remember correctly, this gets invoked something like 3000 times. Is
there anything I can do to speed up iterating over this table? It is taking
about 30s to iterate over the 1 million records, but I could have up to 25
million.

It is an embedded db using 10.4, JDK 1.6.0_07, running on a Windows XP SP2
machine. I have the page size set to the max, 32K, and the page cache size
set to 6000 pages. 

Is there anything I can do, or have I just run up against how fast Windows
can read off of the disk?
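
For what it's worth, a minimal version of the scan in question, given an open 
Connection conn (the table and column names are invented), just to rule out 
anything beyond raw read speed:

    Statement s = conn.createStatement();
    ResultSet rs = s.executeQuery("SELECT data FROM big_table");
    try {
        while (rs.next()) {
            byte[] bytes = rs.getBytes(1); // the VARCHAR FOR BIT DATA column
            // deserialize/process the bytes here
        }
    } finally {
        rs.close();
        s.close();
    }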

Brian





procedure to upgrade or create an embedded database

2008-12-06 Thread Brian Peterson
I'm trying to figure out what connection parameters to use for an embedded
database when there might not be a database in place yet (so create=true),
when there might be an older version already in place (so upgrade=true), or
when there might already be a database of the right version in place (so
nothing extra in the URL).

I read that upgrade and create cannot both be specified in the
connection URL. If I'm putting out a release of my application that uses the
latest version of Derby (10.4) while a previous version used 10.2.2, what
are the recommended steps for making the connection to the Derby database if
one is already present? (Note that I have to handle this programmatically as
part of the application startup.)

Do I first try a URL with create and see if there's an error, and if so,
follow up with a call with upgrade? Or do I have the procedure always use
upgrade and follow up with a URL with create if it fails to make a
connection?
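
One pattern that seems workable; the SQLState check is an assumption on my
part (the Derby reference lists XJ004 as "database not found"), and a plain
connection to an older database should come up in soft-upgrade mode:

    private Connection connect(String dbName) throws SQLException {
        String base = "jdbc:derby:" + dbName;
        try {
            // existing database: a plain URL also soft-upgrades an older
            // one without committing it to the new on-disk format
            return DriverManager.getConnection(base);
        } catch (SQLException e) {
            if ("XJ004".equals(e.getSQLState())) {
                // no database yet, so create it
                return DriverManager.getConnection(base + ";create=true");
            }
            throw e;
        }
    }
    // connect with ;upgrade=true only when you deliberately want the
    // hard upgrade to the new format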

Brian