Improving data insert performance

2010-01-13 Thread Nathan Boy
Hello,
   I have an embedded database application that generally involves
inserting somewhere between 50k and 1000k rows of data into a
database, and then analyzing and querying that data afterwards.  The
data goes into about ten tables, but the bulk of the data is in just a
few of them.  I run my database with test durability, and I add all
of the primary key and foreign key constraints after the data is
inserted into the database.  Currently all of the data is inserted
using prepared statements executed in batches, and this gives me
between 10 and 20 row inserts per millisecond on average.  I have
spent quite a bit of time optimizing the insert step, and while I was
doing research I came across this discussion from last year:

http://www.mail-archive.com/derby-user@db.apache.org/msg10194.html

The discussion suggests using bulk import as a way to speed up this
initial insert step.  Unfortunately, I cannot use the built in import
functions, as my data includes Timestamps with nanosecond granularity.
 As far as I can tell, there is no way to convince derby to parse a
time specified down to the nanosecond.  In one of the emails, someone
suggested that you can get bulk import performance by using a table
function, and then running a query like INSERT INTO MYTABLE (...)
SELECT S.* FROM TABLE (MYFUNC ()) S.  In my tests, however, this
doesn't seem to perform the insert any faster than simply inserting
the rows one at a time with a prepared statement.  I think this may be
because I don't have a way to set the 'insertMode=bulkImport'
property, which the bulk import system procedure is allowed to do.
Does anyone know of a way to work around this, or of a better way to
get my data into the database as quickly as possible?  Thanks in
advance for your time.

Cheers,
Nathan Boy


Re: Improving data insert performance

2010-01-13 Thread Bryan Pendleton

using prepared statements executed in batches, and this gives me
between 10 and 20 row inserts per millisecond on average.  I have


Wow! You're already going very fast, and you've already put a lot
of good effort into this, so I congratulate you on how far you've
gone already.

A couple of thoughts occurred to me as I read your post:

 - have you tried using in-memory databases rather than test duration?
 - have you experimented with using a larger page size to minimize
   per-page overhead?
 - have you experimented with giving a larger-than-default page cache?

Also, have you done any system-level analysis of your application
during the insert:
 - is your system CPU-bound? is there any disk io occurring at all?
 - how much garbage are you generating, what is the GC profile like?
 - what rev of the JDK are you running, and on what operating system?
 - is your system multi-core? Can you throw threads/cores at this problem?
 - can you get any sort of profiling data about where the hotspots
   are during your run?

Good luck with your performance work, it sounds extremely interesting!

thanks,

bryan



Re: Improving data insert performance

2010-01-13 Thread Peter Ondruška
Also try using larger log files (10+MB) and if you rarely change data
you load (which I guess is your case) you may want to use
pageReservedSpace=0 and large pageSize (32kb?)

On Wed, Jan 13, 2010 at 5:01 PM, Nathan Boy nathan@gmail.com wrote:
 Hello,
   I have an embedded database application that generally involves
 inserting somewhere between 50k and 1000k rows of data into a
 database, and then analyzing and querying that data afterwards.  The
 data goes into about ten tables, but the bulk of the data is in just a
 few of them.  I run my database with test durability, and I add all
 of the primary key and foreign key constraints after the data is
 inserted into the database.  Currently all of the data is inserted
 using prepared statements executed in batches, and this gives me
 between 10 and 20 row inserts per millisecond on average.  I have
 spent quite a bit of time optimizing the insert step, and while I was
 doing research I came across this discussion from last year:

 http://www.mail-archive.com/derby-user@db.apache.org/msg10194.html

 The discussion suggests using bulk import as a way to speed up this
 initial insert step.  Unfortunately, I cannot use the built in import
 functions, as my data includes Timestamps with nanosecond granularity.
  As far as I can tell, there is no way to convince derby to parse a
 time specified down to the nanosecond.  In one of the emails, someone
 suggested that you can get bulk import performance by using a table
 function, and then running a query like INSERT INTO MYTABLE (...)
 SELECT S.* FROM TABLE (MYFUNC ()) S.  In my tests, however, this
 doesn't seem to perform the insert any faster than simply inserting
 the rows one at a time with a prepared statement.  I think this may be
 because I don't have a way to set the 'insertMode=bulkImport'
 property, which the bulk import system procedure is allowed to do.
 Does anyone know of a way to work around this, or of a better way to
 get my data into the database as quickly as possible?  Thanks in
 advance for your time.

 Cheers,
 Nathan Boy



Embedded Web Server in Java stand alone application...

2010-01-13 Thread Jim Crowell

I have a Java stand alone application that creates a Derby Database using the
embedded derby connect. It work fine in creating, viewing and editing the
Derby tables.

I have a requirement to generate reports as a function of the constructed
Derby file. In my infinite wisdom I decided to build the application reports
using my application  host default Internet Browser, i.e. the “Report
Generator” is HTM driven.

This decision is based on my requirement to generate reports in the
following modes:

1) Local Mode using the Derby Database residing on the HDD. I’ll invoke the
Internet Browser from my stand alone app.

2) WWW Mode using the Derby Database residing on my Server. I’ll use
JavaServer Pages [JSP] to construct the various reports.

The “WWW mode” is doable, standard processing. The reports shall be created
as HTML templates with custom Tags where the Derby parameters shall be
inserted.

In the “Local Mode”, ideally I would like to use the same JSP / Servlet
method employed in the “WWW mode”. An alternative [workaround] approach
would be to still use the form template with custom tags but do the
processing independent of a Web Server, JSP container. Some Java methods
could be used in both the alternative approach and the JSP / Servlet “WWW
Mode” phase.

The purpose of this thread is to determine if it is possible to use a Web
Server in the “Local Mode”? Another objective is to get an idea if this type
of stand alone application / Web Server integration has been accomplished. I
think I know how to do the “alternative implementation” and needed to
understand if I’m getting into development “Over My Head” in implementing
the Web Server approach in my “Local Mode”.

The following figure is designed to illustrate my “Local Mode” approach so
that I can ask questions and the thread reader can comment [shoot holes] in
the system. The figure is base lined from a Derby tutorial
[http://db.apache.org/derby/papers/DerbyTut/ns_intro.html#ns_lookat_code]
that was recommended to me in the Apache-Derby-Developers mailing list.

http://old.nabble.com/file/p27148809/embedded_server.jpg 
Note: I received some very useful advice from that forum and was advised to
go to the Nabble “derby-user” mailing list for help on the feasibility of
having the Web Server embedded in my application.

In my reading, I came upon the Apache Portable Runtime [APR] and Tomcat
Native [http://tomcat.apache.org/tomcat-5.5-doc/apr.html] documentation and
thought that may be a good place to start the conversation.

Note that the Java stand alone application under development shall be
marketed for the Windows and the Mac OS X platforms. I would have to load
the matching Native Tomcat in my deployment process.

Question:
How can I produce a Java stand alone application that shall conditionally
spawn my end user’s default Internet Browser with “Report Generator” HTML
that use a Web Server and JSP?

-
Regards,
Jim...
-- 
View this message in context: 
http://old.nabble.com/Embedded-Web-Server-in-Java-stand-alone-application...-tp27148809p27148809.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.



Streaming Results

2010-01-13 Thread Stian Brattland

Hi,

I've got a question regarding results streaming. The J/Connector for 
MySQL supports results streaming, which means
that you can stream and process rows in a ResultSet one by one. 
Normally, all rows in a ResultSet will be retrived
before you can process the ResultSet. However, i am curious as to wether 
this feature also exists in Derby?


In MySQL, you would do the following to stream results from the database 
as you iterate through a ResultSet:


stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, 
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);


Kind regards,
Stian Brattland