Improving data insert performance
Hello,

I have an embedded database application that generally involves inserting somewhere between 50k and 1000k rows of data into a database, and then analyzing and querying that data afterwards. The data goes into about ten tables, but the bulk of the data is in just a few of them. I run my database with test durability, and I add all of the primary key and foreign key constraints after the data is inserted. Currently all of the data is inserted using prepared statements executed in batches, which gives me between 10 and 20 row inserts per millisecond on average.

I have spent quite a bit of time optimizing the insert step, and while doing research I came across this discussion from last year: http://www.mail-archive.com/derby-user@db.apache.org/msg10194.html

The discussion suggests using bulk import as a way to speed up this initial insert step. Unfortunately, I cannot use the built-in import procedures, because my data includes timestamps with nanosecond granularity, and as far as I can tell there is no way to convince Derby to parse a time specified down to the nanosecond. In one of the emails, someone suggested that you can get bulk import performance by using a table function and then running a query like:

INSERT INTO MYTABLE (...) SELECT S.* FROM TABLE (MYFUNC ()) S

In my tests, however, this doesn't perform the insert any faster than simply inserting the rows one at a time with a prepared statement. I think this may be because I have no way to set the 'insertMode=bulkImport' property, which the bulk import system procedure is allowed to set. Does anyone know of a way to work around this, or of a better way to get my data into the database as quickly as possible?

Thanks in advance for your time.

Cheers,
Nathan Boy
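For reference, the batched prepared-statement pattern I'm using looks roughly like this (the table and column names are simplified placeholders, not my real schema). Note that java.sql.Timestamp itself carries nanosecond precision, which is why the JDBC path preserves my data where the import file parser does not:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class BatchInsert {

    // Rough shape of the batched insert described above; table and
    // column names are placeholders.
    static void insertBatch(Connection conn, Timestamp[] stamps, int batchSize)
            throws SQLException {
        PreparedStatement ps =
            conn.prepareStatement("INSERT INTO MYTABLE (TS) VALUES (?)");
        int pending = 0;
        for (Timestamp ts : stamps) {
            ps.setTimestamp(1, ts);
            ps.addBatch();
            if (++pending == batchSize) {
                ps.executeBatch();  // one driver round trip per batch
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch();      // flush the final partial batch
        }
        ps.close();
    }

    // Timestamp.valueOf accepts up to nine fractional digits, so
    // nanosecond precision survives when rows travel as JDBC
    // parameters rather than through an import file.
    static Timestamp parseNanos(String s) {
        return Timestamp.valueOf(s);  // e.g. "2010-01-13 17:01:00.123456789"
    }
}
```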
Re: Improving data insert performance
> using prepared statements executed in batches, and this gives me
> between 10 and 20 row inserts per millisecond on average.

Wow! You're already going very fast, and you've already put a lot of good effort into this, so I congratulate you on how far you've gone already.

A couple of thoughts occurred to me as I read your post:
- have you tried using an in-memory database rather than test durability?
- have you experimented with using a larger page size to minimize per-page overhead?
- have you experimented with giving a larger-than-default page cache?

Also, have you done any system-level analysis of your application during the insert:
- is your system CPU-bound? is there any disk I/O occurring at all?
- how much garbage are you generating, and what is the GC profile like?
- what rev of the JDK are you running, and on what operating system?
- is your system multi-core? can you throw threads/cores at this problem?
- can you get any sort of profiling data about where the hotspots are during your run?

Good luck with your performance work, it sounds extremely interesting!

thanks,
bryan
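The first three suggestions above map onto Derby's connection URL and system properties. A minimal sketch, with illustrative values (check the Derby tuning guide for the exact ranges your version supports):

```java
// Sketch of the tuning knobs suggested above; values are illustrative,
// and the page properties must be set before the tables are created.
public class DerbyTuning {

    static void configure() {
        // Page size used for newly created tables and indexes (bytes).
        System.setProperty("derby.storage.pageSize", "32768");
        // Number of pages held in the page cache (the default is 1000).
        System.setProperty("derby.storage.pageCacheSize", "4000");
        // Relaxed durability for bulk loading (already in use here).
        System.setProperty("derby.system.durability", "test");
    }

    // An in-memory database avoids disk I/O during the load entirely;
    // "scratchdb" is a placeholder name.
    static final String IN_MEMORY_URL = "jdbc:derby:memory:scratchdb;create=true";
}
```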
Re: Improving data insert performance
Also try using larger log files (10+ MB), and if you rarely change the data you load (which I guess is your case) you may want to use pageReservedSpace=0 and a large pageSize (32 KB?).

On Wed, Jan 13, 2010 at 5:01 PM, Nathan Boy nathan@gmail.com wrote:
> Hello, I have an embedded database application that generally involves
> inserting somewhere between 50k and 1000k rows of data into a database,
> and then analyzing and querying that data afterwards. [...]
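Concretely, the knobs suggested above can be set before the tables are created; a sketch, assuming the property names documented in Derby's tuning guide and using the values mentioned in the reply:

```java
// Sketch of the bulk-load settings suggested above; page properties
// take effect only for tables created after they are set.
public class BulkLoadSettings {

    static void apply() {
        // Switch to a new transaction log file every 10 MB
        // (the default is roughly 1 MB).
        System.setProperty("derby.storage.logSwitchInterval", "10485760");
        // Reserve no space on each page for row expansion; fine when
        // the loaded data is rarely updated afterwards (default is 20%).
        System.setProperty("derby.storage.pageReservedSpace", "0");
    }
}
```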
Embedded Web Server in Java stand alone application...
I have a Java stand-alone application that creates a Derby database using the embedded Derby connection. It works fine for creating, viewing and editing the Derby tables. I have a requirement to generate reports as a function of the constructed Derby file. In my infinite wisdom I decided to build the application reports using my application host's default Internet browser, i.e. the "Report Generator" is HTML driven. This decision is based on my requirement to generate reports in the following modes:

1) Local Mode, using the Derby database residing on the HDD. I'll invoke the Internet browser from my stand-alone app.
2) WWW Mode, using the Derby database residing on my server. I'll use JavaServer Pages [JSP] to construct the various reports.

The "WWW Mode" is doable, standard processing. The reports shall be created as HTML templates with custom tags where the Derby parameters shall be inserted. In the "Local Mode", ideally I would like to use the same JSP / servlet method employed in the "WWW Mode". An alternative [workaround] approach would be to still use the form template with custom tags, but do the processing independent of a web server / JSP container. Some Java methods could be used in both the alternative approach and the JSP / servlet "WWW Mode" phase.

The purpose of this thread is to determine whether it is possible to use a web server in the "Local Mode". Another objective is to get an idea of whether this type of stand-alone application / web server integration has been accomplished before. I think I know how to do the "alternative implementation" and needed to understand if I'm getting in over my head by implementing the web server approach in my "Local Mode". The following figure is designed to illustrate my "Local Mode" approach so that I can ask questions and the thread reader can comment on [shoot holes in] the system.

The figure is baselined from a Derby tutorial [http://db.apache.org/derby/papers/DerbyTut/ns_intro.html#ns_lookat_code] that was recommended to me on the Apache-Derby-Developers mailing list. http://old.nabble.com/file/p27148809/embedded_server.jpg

Note: I received some very useful advice from that forum and was advised to go to the Nabble "derby-user" mailing list for help on the feasibility of having the web server embedded in my application. In my reading, I came upon the Apache Portable Runtime [APR] and Tomcat Native [http://tomcat.apache.org/tomcat-5.5-doc/apr.html] documentation and thought that may be a good place to start the conversation. Note that the Java stand-alone application under development shall be marketed for the Windows and Mac OS X platforms, so I would have to bundle the matching Tomcat Native library in my deployment process.

Question: How can I produce a Java stand-alone application that will conditionally spawn my end user's default Internet browser with "Report Generator" HTML served by a web server and JSP?

Regards, Jim...

--
View this message in context: http://old.nabble.com/Embedded-Web-Server-in-Java-stand-alone-application...-tp27148809p27148809.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.
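As a feasibility sketch for the "Local Mode" question: the JDK's built-in com.sun.net.httpserver (so no Tomcat/JSP here; this only demonstrates that serving report HTML from inside a stand-alone app and pointing the default browser at it is workable):

```java
import com.sun.net.httpserver.HttpServer;
import java.awt.Desktop;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;

public class LocalReportServer {

    // Start a tiny HTTP server inside the stand-alone app, serving a
    // generated report page at /report on an OS-assigned port.
    static HttpServer start(String reportHtml) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/report", exchange -> {
            byte[] body = reportHtml.getBytes("UTF-8");
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start("<html><body>Report goes here</body></html>");
        int port = server.getAddress().getPort();
        // Spawn the user's default browser at the local report URL.
        if (Desktop.isDesktopSupported()) {
            Desktop.getDesktop().browse(new URI("http://localhost:" + port + "/report"));
        }
    }
}
```

A real "Local Mode" would render the report HTML from the Derby data inside the handler; embedding a full JSP container (e.g. Jetty or embedded Tomcat) is heavier but follows the same start-server-then-open-browser shape.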
Streaming Results
Hi,

I've got a question regarding result streaming. The Connector/J driver for MySQL supports result streaming, which means that you can stream and process rows in a ResultSet one by one. Normally, all rows in a ResultSet are retrieved before you can process the ResultSet. I am curious as to whether this feature also exists in Derby. In MySQL, you would do the following to stream results from the database as you iterate through a ResultSet:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

Kind regards,
Stian Brattland
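For comparison, a plain-JDBC sketch of such a row-by-row read. Per the JDBC spec, a no-argument createStatement() already yields a TYPE_FORWARD_ONLY / CONCUR_READ_ONLY statement, and setFetchSize is only a hint that drivers may interpret differently (the Integer.MIN_VALUE trick above is Connector/J-specific):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingRead {

    // Plain JDBC shape of a row-by-row read: rows are consumed one at
    // a time from next(), without materializing the whole result first.
    static long countRows(Connection conn, String query) throws SQLException {
        long rows = 0;
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(100);  // a hint only; set before executing
            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    rows++;          // process one row at a time here
                }
            }
        }
        return rows;
    }
}
```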