large amounts of data and cgi

2008-01-15, Bernard Devlin
Part of a web application I'm working on requires a large number of
queries joining several tables, each containing millions of rows.
Accessing the database (Frontbase) using either the JDBC drivers or
the Rev drivers provided by Frontbase resulted in single queries that
took anywhere between 300 and 900 ms.  As there could be hundreds of
such queries to produce one response, I had to look at ways to speed
this up.  I don't believe it is a question of writing better SQL: the
queries have to spider across the data, keeping track of the route
they take to get to each node in the graph.
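
For context, each hop of the traversal was an ordinary parameterised
query through Rev's database library. A minimal sketch, with the
connection details, table and column names invented for illustration:

  -- hypothetical schema: edges(parent_id, child_id)
  put revOpenDatabase("frontbase", "localhost", "mydb", "user", "pass") into tConnID
  put "SELECT child_id FROM edges WHERE parent_id = :1" into tSQL
  -- one hop: fetch the children of the current node
  put revDataFromQuery(tab, return, tConnID, tSQL, "tNodeID") into tChildren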

I tuned the database, making sure that all the 'where' columns were
indexed, and checked the query plan to confirm that precisely those
indexes were being used in the queries.  I gave the database 100 MB of
RAM cache, and even tried loading the core tables 100% into RAM, but I
still could not get much better performance.  From my testing, the
data did not seem to be found in RAM until after it had first been
retrieved from disk, despite my setting the tables to pre-load into
RAM.  Perhaps over time the RAM-cached data would have performed much
better than it did at this stage.
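
The tuning itself was nothing exotic; from Rev it comes down to
statements like this (the index and table names are mine, purely for
illustration):

  -- one index per column appearing in a 'where' clause
  revExecuteSQL tConnID, "CREATE INDEX idx_edges_parent ON edges(parent_id)"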

I decided to try HSQLDB, an RDBMS written in Java which can hold all
its tables in RAM, so I could be sure that the data really was in
memory.  After exporting the core tables to CSV format (and
transforming them into SQL INSERTs using Rev), it took about an hour
to start the database, and once started it occupied 1.5 GB of RAM.
Yet the queries were still slow (my guess is that HSQLDB is simply not
designed to take that much data).
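
The CSV-to-INSERT transformation is only a few lines of Rev. A naive
sketch, assuming the data contains no embedded commas or quotes
('edges' is again an invented table name):

  -- turn each CSV line into an INSERT statement
  repeat for each line tLine in tCSV
     put tLine into tRow
     replace comma with "','" in tRow  -- quote each field
     put "INSERT INTO edges VALUES ('" & tRow & "');" & return after tInserts
  end repeat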

So I took the CSV data and imported it into a stack, turning each
table into a custom property set.  I could now search the data in a
tiny fraction of a second -- literally.  The hundreds of queries
involved in producing a single response complete in 200 ms.  That's
right: I can now run the entire series of queries in less time than it
took to run a single query against an RDBMS.
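
In outline, the technique looks like this (the property-set name and
key layout are my own illustration, not necessarily what the
application uses):

  -- build one custom property set per table, keyed on the lookup column
  repeat for each line tRow in tCSV
     set the uEdges[item 1 of tRow] of stack "Data" to item 2 to -1 of tRow
  end repeat
  -- a 'query' is then a single in-memory property access
  put the uEdges[tNodeID] of stack "Data" into tChildren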

Obviously, custom properties used in this way are not appropriate if
one needs referential integrity, locking, concurrent access, etc.  But
for read-only queries, Rev has provided a very satisfactory solution.
Moreover, whilst HSQLDB took an hour to start and 1.5 GB of RAM, Rev
could load the 100 MB of CSV data in 2-4 seconds, making frequent
updates to the data and subsequent restarts of the application
feasible.

After following Pierre Sahores' tutorial on creating persistent Rev
CGI processes, I now have these queries returning a web page in
200 ms, instead of the 30 seconds they would have taken using Java and
a database.  I know all of this flies in the face of many people's
expectations, and it is certainly not the position I saw myself in two
weeks ago.  The rest of the application is written in Java, but this
core feature has been made possible using Rev.  I expect that
associative arrays in e.g. PHP might be just as fast in this regard.
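
Reduced to its bones, the persistent-process idea is a Rev standalone
that accepts socket connections and answers from the data already in
memory, instead of being re-launched for every hit. A minimal sketch
(the port, handler and helper names are mine, and error handling is
omitted):

  on startup
     accept connections on port 8080 with message "newRequest"
  end startup

  on newRequest pSocket
     read from socket pSocket until CRLF & CRLF  -- the HTTP request headers
     put runQueries(it) into tBody  -- hypothetical handler over the custom properties
     write "HTTP/1.0 200 OK" & CRLF & CRLF & tBody to socket pSocket
     close socket pSocket
  end newRequest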

Bernard
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: large amounts of data and cgi

2008-01-15, Andre Garzia
Bernard,

Have you tried Valentina? It is very fast!!! :-D

cheers
andre

On 1/15/08, Bernard Devlin [EMAIL PROTECTED] wrote:
 Part of a web application I'm working on requires a large number of
 queries joining several tables, each containing millions of rows.
 [snip]

-- 
http://www.andregarzia.com All We Do Is Code.