[sqlite] Running a select on multiple servers.
Hi,

Currently the SQLite code reads the data from disk, sorts it, and returns it. I need to do the following, and would like to know how much work it would involve. Instead of reading from disk, I need to run a single SELECT against, say, 100 servers simultaneously in parallel (using threads). That is, run this query:

  select * from table where parent_name = 'name' order by name limit 10

on 100 machines (which will return 1000 rows), then re-sort them, discard the other 990 rows, and return the first 10. There is no need for synchronized writing or anything like that, so this is almost like a database client that does its own re-sorting.

How much effort would it take to create such a system starting from the current SQLite code base? How much would it cost me if I were to hire someone to do this? I don't know why such a system doesn't exist already; I have been searching the net, but couldn't find any resources on one. I am not a database expert, so I am not sure whether what I want can be accomplished trivially with some other straightforward SQL method.

This constitutes a very high-end distributed database system. You basically only need to do the search across multiple databases. Once you have a row, modifications can be made on that specific machine quite easily by the higher-level code, but the multi-database search has to be handled at the lower level in C.

Thanks in advance.

-- :: Ligesh :: http://ligesh.com
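The scatter/gather scheme described above can be sketched without touching the SQLite internals at all. A minimal illustration, assuming each "server" is an independent SQLite database (simulated here with in-memory databases; the table and column names are taken from the query in the message, everything else is hypothetical): each worker runs the same ORDER BY ... LIMIT query, and the coordinator merges the already-sorted partial results and keeps only the global top 10.

```python
# Sketch of a scatter/gather SELECT over many databases.  The "servers"
# are simulated with in-memory SQLite databases; in a real deployment
# each worker would talk to a remote machine instead.
import sqlite3
import heapq
from concurrent.futures import ThreadPoolExecutor

LIMIT = 10

def make_server(rows):
    """Create one simulated server holding (parent_name, name) rows."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE t (parent_name TEXT, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn

def query_one(conn, parent):
    # Each server returns only its own top-LIMIT rows, already sorted,
    # so at most 100 * LIMIT rows ever reach the coordinator.
    cur = conn.execute(
        "SELECT name FROM t WHERE parent_name = ? ORDER BY name LIMIT ?",
        (parent, LIMIT))
    return [r[0] for r in cur]

def scatter_gather(servers, parent):
    # Query all servers in parallel, merge the sorted partial results,
    # and keep only the first LIMIT rows overall.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(lambda c: query_one(c, parent), servers))
    return list(heapq.merge(*partials))[:LIMIT]

servers = [make_server([("p", f"name{i:03d}-{s}") for i in range(20)])
           for s in range(3)]
print(scatter_gather(servers, "p"))
```

Because every server pre-sorts and pre-limits its own result, the coordinator's merge step is cheap regardless of how many rows match on each machine.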
[sqlite] Re: sqlite 2.8 performance snag when filesize grows more than 70MB on a 256MB ram
On Mon, Aug 22, 2005 at 08:44:07AM +0800, Damian Slee wrote:
> >> In order to implement the ORDER BY clause, SQLite reads the entire
> >> result set into memory and sorts it there. When your result set gets
> >> very large (13000 rows) and each row uses in excess of 1KB of memory,
> >> this is apparently causing your machine to thrash.
>
> If an index exists on the same column that ORDER BY is used on, does this
> still result in reading the entire result set?

Yes. It seems you have to trust the database and avoid the ORDER BY when you want the result sorted according to the primary key. Actually I think this is a small bottleneck which should be solved, and solving it would really make SQLite the best database solution. I have switched to MySQL (and I am checking PostgreSQL too), and it seems to be working fine.

The problem seems to be that the whole row is loaded, which is not acceptable, since there are even scenarios where you store entire images in the database. The row size should never be an issue when dealing with the table.

Thanks.
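For what it's worth, the behavior complained about above is specific to old SQLite 2.x: in current SQLite 3.x the query planner can walk an index on the ORDER BY column and skip the in-memory sort entirely. A small demonstration, assuming a modern SQLite (the table and index names are made up for the example; the exact EXPLAIN QUERY PLAN wording varies by version):

```python
# Show that an index on the ORDER BY column lets SQLite 3 avoid the
# separate sort pass that old versions always performed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(f"n{i}", "x" * 100) for i in range(1000)])

def plan(sql):
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index, SQLite must materialize and sort the result set
# (the plan mentions a temporary B-tree for ORDER BY).
before = plan("SELECT * FROM t ORDER BY name LIMIT 10")

conn.execute("CREATE INDEX idx_name ON t(name)")

# With the index, rows are delivered in index order and no sort is needed.
after = plan("SELECT * FROM t ORDER BY name LIMIT 10")

print("before:", before)
print("after: ", after)
```

So on SQLite 3 the "trust the database" workaround is no longer necessary when a suitable index exists.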
[sqlite] Re: sqlite 2.8 performance snag when filesize grows more than 70MB on a 256MB ram
On Sat, Aug 20, 2005 at 06:36:19PM -0400, D. Richard Hipp wrote:
> On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:
> > I am running a very simple sql query with only one instance. The
> > query is something like:
> >
> >   select * from table where parent_name = 'parent' order by name limit 10
> >
> > The query takes around .3 seconds when the total number of matches
> > is around 7,000, but jumps to 11 seconds at 13,000 matches, and
> > there is some heavy swap activity too at that point; the whole
> > machine's responsiveness goes down once this query is run.
> >
> > The table is indexed on parent_name, so I think it shouldn't even
> > have to load the whole table into memory. The row size isn't huge
> > either, at only around 1KB per row.
>
> In order to implement the ORDER BY clause, SQLite reads the
> entire result set into memory and sorts it there. When your
> result set gets very large (13000 rows) and each row uses in
> excess of 1KB of memory, this is apparently causing your
> machine to thrash.

As an aside, I had thought that the row size would not matter much to database speed, especially the size of those columns that are used neither in the WHERE clause nor in sorting. Is this true for a standard SQL database? Or should I start moving the serialized columns to another table?

The logic would be to store the serialized values in a separate table. When the main rows are loaded, the serialized values would be filled in transparently at the database-driver level itself. So it could be done without touching the top-level logic, just by adding more intelligence to my database layer.

Thanks.

-- :: Ligesh :: http://ligesh.com
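The side-table idea proposed above can be sketched concretely. A minimal illustration, assuming hypothetical `item`/`item_blob` tables (the names and schema are invented for the example): the hot ORDER BY ... LIMIT query touches only the narrow table, and the wide serialized payload is fetched lazily, one row at a time, for the few rows actually displayed.

```python
# Split wide serialized columns into a side table so the sorted query
# never has to load them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, parent_name TEXT, name TEXT);
CREATE TABLE item_blob (item_id INTEGER PRIMARY KEY REFERENCES item(id),
                        serialized BLOB);
CREATE INDEX idx_item ON item(parent_name, name);
""")
for i in range(100):
    cur = conn.execute("INSERT INTO item (parent_name, name) VALUES (?, ?)",
                       ("p", f"n{i:03d}"))
    conn.execute("INSERT INTO item_blob VALUES (?, ?)",
                 (cur.lastrowid, b"big serialized payload " * 50))

# The hot query reads only the narrow table ...
rows = conn.execute(
    "SELECT id, name FROM item WHERE parent_name = ? ORDER BY name LIMIT 10",
    ("p",)).fetchall()

# ... and the wide payload is loaded lazily, only for displayed rows.
def load_payload(item_id):
    return conn.execute("SELECT serialized FROM item_blob WHERE item_id = ?",
                        (item_id,)).fetchone()[0]

print([name for _, name in rows])
```

This matches the "fill in at the driver level" plan: the top-level code never notices whether the payload came from the main row or the side table.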
[sqlite] Re: sqlite 2.8 performance snag when filesize grows more than 70MB on a 256MB ram
On Sat, Aug 20, 2005 at 06:36:19PM -0400, D. Richard Hipp wrote:
> On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:
> > I am running a very simple sql query with only one instance. The
> > query is something like:
> >
>
> In order to implement the ORDER BY clause, SQLite reads the
> entire result set into memory and sorts it there. When your
> result set gets very large (13000 rows) and each row uses in
> excess of 1KB of memory, this is apparently causing your
> machine to thrash.

Thanks. So I would like to know whether all SQL databases do this. I am not a database expert per se, and that is one of the reasons why the row size is a bit large. Two of the columns are kept serialized (they are unserialized only at the exact point when they are needed, so as such they don't introduce any overhead other than increasing the row size).

Does SQLite 3 do it the proper way? That is, load just the sort column plus a pointer to the row into RAM, sort there, and return the result? Actually the database structure is a full hierarchical tree, and this is the ONLY query I execute: just get the children of a particular parent.

Thanks a lot.

- Ligesh
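The "sort column plus pointer-to-the-row" idea asked about above can also be done by hand with two queries, on any SQLite version: first sort only (rowid, sort key) pairs, a few bytes per row, then fetch the ten full rows by rowid. A minimal sketch, with an invented table holding a 1KB blob per row to mimic the situation in the thread:

```python
# Two-pass query: sort only small (rowid, name) pairs, then fetch the
# few full rows that survive the LIMIT.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (parent_name TEXT, name TEXT, serialized BLOB)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("p", f"n{i:04d}", b"x" * 1024) for i in range(5000)])

# Pass 1: only rowid and the sort key enter the sorter, so sort memory
# stays small even though each stored row carries a 1KB blob.
top = conn.execute(
    "SELECT rowid FROM t WHERE parent_name = ? ORDER BY name LIMIT 10",
    ("p",)).fetchall()

# Pass 2: fetch the ten full rows by rowid.
ids = [r[0] for r in top]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT rowid, name FROM t WHERE rowid IN ({placeholders}) ORDER BY name",
    ids).fetchall()

print([name for _, name in rows])
```

This trades one extra round trip for a sort whose memory footprint is independent of the width of the rows.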
[sqlite] Re: sqlite 2.8 performance snag when filesize grows more than 70MB on a 256MB ram
On Sun, Aug 21, 2005 at 12:30:25AM +0200, Ulrik Petersen wrote:
> Ligesh,
>
> Ligesh wrote:
> > I am running a very simple sql query with only one instance. The query is
> > something like:
> >
> > select * from table where parent_name = 'parent' order by name limit 10
>
> My guess (and it is only a guess) is that, because of the ORDER BY
> clause, SQLite is loading all rows into memory, then sorting them
> in-memory.

Yup, your guess is right, as Richard has already explained. I am running on Linux, and I will need this to scale to at least 200,000 rows (RAM will be higher naturally, but it should scale logarithmically). As usual, I expect the database to do the dirty work and return to me only the 10-20 rows that will be shown to the user.

My processing is very inefficient, since I am abstracting out the database almost entirely. The database rows are automatically converted to classes which have the same name as the table, and some of the columns are kept serialized. These are automatically unserialized when that particular variable is requested (I am doing this via __get overloading... all in all, a very complex and inefficient setup at my end). But the overhead on my end is not supposed to be an issue, since I will only be working on at most 20 objects at any one time, so I went for transparent handling of the database rather than efficiency.

I guess I will have to go for another database. Since the database code is only around 30 lines out of 80,000 lines of total code (though the application is a 100% database-based one), I think that is the easiest route here.

Thanks.
[sqlite] Re: sqlite 2.8 performance snag when filesize grows more than 70MB on a 256MB ram
I am running a very simple sql query with only one instance. The query is something like:

  select * from table where parent_name = 'parent' order by name limit 10

The query takes around .3 seconds when the total number of matches is around 7,000, but jumps to 11 seconds at 13,000 matches, and there is some heavy swap activity too at that point; the whole machine's responsiveness goes down once this query is run.

The table is indexed on parent_name, so I think it shouldn't even have to load the whole table into memory. The row size isn't huge either, at only around 1KB per row.

Is this a known problem? Can anyone shed some light on this?

Thanks in advance.

- Ligesh
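As a side note for readers of the archive: on modern SQLite 3 this exact query shape is served efficiently by one composite index covering both the WHERE column and the ORDER BY column, so only the ten matching rows are ever touched. A hedged sketch (table and index names invented; exact plan wording varies by SQLite version):

```python
# A composite index (parent_name, name) serves both the equality filter
# and the sort order for the query from the thread.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (parent_name TEXT, name TEXT, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("parent", f"n{i}", "x" * 100) for i in range(2000)])

# Equality column first, then the ORDER BY column.
conn.execute("CREATE INDEX idx_parent_name ON t(parent_name, name)")

sql = "SELECT * FROM t WHERE parent_name = ? ORDER BY name LIMIT 10"
plan = " ".join(row[3] for row in
                conn.execute("EXPLAIN QUERY PLAN " + sql, ("parent",)))
rows = conn.execute(sql, ("parent",)).fetchall()
print(plan)
```

With this index the plan is a SEARCH using `idx_parent_name` with no temporary B-tree for the ORDER BY, so the runtime cliff at 13,000 matches does not occur.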
[sqlite] The total when using LIMIT clause
Hi,

How do we know the total number of results when we are using LIMIT? Is there any parameter that returns the real number of matches?

Thanks in advance.

-- :: Ligesh
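To my knowledge SQLite has no built-in side channel that reports the pre-LIMIT match count; the usual approach is a second COUNT(*) query with the same WHERE clause. A minimal sketch (table and column names follow the earlier messages in the thread; the data is invented):

```python
# Fetch one page of results plus the total number of matches.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (parent_name TEXT, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("p", f"n{i:03d}") for i in range(57)])

# The page shown to the user ...
page = conn.execute(
    "SELECT name FROM t WHERE parent_name = ? ORDER BY name LIMIT 10",
    ("p",)).fetchall()

# ... and the total match count from a second query with the same WHERE.
total = conn.execute(
    "SELECT COUNT(*) FROM t WHERE parent_name = ?", ("p",)).fetchone()[0]

print(len(page), total)
```

The COUNT(*) query can often satisfy itself from an index on the WHERE column, so the extra round trip is usually cheap.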