[sqlite] Running a select on multiple servers.

2005-08-22 Thread Ligesh
 Hi,

  Currently the sqlite code reads the data from the disk, sorts it, and returns 
it. I need to do the following, and would like to know how much work it would 
involve.

  Instead of reading from disk, I need to run a single select against, say, 100 
servers simultaneously, in parallel (using threads). That is, run this query: 
'select * from table where parent_name = 'name' order by name limit 10' on 100 
machines (which will return 1,000 rows in total), then re-sort them, drop the 
990 extra rows, and return the first 10. There is no need for synchronized 
writing or anything like that, so this is almost like a database client that 
does its own re-sorting.
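
  Roughly, the merge step I have in mind looks like this. (A sketch in Python 
rather than C, just to show the shape; local sqlite3 files stand in for the 
100 remote servers, and the table and column names are made up.)

    import sqlite3
    import threading

    DB_FILES = ["node%02d.db" % i for i in range(100)]  # stand-ins for the 100 servers

    def query_node(path, parent, out):
        # every thread runs the same limited select against one node
        con = sqlite3.connect(path)
        try:
            out.extend(con.execute(
                "SELECT * FROM tbl WHERE parent_name = ? "
                "ORDER BY name LIMIT 10", (parent,)).fetchall())
        finally:
            con.close()

    def global_top_10(parent):
        rows = []
        threads = [threading.Thread(target=query_node, args=(p, parent, rows))
                   for p in DB_FILES]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # at most 1,000 rows come back; re-sort and keep the global first 10
        rows.sort(key=lambda r: r[1])  # column 1 is assumed to be 'name'
        return rows[:10]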

   How much effort would it take to create such a system starting from the 
current sqlite code base? How much would it cost me if I were to hire someone 
to do this? I don't know why such a system doesn't exist already; I have been 
searching the net but couldn't find any resources on one. I am not a database 
expert, so I am not sure whether what I want can be accomplished trivially 
using some other straightforward SQL method.

   This constitutes a very high-end distributed database system, but only the 
search needs to span multiple databases. Once you get the row, the modification 
can be applied to that specific machine very easily by the higher-level code; 
it is only the multi-database search that has to be handled at the lower level, 
in C.

   Thanks in advance.


--
:: Ligesh :: http://ligesh.com 



[sqlite] Re: sqlite 2.8 performance snag when file size grows more than 70MB on a 256MB RAM machine

2005-08-21 Thread Ligesh
On Mon, Aug 22, 2005 at 08:44:07AM +0800, Damian Slee wrote:
> >>In order to implement the ORDER BY clause, SQLite reads the entire result 
> >>set into memory and sorts it there.  When your result set gets very large 
> >>(13000 rows) and each row uses in excess of 1KB of memory, this is 
> >>apparently causing your machine to thrash.
> 
> 
> If an index exists on the same column that ORDER BY uses, does this still 
> result in reading the entire result set?

 Yes. It seems you have to trust the database and avoid the 'order by' when 
you want the result sorted according to the primary key.
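
 (That said, my understanding is that a composite index covering both the 
'where' column and the 'order by' column can let the engine walk the index in 
sorted order and stop at the LIMIT, at least in newer versions; I have not 
verified this on 2.8. A quick way to test it, sketched with Python's sqlite3 
module and made-up names:)

    import sqlite3

    con = sqlite3.connect("test.db")
    # equality column first, then the ordering column: with this index the
    # engine can scan in index order and stop after LIMIT rows, no big sort
    con.execute("CREATE INDEX IF NOT EXISTS idx_parent_name "
                "ON tbl(parent_name, name)")
    for row in con.execute("SELECT * FROM tbl WHERE parent_name = ? "
                           "ORDER BY name LIMIT 10", ("parent",)):
        print(row)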

 Actually, I think this is a small bottleneck that should be solved; fixing it 
would really make sqlite the best database solution. I have switched to mysql 
(I am evaluating pgsql too), and it seems to be working fine. The problem seems 
to be that the whole row is loaded, which is not acceptable, since there are 
even scenarios where you store entire images in the database. The row size 
should never be an issue when querying the table.

 Thanks.



[sqlite] Re: sqlite 2.8 performance snag when file size grows more than 70MB on a 256MB RAM machine

2005-08-20 Thread Ligesh
On Sat, Aug 20, 2005 at 06:36:19PM -0400, D. Richard Hipp wrote:
> On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:
> >  I am running a very simple sql query with only one instance. The
> > query is something like:
> > 
> >  select * from table where parent_name = 'parent' order by name limit
> > 10
> > 
> >  The query takes around .3 seconds when the total number of matches
> > are around 7,000, but just jumps to 11 seconds, at 13,000 matches, and
> > there is some heavy swap activity too at this point, and the whole
> > machine's responsiveness goes down once this query is run.
> > 
> >  The table is indexed with parent_name, so I think it shouldn't even
> > have to load the whole table into the memory. The row size isn't huge
> > too with only around 1KB per row.
> > 
> 
> In order to implement the ORDER BY clause, SQLite reads the
> entire result set into memory and sorts it there.  When your
> result set gets very large (13000 rows) and each row uses in
> excess of 1KB of memory, this is apparently causing your
> machine to thrash.
> -- 

 As an aside, I had thought that row size would not matter much to database 
speed, especially the size of those columns that are used neither in the 
'where' clause nor in sorting. Is this true for a standard sql database? Or 
should I start moving the serialized columns to another table?

 
 The logic would be to store the serialized values in a separate table. When 
the main rows are loaded, transparently fill in the serialized values at the 
database-driver level itself. That way it can be done without touching the 
top-level logic, just by adding more intelligence to my database layer.
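
 Something like this minimal sketch, assuming a side table tbl_blob(id, 
payload) keyed by the main table's rowid (Python used for concreteness; all 
the names are made up):

    import pickle
    import sqlite3

    class Record:
        """Wraps a narrow main-table row; the serialized column lives in a
        side table and is fetched and decoded only on first access."""
        def __init__(self, con, row):
            self.con = con
            self.id, self.parent_name, self.name = row
            self._payload = None

        @property
        def payload(self):
            if self._payload is None:
                # lazy fetch: the wide serialized value never touches the
                # sorted/limited query path below
                blob = self.con.execute(
                    "SELECT payload FROM tbl_blob WHERE id = ?",
                    (self.id,)).fetchone()[0]
                self._payload = pickle.loads(blob)
            return self._payload

    def children(con, parent):
        # the sorted, limited query now reads only narrow rows
        rows = con.execute(
            "SELECT id, parent_name, name FROM tbl "
            "WHERE parent_name = ? ORDER BY name LIMIT 10", (parent,))
        return [Record(con, r) for r in rows]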

 Thanks.

--
:: Ligesh :: http://ligesh.com 




[sqlite] Re: sqlite 2.8 performance snag when file size grows more than 70MB on a 256MB RAM machine

2005-08-20 Thread Ligesh
On Sat, Aug 20, 2005 at 06:36:19PM -0400, D. Richard Hipp wrote:
> On Sun, 2005-08-21 at 03:21 +0530, Ligesh wrote:
> >  I am running a very simple sql query with only one instance. The
> > query is something like:
> > 
> >  select * from table where parent_name = 'parent' order by name limit 10
> > 
> 
> In order to implement the ORDER BY clause, SQLite reads the
> entire result set into memory and sorts it there.  When your
> result set gets very large (13000 rows) and each row uses in
> excess of 1KB of memory, this is apparently causing your
> machine to thrash.
> -- 

 Thanks. So I would like to know whether all sql databases do this. I am not a 
database expert per se, and that is one of the reasons the row size is a bit 
large: two of the columns are kept serialized (they are unserialized only at 
the exact point they are needed, so they don't introduce any overhead other 
than increasing the row size).

 Does sqlite 3 do it the proper way? That is, load just the sort column plus a 
pointer to the row into RAM, sort it there, and then return the result?
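
 (Even if the engine does not do this internally, it can be approximated by 
hand with two queries: first pull only the rowids in sorted order, then fetch 
the ten full rows. A sketch in Python with made-up names:)

    import sqlite3

    def children_two_pass(con, parent):
        # pass 1: sort and limit over just the key column and rowid, so the
        # sorter never sees the wide serialized columns
        ids = [r[0] for r in con.execute(
            "SELECT rowid FROM tbl WHERE parent_name = ? "
            "ORDER BY name LIMIT 10", (parent,))]
        if not ids:
            return []
        # pass 2: fetch the full rows for only those ten rowids
        marks = ",".join("?" * len(ids))
        rows = con.execute(
            "SELECT rowid, * FROM tbl WHERE rowid IN (%s)" % marks,
            ids).fetchall()
        # IN () does not preserve order, so restore the pass-1 order
        by_id = {r[0]: r[1:] for r in rows}
        return [by_id[i] for i in ids]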

  Actually, the database structure is a full hierarchical tree, and this is the 
ONLY query I execute: just get the children of a particular parent.

 Thanks a lot.

 - Ligesh





[sqlite] Re: sqlite 2.8 performance snag when file size grows more than 70MB on a 256MB RAM machine

2005-08-20 Thread Ligesh
On Sun, Aug 21, 2005 at 12:30:25AM +0200, Ulrik Petersen wrote:
> Ligesh,
> 
> Ligesh wrote:
> 
> >I am running a very simple sql query with only one instance. The query is 
> >something like:
> >
> >select * from table where parent_name = 'parent' order by name limit 10
> > 
> >
> 
> My guess (and it is only a guess) is that, because of the ORDER BY 
> clause, SQLite is loading all rows into memory, then sorting them 
> in-memory.
> 

 Yup, your guess is right, as Richard has already explained.

 I am running on linux, and I will need this to scale to at least 200,000 rows 
(the RAM will be higher naturally, but the cost should scale logarithmically). 
As usual, I expect the database to do the dirty work and return to me only the 
10-20 rows that will be shown to the user. My own processing is quite 
inefficient, since I am abstracting out the database almost entirely: the 
database rows are automatically converted to classes that have the same name 
as the table, and some of the columns are kept serialized. These are 
automatically unserialized when the particular variable is requested (I am 
doing this via __get overloading... all in all, a very complex and inefficient 
setup at my end).
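
 (In Python terms the lazy unserialization looks roughly like the sketch 
below; my real code does the same thing with PHP's __get, and all the names 
here are invented.)

    import pickle

    class Row:
        def __init__(self, plain, serialized):
            # plain columns land in the instance dict right away;
            # serialized ones wait as raw bytes until first use
            self.__dict__.update(plain)
            self._raw = serialized  # column name -> pickled bytes

        def __getattr__(self, name):
            # called only when normal lookup fails, i.e. on first access
            # to a still-serialized column
            try:
                value = pickle.loads(self._raw.pop(name))
            except KeyError:
                raise AttributeError(name)
            setattr(self, name, value)  # cache the decoded value
            return value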

 But the overhead on my end is not supposed to be an issue, since I will only 
be working on at most 20 objects at any one time, so I went for transparent 
handling of the database rather than for efficiency.

 I guess I will have to switch to another database. Since the database code is 
only around 30 lines out of 80,000 lines of total code (though the application 
is a 100% database-based one), I think that is the easiest route here.

 Thanks.




[sqlite] Re: sqlite 2.8 performance snag when file size grows more than 70MB on a 256MB RAM machine

2005-08-20 Thread Ligesh

 I am running a very simple sql query, with only one instance running. The 
query is something like:

 select * from table where parent_name = 'parent' order by name limit 10

 The query takes around 0.3 seconds when the total number of matches is around 
7,000, but jumps to 11 seconds at 13,000 matches. There is some heavy swap 
activity at that point too, and the whole machine's responsiveness goes down 
once this query is run.

 The table is indexed on parent_name, so I think it shouldn't even have to 
load the whole table into memory. The row size isn't huge either, at only 
around 1KB per row.

 Is this some known problem? Can anyone shed some light on this?

 Thanks in advance.


- Ligesh


[sqlite] The total when using LIMIT clause

2005-08-16 Thread Ligesh

 Hi,

  How do we find out the total number of results when we are using LIMIT? Is 
there any parameter that returns the real number of matches?
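
  (My current workaround is a second COUNT(*) query with the same WHERE 
clause; as far as I know there is no parameter for it. A sketch in Python, 
with made-up names:)

    import sqlite3

    def page(con, parent, offset, per_page=10):
        # query 1: the real number of matches, ignoring LIMIT
        total = con.execute(
            "SELECT COUNT(*) FROM tbl WHERE parent_name = ?",
            (parent,)).fetchone()[0]
        # query 2: just the page being shown
        rows = con.execute(
            "SELECT * FROM tbl WHERE parent_name = ? "
            "ORDER BY name LIMIT ? OFFSET ?",
            (parent, per_page, offset)).fetchall()
        return total, rows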

  Thanks in advance.

--
:: Ligesh