RE: [sqlite] temp_store assumptions
> > You are right - once you have the keywords it is a simple AND query. > The tricky part is once you've selected "vacation", what are the > keywords (the subset) that will be displayed to the user? That query > is: > > - select all photos that have the keyword 'vacation' > - return all (distinct) keywords of all those photos > > This query is performed often - it is what I'm trying to optimise. It > gets slightly more complex as you drill down. For example, say > "vacation" and "sports" are selected. The query is then: > > - select all photos that have the keywords 'vacation' and 'sports' > - return all (distinct) keywords of all those photos > ok, if I understand you correctly, I think I have a nice solution, which should be very fast, and neat. I am assuming the following schema: create table photo( id integer primary key, data varchar(20) ); create table keyword( id integer primary key, data varchar(20) ); create table pk( idPhoto integer, idKeyword integer); create index pk_1 on pk (idPhoto); (these indexes are crucial to fast operation) create index pk_2 on pk (idkeyword); >> >> I am assuming the following GUI: say we have a window that is split into two panes. The left hand pane contains a tree structure with keywords. The right hand pane is a grid that shows the selected photos. The semantics of the treeview as as follows: The first level of the tree contains a distinct list of all available keywords. each node is initially collapsed. when a node is selected, the right-hand pane fills with the photos that have this keyword (we are at the top level, still). The tricky part, which I did not understand from your initial description, is "what are the children of a keyword node". I am assuming that what you want is this: the children of the keyword are those keywords that exist in all photos that have the active keyword. in other words, if I select "vacation", and this gives me 10 photos, and these photos in turn have each 5 more keywords, I expect to see a list of max 50 keywords (minus any non-distinct). If this is what you are doing, it is very clever, because it assures that your user queries will always produce (some) results. If my assumptions are correct, then you have two different issues to handle : a. given a list of keywords, which photos contain all of them ? this is needed in order to populate the right hand pane, and I think that it is NOT a simple AND thing b. given a list of keywords (which is produced by traversing the tree from the root up to the selected node), which keywords should appear as children of this node? it turns out that there are very simple sql constructs that can give these results with a few statements, instead of a programmatic solution which would indeed fire a large number of small subqueries. The idea is to utilize the sql SET (asw in mathematical sets) operations, which is why SQL was created in the first place. Let us take issue (a) first: say that the user has selected 2 keywords, whose IDS your tree structure knows (it stores them as data behind each node). further, the keyword path (in ID form is :) 100/745 (100 is vacation 745 is sports etc etc) The following query populates your right hand side: select idPhoto,data from pk join photo on id=idPhoto where idKeyword=100 intersect select idPhoto,data from pk join photo on id=idPhoto where idKeyword=745 the trick is the intersect operator, which gives you the desired AND effect. The joins are done in order to avoid re-querying for the actual photo data. in a simpler,essential form, this would be : select idPhoto from pk where idKeyword=100 intersect select idPhoto from pk where idKeyword=745 you can see how this can be expanded for any number of keywords, and how to build this statement dynamically, based on the current active tree node. Having solved (a), the solution to (b) is similar, and in fact uses the same query, only as a sub-query: what you need here is to find the proper children keywords for keyword 745. so, you do this: select distinct idKeyword from pk where idPhoto in ( select idPhoto from pk where idKeyword=100 intersect select idPhoto from pk where idKeyword=745 ) notice that the subquery in this case is the same as the one that would have run for node path 100/745. This suggests that clever caching can be applied here. for example, you could have saved the query that you run for the right hand side (the results pane), and re-use it: create temp table x as select idPhoto,data from pk join photo on id=idPhoto where idKeyword=100 intersect select idPhoto,data from pk join photo on id=idPhoto where idKeyword=745 then, select * from x populates your RHS, and select distinct idKeyword from pk where idPhoto in (Select idPhoto from x) populates your tree sub-nodes and, in fact you only need to run the (b) query the first time a node is expanded, not all the time, assuming static data. so, to finalize, in order to service a GUI such as the one I
Re: [sqlite] temp_store assumptions
Mike, Thank you very much for your useful reply. the way to do this is to use a second db instance, open it to :MEMORY:, and copy the data (once) to this new db. then, you will have a RAM-based db, which should be fast. Yes, this is what I was hoping to do with the first method. Thanks - I'll give this a try. I am not sure I understand your GUI topology, though. dirlling down is a process of elimination, as I understand it, which means executing queries against progressively smaller result sets (temporary tables would do fine here), and should be very fast with proper indexes. once I select "vacation", I have a list of vacation photos. What is the user's next action ? do they select "sports" ? If so, is this meant to be "select photos with vacation AND sports keywords" ? if so, this is trivial and very fast. just do a join with the previous result set (saved in a temporary table). You are right - once you have the keywords it is a simple AND query. The tricky part is once you've selected "vacation", what are the keywords (the subset) that will be displayed to the user? That query is: - select all photos that have the keyword 'vacation' - return all (distinct) keywords of all those photos This query is performed often - it is what I'm trying to optimise. It gets slightly more complex as you drill down. For example, say "vacation" and "sports" are selected. The query is then: - select all photos that have the keywords 'vacation' and 'sports' - return all (distinct) keywords of all those photos and, are you sure this is a hierarchical data base ? it sounds like the typical many-to-many with split relation, which cannot be represented hirerarchically, since the data is *not* hierarchical in nature. No, the data is not inherently hierarchical at all. I'm kind of faking it by massaging it into something that looks hierarchical. I've had a very hard time communicating what I'm trying to accomplish, but I do have working code that I'm happy with. I just need to speed up my searches by an order of magnitude or two. I'm trying to do it in memory since I have not yet come up with a clever way to simplify the actual queries. Thanks again for your help! Demitri
Re: [sqlite] temp_store assumptions
> > - set the pragma "temp_store" to MEMORY > - CREATE TEMPORARY VIEW temp_table AS () > > I am assuming that temp_table is completely in memory, and any queries > against it will not go back to the disk. Since the table itself is > small, I am hoping that the overhead of reading the entire table into > RAM will be shadowed by the speed improvement of the largish number of > queries that don't have to go to the disk. Does this sound reasonable? > > -- I suspect not. creating a view does not actually generate any data it is only when you SELECT from that view that any data access happens. so your scheme is not caching into RAM at all. the way to do this is to use a second db instance, open it to :MEMORY:, and copy the data (once) to this new db. then, you will have a RAM-based db, which should be fast. I am not sure I understand your GUI topology, though. dirlling down is a process of elimination, as I understand it, which means executing queries against progressively smaller result sets (temporary tables would do fine here), and should be very fast with proper indexes. once I select "vacation", I have a list of vacation photos. What is the user's next action ? do they select "sports" ? If so, is this meant to be "select photos with vacation AND sports keywords" ? if so, this is trivial and very fast. just do a join with the previous result set (saved in a temporary table). and, are you sure this is a hierarchical data base ? it sounds like the typical many-to-many with split relation, which cannot be represented hirerarchically, since the data is *not* hierarchical in nature. it may well be that you have a pseudo-problem, created by a GUI that does not match the nature of the data.
[sqlite] temp_store assumptions
Hello, I have a large number of queries (maybe up to a few hundred at a time) that I am making against a single table. When performing these queries, the table that I query does not change. The table itself is small (one of the rows is a foreign key). I am trying to get better performance out of this process; I had an idea and want to see if all of my assumptions are correct before I code it (rather than come up with another method). - set the pragma "temp_store" to MEMORY - CREATE TEMPORARY VIEW temp_table AS () I am assuming that temp_table is completely in memory, and any queries against it will not go back to the disk. Since the table itself is small, I am hoping that the overhead of reading the entire table into RAM will be shadowed by the speed improvement of the largish number of queries that don't have to go to the disk. Does this sound reasonable? -- In anticipation of someone asking what I am actually doing to see if there is a better way to approach my problem, I'll elaborate here, but this is extra credit. :) Imagine a program that catalogs photographs and assigns keywords to them (think iPhoto). A simple way to describe this is to have three tables: photo, photo_to_keyword, and keyword. Simple. Now, if you think about it this information can be displayed in a hierarchical manner. Imagine the list of all distinct keywords. Select "Vacation" - this will filter the full list of photos to ones that have that keyword. Of all the photos that have the keyword "Vacation", some will be "Bermuda", some will be "Podunk, Nebraska", etc. The UI will display that subset, i.e. the list of all keywords of all photos with "Vacation" as a keyword. Selecting the next will drill down further. The problem that I am having is "converting" the three relational tables into a hierarchical data structure. I can do it, but the it requires some fairly complex queries, and a large number of them. The result works, but it's horribly slow. Since searches are being done against "keyword"s (strings), the query essentially has to convert the keyword string into a row id, go to the join table, search for all photos with that keyword, get the list of all keyword row ids of those photos, and convert that list to strings for display. This search has to be done not only once for each keyword, but again for the second level since I will need to know if any of them "branches" to a second level. Hence the incredibly slow results. So, in the case above the temporary table I was hoping to make would be: CREATE TEMPORARY VIEW photo_to_keyword_temp AS SELECT ptk.photo_id, k.label FROM photo_to_keyword ptk, keyword k WHERE ptk.keyword_id = k.id This would at least save the trip from and back to the keyword table. I would appreciate any suggestions or tips!! Cheers, Demitri Muna