Hi Karel,
Keep in mind that each time you modify a list and put it in memcache,
the whole list is serialized, which is why you find it expensive.
There is an efficient approach to merging queries that does not need
memcache, which Brett Slatkin called the "zig zag" method:
http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine
It does not require the results to be in memory either, so it will work
for large datasets. You just need to make sure all the queries are
sorted by the same property, e.g. __key__.
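For anyone who has not seen the talk, here is a rough in-memory sketch of
the zig-zag idea, not the datastore implementation itself. It assumes each
query result is available as a list of numeric keys sorted ascending (a
stand-in for queries ordered by __key__); the names ZigZagSketch, intersect
and seek are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ZigZagSketch {

    /** Returns the keys present in every list; each list must be sorted ascending. */
    static List<Long> intersect(List<List<Long>> sortedLists) {
        List<Long> result = new ArrayList<Long>();
        if (sortedLists.isEmpty()) {
            return result;
        }
        int[] pos = new int[sortedLists.size()]; // one cursor per sorted list
        long candidate = Long.MIN_VALUE;         // smallest key that could still match
        while (true) {
            boolean allMatch = true;
            for (int i = 0; i < pos.length; i++) {
                List<Long> list = sortedLists.get(i);
                // Jump this cursor forward to the first key >= candidate.
                pos[i] = seek(list, pos[i], candidate);
                if (pos[i] == list.size()) {
                    return result;               // one list is exhausted, so we are done
                }
                long key = list.get(pos[i]);
                if (key > candidate) {
                    candidate = key;             // "zig": a larger key becomes the new target
                    allMatch = false;
                }
            }
            if (allMatch) {
                result.add(candidate);           // every cursor sits on the same key
                if (candidate == Long.MAX_VALUE) {
                    return result;
                }
                candidate++;                     // look for the next larger key
            }
        }
    }

    /** Binary search for the first index >= from whose key is >= target. */
    static int seek(List<Long> list, int from, long target) {
        int lo = from;
        int hi = list.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list.get(mid) < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<List<Long>> lists = new ArrayList<List<Long>>();
        lists.add(Arrays.asList(1L, 3L, 5L, 7L, 9L));
        lists.add(Arrays.asList(3L, 4L, 5L, 9L, 12L));
        lists.add(Arrays.asList(0L, 3L, 9L, 10L));
        System.out.println(intersect(lists)); // prints [3, 9]
    }
}
```

The point is that each cursor only ever moves forward and skips straight to
the current candidate, so nothing has to be counted or held in a cache.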
JD
On 2 Mar 2010, at 23:17, Karel Alvarez wrote:
Hi
Some time ago I asked how to use multiple contains in a query, and
I got some responses. That was great, and I thank everybody for their
help.
I am posting my findings and progress in the hope that they might be
useful for somebody trying to do the same.
I am trying to build a database of real estate listings in my area
and build some searches on it. A search is likely to have many
fields, and for several of the fields the user can select multiple
values. I chose to handle the entity relationships myself instead of
using the fancy features of GAE. I had my reasons for it, and
frustration with GAE is part of it. Regardless of that, the approach
to searching might still apply if you choose to use relationships
from GAE.
I read somewhere in the docs that when you use contains in a query,
it internally executes an equality sub-query for each of the values
in the list (would somebody care to confirm that?), so if you have
several fields with contains you might bump into the 30 sub-query
constraint pretty fast.
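To put a number on that worry: as far as I understand the documentation,
when a query has several contains filters, each combination of values
becomes its own sub-query, so the counts multiply. Below is a small sketch
of that arithmetic only. The class and method names are made up for
illustration, and the limit of 30 is the figure mentioned above.

```java
class SubQueryCount {

    // Sub-query limit as quoted in this thread.
    static final int MAX_SUB_QUERIES = 30;

    /** Multiplies the number of selected values of each contains filter. */
    static long subQueries(int... valuesPerFilter) {
        long total = 1;
        for (int n : valuesPerFilter) {
            total *= n;
        }
        return total;
    }

    public static void main(String[] args) {
        // e.g. 3 statuses, 4 house types and 5 zip codes selected
        long total = subQueries(3, 4, 5);
        System.out.println(total + " sub-queries, over the limit: " + (total > MAX_SUB_QUERIES));
        // prints "60 sub-queries, over the limit: true"
    }
}
```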
So I chose to:
-execute the search for each of the fields and each of the selected
values sequentially, getting only the ids; each of these queries
should hit only one index and be fast.
-add the ids from each result to a memcache instance using increment,
and collect the ids in a list for later (I found no way to get all
the keys in the cache).
-collect the counts for each id in the list and check each count: if
the count is equal to the number of search conditions, the entity
matched every condition and is one that I want to return. I collect
all the ids that are good results and go to the datastore to fetch
the full entities to return.
This process is expensive, and I still have to try it out with a
big set, but it executes sufficiently fast for my test set. Of course,
I cache the result until the user changes the search criteria (or it
expires).
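The counting idea on its own can be sketched without memcache, using a
plain HashMap for the counters. This is only an illustration under my own
assumptions, not the method below: the names CountingIntersection and
matchAll are hypothetical, and the ids of each field are de-duplicated
first so that an id matching two values of the same field still counts
once for that field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CountingIntersection {

    /**
     * conditions holds one entry per search field; each entry holds the id
     * lists returned for that field's selected values (OR within a field,
     * AND across fields). Returns the ids matched by every field, sorted.
     */
    static List<String> matchAll(List<List<List<String>>> conditions) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (List<List<String>> resultsForField : conditions) {
            // Union of all ids matched by any value of this field.
            Set<String> matched = new HashSet<String>();
            for (List<String> ids : resultsForField) {
                matched.addAll(ids);
            }
            // One increment per field, like the increment call on the cache.
            for (String id : matched) {
                Integer old = counts.get(id);
                counts.put(id, old == null ? 1 : old + 1);
            }
        }
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue().intValue() == conditions.size()) {
                result.add(e.getKey()); // matched every field
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<List<List<String>>> conditions = new ArrayList<List<List<String>>>();
        // price range: a single query
        conditions.add(Arrays.asList(Arrays.asList("a", "b", "c")));
        // status: two selected values
        conditions.add(Arrays.asList(Arrays.asList("a"), Arrays.asList("b", "d")));
        // zip code: two selected values
        conditions.add(Arrays.asList(Arrays.asList("b"), Arrays.asList("c", "a")));
        System.out.println(matchAll(conditions)); // prints [a, b]
    }
}
```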
Here is the code for the search method:
private List<IndexEntry> buildResultsFor(SearchCriteria sc) {
    List<IndexEntry> result = new ArrayList<IndexEntry>();
    // Price parsing (a bound stays at -1 when its field is empty)
    float minPrice = -1;
    float maxPrice = -1;
    if (sc.getMinPrice().length() > 0) {
        minPrice = Float.parseFloat(sc.getMinPrice());
    }
    if (sc.getMaxPrice().length() > 0) {
        maxPrice = Float.parseFloat(sc.getMaxPrice());
    }
    // Listing Status parsing
    Long[] statusIds = null;
    String[] statusNames = sc.getStatus();
    if (statusNames != null && statusNames.length > 0) {
        statusIds = getListingStatusIdsByNames(statusNames);
    }
    // House Types parsing
    Long[] houseTypesIds = null;
    String[] houseTypeNames = sc.getHouseType();
    if (houseTypeNames != null && houseTypeNames.length > 0) {
        houseTypesIds = getHouseTypeIdsByNames(houseTypeNames);
    }
    // THE search
    MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
    Set<String> allIds = new HashSet<String>();
    int condCount = 0; // number of search conditions actually applied
    Map<Object, Long> lastResults = null;
    Long one = Long.valueOf(1);
    // by price
    if (minPrice > 0 || maxPrice > 0) {
        condCount++;
        List<String> ids = indexService.getByPriceRange(minPrice, maxPrice);
        allIds.addAll(ids);
        lastResults = addToChache(cache, one, ids);
    }
    // by status (one query per selected value)
    if (statusIds != null) {
        condCount++;
        for (int i = 0; i < statusIds.length; i++) {
            List<String> listingByStatus =
                    indexService.getByListingStatus(statusIds[i]);
            allIds.addAll(listingByStatus);
            lastResults = addToChache(cache, one, listingByStatus);
        }
    }
    // by house type (one query per selected value)
    if (houseTypesIds != null) {
        condCount++;
        for (int i = 0; i < houseTypesIds.length; i++) {
            List<String> listingByHT =
                    indexService.getByHouseType(houseTypesIds[i]);
            allIds.addAll(listingByHT);
            lastResults = addToChache(cache, one, listingByHT);
        }
    }
    // by Zip Code (one query per zip code)
    String[] zipCodes = parseZipCode(sc.getZipCode());
    if (zipCodes != null && zipCodes.length > 0) {
        condCount++;
        for (int i = 0; i < zipCodes.length; i++) {
            List<String> listingByZ =
                    indexService.getByZipCode(zipCodes[i]);
            allIds.addAll(listingByZ);
            lastResults = addToChache(cache, one, listingByZ);
        }
    }
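    if (lastResults != null) {
        // Read back the counter of every id seen by any query
        Map<Object, Object> counters =
                cache.getAll(Arrays.asList(allIds.toArray()));
        List<String> ids = new ArrayList<String>();
        for (Object listingNumber : counters.keySet()) {
            String sCount = (String) counters.get(listingNumber);
            long count = Long.parseLong(sCount);
            // Keep the id only if it matched every condition
            // (the original test "count > condCount" could never be true
            // when each condition increments an id at most once)
            if (count == condCount) {
                ids.add(listingNumber.toString());
                if (ids.size() > 500) {
                    break;
                }
            }
        }
        // Note: clearAll() empties the whole cache, not only these counters
        cache.clearAll();
        if (ids.size() > 0) {
            result = indexService.getEntriesOn(ids);
        }
    }
    return result;
}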
hope it helps somebody
thanks
Karel
--
You received this message because you are subscribed to the Google
Groups "Google App Engine for Java" group.
To post to this group, send email to google-appengine-java@googlegroups.com.
To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.