Why not? My two biggest projects have 180k and 90k friends.

 

From: google-appengine@googlegroups.com
[mailto:google-appengine@googlegroups.com] On Behalf Of Nick Johnson
(Google)
Sent: Wednesday, April 27, 2011 7:40 PM
To: google-appengine@googlegroups.com
Subject: Re: [google-appengine] Appropriate way to save hundreds of
thousands of ids per user

 

Hi David,

 

Can you elaborate on your exact use-case? You mentioned twitter friends, but
I'm fairly sure no users have 200,000 friends on Twitter.

 

-Nick Johnson

On Mon, Apr 25, 2011 at 2:54 PM, David Parks <davidpark...@yahoo.com> wrote:

I did indeed mean pulling back a result set of, say, 200,000 rows. If I'm
following the conversation correctly, then what you described was storing all
IDs in one field, querying that field, and deserializing all the IDs into an
array that you can then search for the IDs you need.

 

I like that idea. But I certainly can't tell you whether the overhead of
reading all the values and deserializing them will be better or worse than the
overhead of scrolling through a large result set and loading the database with
hundreds of millions of rows. Of all the databases you could be using, Google's
Bigtable is certainly well designed for large data sets.

 

It seems that your proposed method makes great sense when you need the
entire result set (or close to it) for one or more users. But when you only
need 100 results out of 150,000, the deserialization process is going to
constitute a measurable overhead. Also, I can't say for sure how the Google
datastore will perform when you commit hundreds of millions of rows to it.
Of course, if small queries like that are rare, then maybe it's not so
important to consider them.

 

Anyway, I guess you could write, in perhaps a day or less, a very simple
test case that populates the datastore for both scenarios and profiles them.

Doing the profiling work will probably give you some very useful insight
into, and experience of, how things will actually perform.

 

I'd also suggest that you encapsulate this functionality so that you can
easily replace one strategy with another without changing code unrelated to
the data store (e.g. design your code using proper data access objects to
keep this code separate from the rest of your code, and code to interfaces
up front). 
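
A minimal sketch of that encapsulation, in Python. Everything here is
hypothetical and only illustrates the shape: the names FriendIdStore,
SerializedBlobStore and RowPerIdStore are made up, and plain in-memory
containers stand in for the datastore so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Set, Tuple


class FriendIdStore(ABC):
    """Hypothetical interface: callers never see how the IDs are stored."""

    @abstractmethod
    def save_ids(self, user_id: str, ids: Iterable[int]) -> None:
        """Persist the given IDs for user_id."""

    @abstractmethod
    def contains(self, user_id: str, candidates: Iterable[int]) -> Set[int]:
        """Return the subset of candidates that is stored for user_id."""


class SerializedBlobStore(FriendIdStore):
    """Strategy A: one serialized field per user (a dict stands in for the datastore)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def save_ids(self, user_id: str, ids: Iterable[int]) -> None:
        self._blobs[user_id] = ",".join(str(i) for i in ids)

    def contains(self, user_id: str, candidates: Iterable[int]) -> Set[int]:
        blob = self._blobs.get(user_id, "")
        # The whole field is deserialized on every lookup.
        stored = {int(tok) for tok in blob.split(",") if tok}
        return stored & set(candidates)


class RowPerIdStore(FriendIdStore):
    """Strategy B: one record per (user, id) pair (a set stands in for a table)."""

    def __init__(self) -> None:
        self._rows: Set[Tuple[str, int]] = set()

    def save_ids(self, user_id: str, ids: Iterable[int]) -> None:
        self._rows.update((user_id, i) for i in ids)

    def contains(self, user_id: str, candidates: Iterable[int]) -> Set[int]:
        # Each candidate is looked up individually, as a keyed get would be.
        return {c for c in candidates if (user_id, c) in self._rows}
```

Code that only depends on FriendIdStore can switch from one strategy to the
other by changing the single line that constructs the store.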

 

 

 

From: google-appengine@googlegroups.com
[mailto:google-appengine@googlegroups.com] On Behalf Of Nischal Shetty
Sent: Monday, April 25, 2011 10:34 AM
To: google-appengine@googlegroups.com
Subject: Re: [google-appengine] Appropriate way to save hundreds of
thousands of ids per user

 

@David

 

Querying the whole group would mean pulling 200,000 results for a few of my
users. Pulling all of that and then searching it, wouldn't that be
inefficient? Or are you talking about a sharded ListProperty here?

 

 

 

On 25 April 2011 05:41, David Parks <davidpark...@yahoo.com> wrote:

That seems like a reasonable approach. But I think you should test both:
1) let Google do the work and store a lot of records, and 2) query the whole
group, parse it into an array, and search the array. It wouldn't be too
hard to create a simple test case that populates the data for whatever number
of users you need to plan for and profiles the lookup and storage speeds of
both.
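
As a rough illustration of what such a test case could look like, here is a
small timing harness in Python. It is only a sketch under loud assumptions:
the helper names are made up, and both approaches run against in-memory
stand-ins, so the numbers say nothing about the real datastore. The lookup
functions would need to be swapped for real datastore calls before the
timings mean anything.

```python
import time
from typing import Callable, List

Lookup = Callable[[List[int]], int]


def time_lookup(lookup: Lookup, wanted: List[int], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        lookup(wanted)
        best = min(best, time.perf_counter() - start)
    return best


def make_blob_lookup(ids: List[int]) -> Lookup:
    """Approach 2: all IDs in one serialized field, parsed on every lookup."""
    blob = ",".join(map(str, ids))

    def lookup(wanted: List[int]) -> int:
        stored = set(map(int, blob.split(",")))  # deserialize everything
        return sum(1 for w in wanted if w in stored)

    return lookup


def make_row_lookup(ids: List[int]) -> Lookup:
    """Approach 1: one record per ID (a dict stands in for keyed lookups)."""
    rows = {i: True for i in ids}

    def lookup(wanted: List[int]) -> int:
        return sum(1 for w in wanted if w in rows)

    return lookup


if __name__ == "__main__":
    ids = list(range(0, 400_000, 2))  # 200,000 stored IDs
    wanted = list(range(100))         # 100 IDs to check
    for name, factory in (("blob", make_blob_lookup), ("rows", make_row_lookup)):
        print(name, time_lookup(factory(ids), wanted))
```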

 

I'd love to know your results if you do test both approaches.

 

 

From: google-appengine@googlegroups.com
[mailto:google-appengine@googlegroups.com] On Behalf Of Nischal Shetty
Sent: Friday, April 22, 2011 3:10 PM
To: google-appengine@googlegroups.com
Subject: Re: [google-appengine] Appropriate way to save hundreds of
thousands of ids per user

 

@David

 

Thanks for the input. Every reply gives me some more insight into how I
can achieve this. My use case is as follows:

 

1. At times I would need all the IDs in memory at the same time.

2. Most of the time I would need to check whether a set of IDs input by the
user (say 100 IDs) is present in the datastore.

 

I've been thinking of doing the following:

1. Persisting all the IDs by putting them into an array (I will probably
have shards where each array holds 50k IDs).

2. Implementing a Bloom filter to check whether a set of IDs exists in
the datastore.
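
The two steps above can be sketched as follows. This is an illustrative
Python sketch, not the actual implementation: the class and function names
are made up, the 1% false-positive rate is an assumed default, and SHA-256
with double hashing is just one common way to derive the bit positions.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    """Set-membership filter: no false negatives, some false positives."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, int(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: int) -> List[int]:
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: int) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def shard(ids: List[int], shard_size: int = 50_000) -> List[List[int]]:
    """Split the full ID list into arrays of at most shard_size IDs."""
    return [ids[i:i + shard_size] for i in range(0, len(ids), shard_size)]
```

A negative answer from might_contain is definitive, so most of the 100 input
IDs that are absent can be rejected without loading any shard. A positive
answer can be a false positive and still has to be confirmed against the
stored shard.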

 

 

On 22 April 2011 09:34, David Parks <davidpark...@yahoo.com> wrote:

I don't know your intended use of these IDs, so my thoughts here are limited
to an assumed use. Feel free to ignore any that are off base for your use
case.

 

If, when you query for the IDs, you are looking for *all* of them, then just
serialize them into one field, retrieve them as one record, and
deserialize them in a way that doesn't require them all to fit into memory at
the same time (a tokenized CSV list is the most straightforward example, but
you can do more compact serializations).

 

If you need to query for some subset of these IDs, then I suspect storing them
as individual records in the datastore is indeed the way to go. You can batch
many inserts/updates. You'll have a large table, which isn't likely to be a
problem with this datastore, but do test it. If lookup times degrade with
size, you could consider partitioning your users into different groups
(simple example: one group for user IDs that end in even numbers, another for
those that end in odd numbers). This can reduce the size of indexes and
improve performance on some systems. I don't have personal experience to tell
you whether this is necessary in this system, but it's a thought to consider.

 

Again, I just offer this as food for thought. If you describe your intended
access patterns it will probably help guide the discussion. Good luck.

 

 

From: google-appengine@googlegroups.com
[mailto:google-appengine@googlegroups.com] On Behalf Of nischalshetty
Sent: Tuesday, April 19, 2011 1:15 PM
To: google-appengine@googlegroups.com
Subject: [google-appengine] Appropriate way to save hundreds of thousands of
ids per user

 

Every user in my app would have thousands of IDs associated with them. I
would need to look up these IDs often.

Two things I could think of:

1. Put them into lists. The drawback is that lists have a maximum capacity of
5000 (hope I'm right here), and I have users who would need to save more than
150,000 IDs.
2. Insert each ID as a unique record in the datastore (too much data? The
record count would be the number of users times the IDs per user). Can I
batch put 5000 records at a time? Can I batch get at least 100 to 500 records
at a time?

Is there any other way to do this? I hope my question is clear. Your
suggestions are greatly appreciated.

-- 
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to
google-appengine+unsubscr...@googlegroups.com
<mailto:google-appengine%2bunsubscr...@googlegroups.com> .
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.

  _____  

No virus found in this message.
Checked by AVG - www.avg.com
Version: 10.0.1209 / Virus Database: 1500/3582 - Release Date: 04/18/11

-- 
-Nischal

+91-9920240474 <tel:%2B91-9920240474> 

twitter: NischalShetty <http://twitter.com/nischalshetty> 

facebook: Nischal <http://facebook.com/nischal> 

 

-- 
Nick Johnson, Developer Programs Engineer, App Engine


