[google-appengine] Re: What is the most efficient way to do a large IN query in GQL?

2010-09-10 Thread ogterran
Hi Nick,

I can't use key_names, since Twitter ids are not the key_names for my
user models.
The users don't have to OAuth to Twitter or other sites.

Robert/Niklas,
I'll try getting all the users and doing a reference query.

Thanks guys
John

On Sep 7, 3:39 am, "Nick Johnson (Google)" wrote:
> Hi John,
>
> On Tue, Sep 7, 2010 at 1:01 AM, johnterran  wrote:
> > Hi Robert,
>
> > I can't use the key_name.  The ids are not from my site
> > i.e.
> > Let's say the ids are from twitter. I want to know how many of the
> > twitter users
> > are registered on my site.   So the ids can exist in the datastore,
> > but they don't have to.
>
> This doesn't prevent you using the IDs as key_names. Attempting to fetch an
> entity that doesn't exist will simply return None for that entity.
>
> -Nick Johnson
>
> > Is the best way to get all the users and filter them manually similar
> > to what Niklas wrote?
>
> > Thanks
> > John
>
> > On Sep 6, 8:22 am, Robert Kluin  wrote:
> > > It will not be possible to use IN for something like that.  IN will
> > > execute a series of queries, and it is capped at 30.
>
> > > If possible, I would suggest you make the entity key_name the user's
> > > id.  Then you can just build a list of keys and fetch those -- but I
> > > really doubt you'll get anything close to 10K on a single fetch.
>
> > > Robert
>
> > > On Mon, Sep 6, 2010 at 04:51, johnterran  wrote:
> > > > Hi
>
> > > > In BigTable, what is the most efficient way to do a large IN query?
> > > > My IN parameter list is typically 500 but can be 10k+
> > > > i.e.
> > > > class User(db.Model):
> > > >    name = db.StringProperty(required = True)
> > > >    id = db.StringProperty(required = True)
>
> > > > > given a list of ids that can contain 10k entries, I need to retrieve
> > > > > all the names
> > > >  users = db.GqlQuery("SELECT * FROM User where id IN :1",
> > > >                            ids)
>
> > > > what is the best way to do this?
>
> > > > Thanks
> > > > John
>
> > > > --
> > > > You received this message because you are subscribed to the Google
> > Groups "Google App Engine" group.
> > > > To post to this group, send email to google-appengine@googlegroups.com
> > .
> > > > To unsubscribe from this group, send email to
> > google-appengine+unsubscr...@googlegroups.com > e...@googlegroups.com>
> > .
> > > > For more options, visit this group athttp://
> > groups.google.com/group/google-appengine?hl=en.
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to google-appeng...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > google-appengine+unsubscr...@googlegroups.com > e...@googlegroups.com>
> > .
> > For more options, visit this group at
> >http://groups.google.com/group/google-appengine?hl=en.
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine Google Ireland Ltd. ::
> Registered in Dublin, Ireland, Registration Number: 368047
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> 368047

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



Re: [google-appengine] Re: What is the most efficient way to do a large IN query in GQL?

2010-09-07 Thread Nick Johnson (Google)
Hi John,

On Tue, Sep 7, 2010 at 1:01 AM, johnterran  wrote:

> Hi Robert,
>
> I can't use the key_name.  The ids are not from my site
> i.e.
> Let's say the ids are from twitter. I want to know how many of the
> twitter users
> are registered on my site.   So the ids can exist in the datastore,
> but they don't have to.
>

This doesn't prevent you using the IDs as key_names. Attempting to fetch an
entity that doesn't exist will simply return None for that entity.

-Nick Johnson
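
A minimal sketch of the behaviour Nick describes, in plain Python rather than
against the real datastore. `FAKE_DATASTORE`, `get_by_key_names` and the
"twitter:" key prefix are hypothetical stand-ins chosen for illustration; in
the App Engine `db` API the corresponding batch call would be
`User.get_by_key_name(list_of_key_names)`. The sketch assumes the external id
has been used as the key_name, which is the suggestion under discussion.

```python
# Hypothetical in-memory stand-in for the datastore: key_name -> user name.
FAKE_DATASTORE = {
    "twitter:1001": "alice",
    "twitter:1003": "carol",
}


def get_by_key_names(key_names):
    """Mimic a batch get: one result per requested key, None if missing."""
    return [FAKE_DATASTORE.get(key_name) for key_name in key_names]


def registered_names(twitter_ids):
    """Return the names of users whose Twitter id is already stored."""
    key_names = ["twitter:%s" % twitter_id for twitter_id in twitter_ids]
    results = get_by_key_names(key_names)
    # Ids that are not registered come back as None and are skipped.
    return [name for name in results if name is not None]
```

Calling `registered_names` with a mix of known and unknown ids returns only
the names that exist, so the count of registered users is just the length of
that list.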


>
> Is the best way to get all the users and filter them manually similar
> to what Niklas wrote?
>
> Thanks
> John
>
> On Sep 6, 8:22 am, Robert Kluin  wrote:
> > It will not be possible to use IN for something like that.  IN will
> > execute a series of queries, and it is capped at 30.
> >
> > If possible, I would suggest you make the entity key_name the user's
> > id.  Then you can just build a list of keys and fetch those -- but I
> > really doubt you'll get anything close to 10K on a single fetch.
> >
> > Robert
> >
> >
> >
> > On Mon, Sep 6, 2010 at 04:51, johnterran  wrote:
> > > Hi
> >
> > > In BigTable, what is the most efficient way to do a large IN query?
> > > My IN parameter list is typically 500 but can be 10k+
> > > i.e.
> > > class User(db.Model):
> > > >    name = db.StringProperty(required = True)
> > > >    id = db.StringProperty(required = True)
> >
> > > > given a list of ids that can contain 10k entries, I need to retrieve
> > > > all the names
> > > >  users = db.GqlQuery("SELECT * FROM User where id IN :1",
> > > >                            ids)
> >
> > > what is the best way to do this?
> >
> > > Thanks
> > > John
> >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups "Google App Engine" group.
> > > To post to this group, send email to google-appengine@googlegroups.com
> .
> > > To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com
> .
> > > For more options, visit this group athttp://
> groups.google.com/group/google-appengine?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-appeng...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com
> .
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047




Re: [google-appengine] Re: What is the most efficient way to do a large IN query in GQL?

2010-09-06 Thread Robert Kluin
Hey John,
  Niklas also has another followup post.  For the exact situation you
are asking about, I think he has the right idea.  You could simply
use a reference property to reference another model, making it very
easy to query. To build those models you will need to iterate over all
users and generate the appropriate cross-reference models.  For that I
suggest looking into cursors and the task queue.

  If you are trying to keep statistics of some sort, I would
pre-compute whenever possible.  In this case you could keep counts of
how many users each source has.  For querying, you could add a "source"
field to your user model indicating the source of the user; then
fetching twitter, facebook, or google users would be much easier.

  If your data set is large, you may want to look into the mapper API.

Robert
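
A rough sketch of the two ideas above, iterating over all users in pages and
keeping pre-computed counts per source. It is plain Python with an in-memory
list standing in for the datastore; `USERS`, `fetch_page` and
`count_by_source` are hypothetical names, and the integer "cursor" only
imitates how a real query cursor lets each batch resume where the previous
one stopped. In a real app each page would more likely be processed in its
own task queue task.

```python
from collections import Counter

# Hypothetical user records; "source" is the extra field suggested above.
USERS = [
    {"name": "alice", "source": "twitter"},
    {"name": "bob", "source": "facebook"},
    {"name": "carol", "source": "twitter"},
    {"name": "dave", "source": "google"},
    {"name": "erin", "source": "twitter"},
]


def fetch_page(cursor, page_size):
    """Stand-in for a cursor-based query: return (records, next_cursor)."""
    page = USERS[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    # A next_cursor of None signals that there is nothing left to read.
    return page, (next_cursor if next_cursor < len(USERS) else None)


def count_by_source(page_size=2):
    """Walk all users one page at a time and tally users per source."""
    counts = Counter()
    cursor = 0
    while cursor is not None:
        page, cursor = fetch_page(cursor, page_size)
        counts.update(user["source"] for user in page)
    return dict(counts)
```

Storing the resulting counts once, and updating them when a user signs up,
avoids recomputing them on every request.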




On Mon, Sep 6, 2010 at 20:01, johnterran  wrote:
> Hi Robert,
>
> I can't use the key_name.  The ids are not from my site
> i.e.
> Let's say the ids are from twitter. I want to know how many of the
> twitter users
> are registered on my site.   So the ids can exist in the datastore,
> but they don't have to.
>
> Is the best way to get all the users and filter them manually similar
> to what Niklas wrote?
>
> Thanks
> John
>
> On Sep 6, 8:22 am, Robert Kluin  wrote:
>> It will not be possible to use IN for something like that.  IN will
>> execute a series of queries, and it is capped at 30.
>>
>> If possible, I would suggest you make the entity key_name the user's
>> id.  Then you can just build a list of keys and fetch those -- but I
>> really doubt you'll get anything close to 10K on a single fetch.
>>
>> Robert
>>
>>
>>
>> On Mon, Sep 6, 2010 at 04:51, johnterran  wrote:
>> > Hi
>>
>> > In BigTable, what is the most efficient way to do a large IN query?
>> > My IN parameter list is typically 500 but can be 10k+
>> > i.e.
>> > class User(db.Model):
>> >    name = db.StringProperty(required = True)
>> >    id = db.StringProperty(required = True)
>>
>> > given a list of ids that can contain 10k entries, I need to retrieve
>> > all the names
>> >  users = db.GqlQuery("SELECT * FROM User where id IN :1",
>> >                            ids)
>>
>> > what is the best way to do this?
>>
>> > Thanks
>> > John
>>




[google-appengine] Re: What is the most efficient way to do a large IN query in GQL?

2010-09-06 Thread Niklasro
On Sep 7, 12:01 am, johnterran  wrote:
> Hi Robert,
>
> I can't use the key_name.  The ids are not from my site
> i.e.
> Let's say the ids are from twitter. I want to know how many of the
> twitter users
> are registered on my site.   So the ids can exist in the datastore,
> but they don't have to.
>
> Is the best way to get all the users and filter them manually similar
> to what Niklas wrote?
>
> Thanks
> John
>
> On Sep 6, 8:22 am, Robert Kluin  wrote:
>
> > It will not be possible to use IN for something like that.  IN will
> > execute a series of queries, and it is capped at 30.
>
> > If possible, I would suggest you make the entity key_name the user's
> > id.  Then you can just build a list of keys and fetch those -- but I
> > really doubt you'll get anything close to 10K on a single fetch.
>
> > Robert
>
> > On Mon, Sep 6, 2010 at 04:51, johnterran  wrote:
> > > Hi
>
> > > In BigTable, what is the most efficient way to do a large IN query?
> > > My IN parameter list is typically 500 but can be 10k+
> > > i.e.
> > > class User(db.Model):
> > >    name = db.StringProperty(required = True)
> > >    id = db.StringProperty(required = True)
>
> > > > given a list of ids that can contain 10k entries, I need to retrieve
> > > > all the names
> > >  users = db.GqlQuery("SELECT * FROM User where id IN :1",
> > >                            ids)
>
> > > what is the best way to do this?
>
> > > Thanks
> > > John
class Match(db.Model):  # match other with own
  user = db.UserProperty(verbose_name="myuser")
  reference = db.ReferenceProperty(OtherModel,
                                   collection_name='matched_users',
                                   verbose_name="Title")

Here is a possibly better structure: a referenced model, i.e.
twitter.matched_users, could do it, or even memcache for something like a
dictionary or a table. Again, I didn't program it; I'm only proposing a
similar structure, which solved my logic for matching entities
according to nearly arbitrary matches. So making the query like a
dictionary worked for blobs also, "automatically" parametrized via
get_serving_url, i.e. finding the parametrization got querying 2 () only
querying 1, just observing how the query parametrizes.
Regards
Niklas (proposing Match class that also can work self-referencing)
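
One way to read the "something like a dictionary" idea, combined with John's
plan to get all the users and filter them manually, is sketched below in
plain Python. The record layout and the function name `match_registered` are
assumptions made for illustration; nothing here touches the datastore or
memcache, and loading every user into memory is only reasonable while the
user set is small enough.

```python
def match_registered(external_ids, users):
    """Return {external_id: name} for the ids that belong to a known user."""
    # Build the lookup table once so each id check is a dictionary lookup.
    by_id = {user["id"]: user["name"] for user in users}
    return {i: by_id[i] for i in external_ids if i in by_id}


# Hypothetical user records mirroring the User model's name and id fields.
USERS = [
    {"id": "1001", "name": "alice"},
    {"id": "1003", "name": "carol"},
]
```

With the lookup table built once, checking 10k incoming ids costs 10k
dictionary lookups instead of 10k queries.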




[google-appengine] Re: What is the most efficient way to do a large IN query in GQL?

2010-09-06 Thread johnterran
Hi Robert,

I can't use the key_name.  The ids are not from my site
i.e.
Let's say the ids are from twitter. I want to know how many of the
twitter users
are registered on my site.   So the ids can exist in the datastore,
but they don't have to.

Is the best way to get all the users and filter them manually similar
to what Niklas wrote?

Thanks
John

On Sep 6, 8:22 am, Robert Kluin  wrote:
> It will not be possible to use IN for something like that.  IN will
> execute a series of queries, and it is capped at 30.
>
> If possible, I would suggest you make the entity key_name the user's
> id.  Then you can just build a list of keys and fetch those -- but I
> really doubt you'll get anything close to 10K on a single fetch.
>
> Robert
>
>
>
> On Mon, Sep 6, 2010 at 04:51, johnterran  wrote:
> > Hi
>
> > In BigTable, what is the most efficient way to do a large IN query?
> > My IN parameter list is typically 500 but can be 10k+
> > i.e.
> > class User(db.Model):
> >    name = db.StringProperty(required = True)
> >    id = db.StringProperty(required = True)
>
> > given a list of ids that can contain 10k entries, I need to retrieve
> > all the names
> >  users = db.GqlQuery("SELECT * FROM User where id IN :1",
> >                            ids)
>
> > what is the best way to do this?
>
> > Thanks
> > John
>




[google-appengine] Re: What is the most efficient way to do a large IN query in GQL?

2010-09-06 Thread Niklasro(.appspot)


On Sep 6, 8:51 am, johnterran  wrote:
> Hi
>
> In BigTable, what is the most efficient way to do a large IN query?
> My IN parameter list is typically 500 but can be 10k+
> i.e.
> class User(db.Model):
>     name = db.StringProperty(required = True)
>     id = db.StringProperty(required = True)
>
> given a list of ids that can contain 10k entries, I need to retrieve
> all the names
>  users = db.GqlQuery("SELECT * FROM User where id IN :1",
>                             ids)
>
> what is the best way to do this?
>
> Thanks
> John
I propose parametrizing the pairs into a dictionary, using Query and not GQL, i.e.
User.All( ...+ logic,
i.e. filter("url id", ['www.domain010703.com.','domain010703']) with the IN
pairs listed as a dictionary. I didn't program it, but the data structure
seems adequate.
Thank you
Niklas R
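
If an IN filter is used at all, the id list has to be split up, since
elsewhere in this thread Robert notes that IN runs as a series of queries
capped at 30. The sketch below shows only that splitting step in plain
Python. `chunked`, `fetch_names`, `FAKE_USERS` and `fake_in_query` are
hypothetical names; `fake_in_query` stands in for whatever function runs one
IN query per chunk. For 10k ids this still means hundreds of round trips, so
it illustrates the limit rather than an efficient solution.

```python
MAX_IN_VALUES = 30  # the cap on IN values mentioned in this thread


def chunked(values, size=MAX_IN_VALUES):
    """Yield consecutive slices of `values`, each at most `size` long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def fetch_names(ids, run_in_query):
    """Run one IN-style query per chunk and collect all returned names."""
    names = []
    for chunk in chunked(ids):
        names.extend(run_in_query(chunk))
    return names


# Hypothetical stand-in data and query function, for illustration only.
FAKE_USERS = {"7": "alice", "42": "bob", "99": "carol"}


def fake_in_query(id_chunk):
    """Pretend to run `id IN chunk` and return the matching names."""
    return [FAKE_USERS[i] for i in id_chunk if i in FAKE_USERS]
```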
