Robert,

You will run into trouble with these aggregations:
URLs:

http://правительство.рф/search/?phrase=налог&section=gov_events -> httpsearchphrase
http://правительство.рф/search/?phrase=президент&section=gov_events -> httpsearchphrase

or usernames:

мститель2000 -> 2000
тест2000 -> 2000

But anyway, in most cases your approach will work well :) You can leave it up to the user (add some kind of "use_hash" flag).

Or we can try to URL-encode the strings:

urllib.quote(task_name.encode('utf-8'))

http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182

but this is no better than a hash :-D

Thanks

On Nov 3, 7:13 am, Robert Kluin <robert.kl...@gmail.com> wrote:
> Hey Dmitry,
> I am sure the "fix" in that commit is _not_ a good idea. Originally
> I stuck it in because I use entity keys as the task-name, and sometimes
> they contain characters not allowed in task-names. I actually
> debated for several days about pushing that update out; finally I
> decided to push and hope someone would notice and offer their thoughts.
>
> I like your idea a lot. But for many aggregations I like to use
> entity keys; it makes it possible for me to visually see what a task
> is doing. What do you think about something like the following
> approach:
>
>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>       task_name = sha1_hash(task_name)
>
> That should allow 'valid' names to remain as-is, but it will safely
> encode non-valid task-names. Do you think that is an acceptable
> method?
>
> Thanks a lot for your feedback.
>
> Robert
>
> On Tue, Nov 2, 2010 at 07:15, Dmitry <dmitry.lukas...@gmail.com> wrote:
> > Hi Robert,
> >
> > Regarding your latest commit:
> >
> >   # TODO: find a better solution for cleaning up the name.
> >   task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
> >
> > I don't think this is a good idea :) For example, I have Unicode
> > characters in an aggregation value.
> > In this case the regexp will return nothing.
> > I use a SHA-1 hash now... but there's also a small possibility of
> > collision:
> >
> >   sha1_hash(self.agg_name)
> >
> >   def utf8encoded(data):
> >       if data is None:
> >           return None
> >       if isinstance(data, unicode):
> >           return unicode(data).encode('utf-8')
> >       else:
> >           return data
> >
> >   def sha1_hash(value):
> >       return hashlib.sha1(utf8encoded(value)).hexdigest()
> >
> > On Oct 24, 9:26 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
> >> Hi Dmitry,
> >> Glad to hear it was helpful! Not sure when you checked it out last,
> >> but I made a number of good (I think) improvements in the last couple
> >> of days, such as continuations to allow splitting large groups of
> >> work up.
> >>
> >> Robert
> >>
> >> On Sun, Oct 24, 2010 at 07:57, Dmitry <dmitry.lukas...@gmail.com> wrote:
> >> > Robert,
> >> >
> >> > Your grouping_with_date_rollup.py example was extremely helpful.
> >> > Thanks a lot again! :)
> >> >
> >> > On Oct 14, 8:47 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
> >> >> Hey Carles,
> >> >> Glad it seems helpful. I am hoping to get time today to push out
> >> >> some revisions and sample code.
> >> >>
> >> >> Robert
> >> >>
> >> >> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <carle...@gmail.com> wrote:
> >> >> > Robert, I took a brief look at your code and it seems very cool.
> >> >> > Exactly what I was looking for for my report generation and such.
> >> >> > I'm looking forward to more examples, but it seems a very valuable
> >> >> > addition to our toolbox.
> >> >> > Thanks a lot!
> >> >> >
> >> >> > On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <carle...@gmail.com> wrote:
> >> >> >> Neat! I'm going to look at this code, hopefully I'll understand
> >> >> >> something :)
> >> >> >>
> >> >> >> On Wednesday, October 13, 2010, Robert Kluin <robert.kl...@gmail.com> wrote:
> >> >> >> > Hey Dmitry,
> >> >> >> > In case it might help, I pushed some code to Bitbucket.
> >> >> >> > At the moment I would (personally) say the code is not too
> >> >> >> > pretty, but it works well. :)
> >> >> >> > http://bitbucket.org/thebobert/slagg
> >> >> >> >
> >> >> >> > Sorry it does not really have good documentation at the moment,
> >> >> >> > but I think the basic example I threw together will give you a
> >> >> >> > good idea of how to use it. I need to do another cleanup pass
> >> >> >> > over the API to make a few more refinements.
> >> >> >> >
> >> >> >> > I pulled this code out of one of my apps, and tried to quickly
> >> >> >> > refactor it to be a bit more generic. We are currently using
> >> >> >> > basically the same code in three apps to do some really complex
> >> >> >> > calculations. As soon as I get time I will get an example up
> >> >> >> > showing how to use it for neat stuff, like overall, yearly,
> >> >> >> > monthly, and daily aggregates across multiple values (like total
> >> >> >> > dollars and quantity). The cool thing is that you can do all of
> >> >> >> > those aggregations across various groupings, like customer,
> >> >> >> > company, contact, and sales-person, at once. I'll get that code
> >> >> >> > pushed out in the next few days.
> >> >> >> >
> >> >> >> > Would love to get some feedback on it.
> >> >> >> >
> >> >> >> > Robert
> >> >> >> >
> >> >> >> > On Tue, Oct 12, 2010 at 17:26, Dmitry <dmitry.lukas...@gmail.com> wrote:
> >> >> >> >> Ben, thanks for your code! I'm trying to understand all this
> >> >> >> >> stuff too...
> >> >> >> >> Robert, any success with your "library"? Maybe you've already
> >> >> >> >> done all the stuff we are trying to implement...
> >> >> >> >>
> >> >> >> >> P.S. Where is Brett S.? :) I would like to hear his comments on this.
> >> >> >> >>
> >> >> >> >> On Sep 21, 1:49 pm, Ben <pondneverfree...@yahoo.com> wrote:
> >> >> >> >>> Thanks for your insights. I would love feedback on this
> >> >> >> >>> implementation (Brett S.
> >> >> >> >>> suggested we send in our code for this): http://pastebin.com/3pUhFdk8
> >> >> >> >>>
> >> >> >> >>> This implementation is for just one materialized view row at a
> >> >> >> >>> time (e.g. a simple counter, no presence markers). Hopefully
> >> >> >> >>> putting an ETA on the transactional task will relieve the write
> >> >> >> >>> pressure, since usually it should be an old update with an
> >> >> >> >>> out-of-date sequence number and be discarded (the update having
> >> >> >> >>> already been completed in batch by the fork-join-queue).
> >> >> >> >>>
> >> >> >> >>> I'd love to generalize this to do more than one materialized
> >> >> >> >>> view row but thought I'd get feedback first.
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>> Ben
> >> >> >> >>>
> >> >> >> >>> On Sep 17, 7:30 am, Robert Kluin <robert.kl...@gmail.com> wrote:
> >> >> >> >>> > Responses inline.
> >> >> >> >>> >
> >> >> >> >>> > On Thu, Sep 16, 2010 at 17:32, Ben <pondneverfree...@yahoo.com> wrote:
> >> >> >> >>> > > I have a question about Brett Slatkin's talk at I/O 2010 on
> >> >> >> >>> > > data pipelines. The question is about slide #67 of his PDF,
> >> >> >> >>> > > corresponding to minute 51:30 of his talk:
> >> >> >> >>> > >
> >> >> >> >>> > > http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
> >> >> >> >>> > >
> >> >> >> >>> > > I am wondering what is supposed to happen in the
> >> >> >> >>> > > transactional task (bullet point 2c). Would these updates to
> >> >> >> >>> > > the materialized view cause you to write too frequently to
> >> >> >> >>> > > the entity group containing the materialized view?
> >> >> >> >>> >
> >> >> >> >>> > I think there are really two different approaches you can use
> >> >> >> >>> > to insert your work models.
> >> >> >> >>> > 1) The work models get added to the original entity's group.
> >> >> >> >>> > So, inside of the original transaction you do not write to the
> >> >> >> >>> > entity group containing the materialized view -- so no
> >> >> >> >>> > contention on it. Commit the transaction and proceed to step 3.
> >> >> >> >>> > 2) You kick off a transactional task to insert the work
> >> >> >> >>> > model, or fan-out more tasks to create work models :). Then
> >> >> >> >>> > you proceed to step 3.
> >> >> >> >>> >
> >> >> >> >>> > You can use method 1 if you have only a few aggregates. If
> >> >> >> >>> > you have more aggregates, use the second method. I have a
> >> >> >> >>> > "library" I am almost ready to open source that makes method 2
> >> >> >> >>> > really easy, so you can have lots of aggregates. I'll post to
> >> >> >> >>> > this group when I release it.
> >> >> >> >>> >
> >> >> >> >>> > > And a related question: what happens if there is a failure
> >> >> >> >>> > > just after the transaction in bullet #2, but right before the
> >> >> >> >>> > > named task gets inserted in bullet #3? In my current
> >> >> >> >>> > > implementation I just left out the transactional task (bullet
> >> >> >> >>> > > point 2c), but I think that causes me to lose the eventual
> >> >> >> >>> > > consistency.
> >> >> >> >>> >
> >> >> >> >>> > Failure between steps 2 and 3 just means _that_ particular
> >> >> >> >>> > update will not try to kick off, i.e. insert, the fan-in
> >> >> >> >>> > (aggregation) task. But it might have already been inserted by
> >> >> >> >>> > the previous update, or the next update.
> >> >> >> >>> > However, if nothing else kicks off the fan-in task, you will
> >> >> >> >>> > need some periodic "cleanup" method to catch the update and
> >> >> >> >>> > kick off the fan-in task. Depending on exactly how you
> >> >> >> >>> > implemented step 2, you may not need a transactional task.
> >> >> >> >>> >
> >> >> >> >>> > Robert
> >> >> >> >>> >
> >> >> >> >>> > > Thanks!
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google
> >> >> > Groups "Google App Engine" group.
> >> >> > To post to this group, send email to
> >> >> > google-appeng...@googlegroups.com.
> >> >> > To unsubscribe from this group, send email to
> >> >> > google-appengine+unsubscr...@googlegroups.com.
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/google-appengine?hl=en.
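The task-name discussion above can be sketched roughly as follows. This is only an illustration, written for Python 3 rather than the Python 2 used in the thread (an assumption, so that it runs on a current interpreter). `strip_invalid` mirrors the regex from the commit, and `safe_task_name` follows the "keep valid names, hash the rest" idea. Both function names and `MAX_NAME_LENGTH` are hypothetical, and the length check is an addition that nobody in the thread proposed.

```python
import hashlib
import re

# Character set taken from the regex quoted in the thread.
_VALID_TASK_NAME = re.compile(r'[a-zA-Z0-9-]+')
MAX_NAME_LENGTH = 500  # hypothetical constant; 500 is the slice used in the commit


def strip_invalid(task_name):
    """Approach from the commit: drop every character outside the allowed set."""
    return re.sub(r'[^a-zA-Z0-9-]', '', task_name)[:MAX_NAME_LENGTH]


def safe_task_name(task_name):
    """Keep already-valid names readable; otherwise use the SHA-1 hex digest."""
    if len(task_name) <= MAX_NAME_LENGTH and _VALID_TASK_NAME.fullmatch(task_name):
        return task_name
    # Encode to UTF-8 first, as the utf8encoded() helper in the thread does.
    return hashlib.sha1(task_name.encode('utf-8')).hexdigest()
```

With the usernames from the thread, `strip_invalid` maps both 'мститель2000' and 'тест2000' to '2000', while `safe_task_name` gives each one its own 40-character digest and leaves a name such as 'customer-42' untouched.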