Dmitry, I finally got the time to make these changes. Let me know if that works for your use-case.
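The thread below settles on one naming rule. A task name that already matches `^[a-zA-Z0-9-]+$` is kept as-is. Anything else is replaced with the SHA-1 hex digest of its UTF-8 bytes. The snippets in the thread are Python 2, so here is a minimal Python 3 sketch of that rule. It is not the code Robert actually committed. `safe_task_name` and `MAX_NAME_LEN` are hypothetical names. The 500-character limit is an assumption borrowed from the `[:500]` truncation in the earlier commit. `re.fullmatch` replaces `re.match("^...$")` because `$` also matches just before a trailing newline.

```python
import hashlib
import re

# Pattern from the thread: ASCII letters, digits and "-".
VALID_NAME = re.compile(r"[a-zA-Z0-9-]+")
# Assumption: mirrors the [:500] truncation in the earlier commit.
MAX_NAME_LEN = 500


def sha1_hash(value):
    """SHA-1 hex digest of value; str is encoded as UTF-8 first (Python 3)."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def safe_task_name(task_name):
    """Return task_name unchanged if it is already valid, else its hash.

    fullmatch() is used because re.match("^...$") would also accept a
    name ending in "\\n" ("$" matches just before a trailing newline).
    """
    if len(task_name) <= MAX_NAME_LEN and VALID_NAME.fullmatch(task_name):
        return task_name
    return sha1_hash(task_name)


if __name__ == "__main__":
    for name in ("agg-2010-11", "мститель2000", "тест2000"):
        print(name, "->", safe_task_name(name))
```

Under the old `re.sub` cleanup, both `мститель2000` and `тест2000` strip down to `2000`. Hashing the full name keeps them distinct. The cost is that hashed names are no longer human-readable, which is why readable names are left untouched.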
I really appreciate all of your suggestions and help with this.

Robert

2010/11/3 Dmitry <dmitry.lukas...@gmail.com>:
> oops, I read the expression in the wrong direction. This will definitely work!
>
> On Nov 3, 7:43 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> Dmitry,
>> Right, I know those will cause problems. So what about my suggested solution of using:
>>
>>     if not re.match("^[a-zA-Z0-9-]+$", task_name):
>>         task_name = sha1_hash(task_name)
>>
>> That should correctly handle your use cases, since the full name will be hashed.
>>
>> Are there issues with that solution I am not seeing?
>>
>> Robert
>>
>> On Nov 3, 2010, at 3:52, Dmitry <dmitry.lukas...@gmail.com> wrote:
>>
>> > Robert,
>> >
>> > You will get into trouble with these aggregations:
>> >
>> > urls:
>> > http://правительство.рф/search/?phrase=налог&section=gov_events -> httpsearchphrase
>> > http://правительство.рф/search/?phrase=президент&section=gov_events -> httpsearchphrase
>> >
>> > or usernames:
>> > мститель2000 -> 2000
>> > тест2000 -> 2000
>> >
>> > but anyway, in most cases your approach will work well :) You can leave it up to the user (add some kind of flag, "use_hash").
>> >
>> > Or we can try to URL-encode the strings:
>> > urllib.quote(task_name.encode('utf-8'))
>> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
>> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>> >
>> > but this is no better than a hash :-D
>> >
>> > thanks
>> >
>> > On Nov 3, 7:13 am, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >> Hey Dmitry,
>> >> I am sure the "fix" in that commit is _not_ a good idea. Originally I stuck it in because I use entity keys as the task-name; sometimes they contain characters not allowed in task-names.
>> >> I actually debated for several days about pushing that update out; finally I decided to push it and hoped someone would notice and offer their thoughts.
>> >>
>> >> I like your idea a lot. But for many aggregations I like to use entity keys; it makes it possible for me to visually see what a task is doing. What do you think about something like the following approach:
>> >>
>> >>     if not re.match("^[a-zA-Z0-9-]+$", task_name):
>> >>         task_name = sha1_hash(task_name)
>> >>
>> >> That should allow 'valid' names to remain as-is, but it will safely encode invalid task-names. Do you think that is an acceptable method?
>> >>
>> >> Thanks a lot for your feedback.
>> >>
>> >> Robert
>> >>
>> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <dmitry.lukas...@gmail.com> wrote:
>> >>> Hi Robert,
>> >>>
>> >>> Regarding your latest commit:
>> >>>
>> >>>     # TODO: find a better solution for cleaning up the name.
>> >>>     task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>> >>>
>> >>> I don't think this is a good idea :) For example, I have Unicode characters in aggregation values. In that case the regexp will return nothing.
>> >>> I use a sha1 hash now... but there's also a small possibility of collision:
>> >>>
>> >>>     sha1_hash(self.agg_name)
>> >>>
>> >>>     def utf8encoded(data):
>> >>>         if data is None:
>> >>>             return None
>> >>>         if isinstance(data, unicode):
>> >>>             return unicode(data).encode('utf-8')
>> >>>         else:
>> >>>             return data
>> >>>
>> >>>     def sha1_hash(value):
>> >>>         return hashlib.sha1(utf8encoded(value)).hexdigest()
>> >>>
>> >>> On Oct 24, 9:26 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>> Hi Dmitry,
>> >>>> Glad to hear it was helpful! Not sure when you checked it out last, but I made a number of good (I think) improvements in the last couple of days, such as continuations to allow splitting large groups of work up.
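Robert mentions continuations for splitting large groups of work. One common way to get that effect on a task queue is to have each task process a bounded batch. If work remains, the task enqueues a follow-up task carrying a cursor and any partial state. The sketch below simulates this, with an in-memory `deque` standing in for the queue. It illustrates the general technique under that assumption and is not slagg's implementation. `process_batch` and `aggregate_with_continuations` are hypothetical names.

```python
from collections import deque


def process_batch(values, cursor, batch_size):
    """Sum one batch, values[cursor:cursor + batch_size].

    Returns (partial_sum, next_cursor); next_cursor is None when done.
    """
    end = min(cursor + batch_size, len(values))
    partial = sum(values[cursor:end])
    return partial, (end if end < len(values) else None)


def aggregate_with_continuations(values, batch_size):
    """Simulate tasks that each handle one batch and, if work remains,
    enqueue a continuation carrying the cursor and the running total."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    queue = deque([(0, 0)])  # payload of the first task: (cursor, running_total)
    tasks_run = 0
    total = 0
    while queue:
        cursor, running_total = queue.popleft()
        tasks_run += 1
        partial, next_cursor = process_batch(values, cursor, batch_size)
        running_total += partial
        if next_cursor is None:
            total = running_total  # the last task would store the final aggregate
        else:
            queue.append((next_cursor, running_total))  # the continuation task
    return total, tasks_run
```

For example, `aggregate_with_continuations(list(range(10)), 4)` runs three simulated tasks (cursors 0, 4 and 8) and returns `(45, 3)`. Carrying the running total in the payload keeps each task independent of the others' in-memory state.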
>> >>>>
>> >>>> Robert
>> >>>>
>> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <dmitry.lukas...@gmail.com> wrote:
>> >>>>> Robert,
>> >>>>>
>> >>>>> Your grouping_with_date_rollup.py example was extremely helpful. Thanks a lot again! :)
>> >>>>>
>> >>>>> On Oct 14, 8:47 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>>>> Hey Carles,
>> >>>>>> Glad it seems helpful. I am hoping to get time today to push out some revisions and sample code.
>> >>>>>>
>> >>>>>> Robert
>> >>>>>>
>> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <carle...@gmail.com> wrote:
>> >>>>>>> Robert, I took a brief look at your code and it seems very cool. Exactly what I was looking for for my report generation and such.
>> >>>>>>> I'm looking forward to more examples, but it seems a very valuable addition to our toolbox.
>> >>>>>>> Thanks a lot!
>> >>>>>>>
>> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <carle...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Neat! I'm going to look at this code; hopefully I'll understand something :)
>> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>>>>>>> Hey Dmitry,
>> >>>>>>>>> In case it might help, I pushed some code to bitbucket. At the moment I would (personally) say the code is not too pretty, but it works well. :)
>> >>>>>>>>> http://bitbucket.org/thebobert/slagg
>> >>>>>>>>>
>> >>>>>>>>> Sorry it does not really have good documentation at the moment, but I think the basic example I threw together will give you a good idea of how to use it. I need to do another cleanup pass over the API to make a few more refinements.
>> >>>>>>>>>
>> >>>>>>>>> I pulled this code out of one of my apps, and tried to quickly refactor it to be a bit more generic.
>> >>>>>>>>> We are currently using basically the same code in three apps to do some really complex calculations. As soon as I get time I will get an example up showing how to use it for neat stuff, like overall, yearly, monthly, and daily aggregates across multiple values (like total dollars and quantity).
>> >>>>>>>>> The cool thing is that you can do all of those aggregations across various groupings, like customer, company, contact, and sales-person, at once. I'll get that code pushed out in the next few days.
>> >>>>>>>>>
>> >>>>>>>>> Would love to get some feedback on it.
>> >>>>>>>>>
>> >>>>>>>>> Robert
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry <dmitry.lukas...@gmail.com> wrote:
>> >>>>>>>>>> Ben, thanks for your code! I'm trying to understand all this stuff too...
>> >>>>>>>>>> Robert, any success with your "library"? Maybe you've already done all the stuff we are trying to implement...
>> >>>>>>>>>>
>> >>>>>>>>>> p.s. Where is Brett S.? :) I would like to hear his comments on this.
>> >>>>>>>>>>
>> >>>>>>>>>> On Sep 21, 1:49 pm, Ben <pondneverfree...@yahoo.com> wrote:
>> >>>>>>>>>>> Thanks for your insights. I would love feedback on this implementation (Brett S. suggested we send in our code for this): http://pastebin.com/3pUhFdk8
>> >>>>>>>>>>>
>> >>>>>>>>>>> This implementation is for just one materialized view row at a time (e.g. a simple counter, no presence markers). Hopefully putting an ETA on the transactional task will relieve the write pressure, since usually it should be an old update with an out-of-date sequence number and be discarded (the update having already been completed in batch by the fork-join-queue).
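Ben's idea relies on a sequence-number check. The delayed transactional task for update n is usually stale, because a fork-join batch has already folded update n into the row. Ben's real code is at the pastebin link and is not reproduced here; the following is a hypothetical sketch of the guard only. Two assumptions are worth stating. First, rather than applying only its own delta, the task applies every pending work item up to n. Second, all work items numbered up to n are already durably written when the task runs. Together, these mean a task arriving out of order does not skip an earlier, not-yet-applied item.

```python
from dataclasses import dataclass


@dataclass
class CounterRow:
    """Hypothetical materialized-view row: a counter plus the highest
    work-item sequence number already folded into it."""
    value: int = 0
    last_seq: int = 0


def apply_through(row, work_items, seq):
    """Fold pending work items with last_seq < s <= seq into row.

    work_items maps sequence number -> delta and is assumed to already
    contain every item with a sequence number <= seq. Returns False,
    changing nothing, when seq is already covered (a stale update).
    """
    if seq <= row.last_seq:
        return False  # a batch already applied this update: discard it
    for s in sorted(work_items):
        if row.last_seq < s <= seq:
            row.value += work_items[s]
    row.last_seq = seq
    return True
```

For example, a batch call `apply_through(row, {1: 5, 2: 3, 3: -2}, 3)` leaves the counter at 6. A late task for sequence 2 then returns `False`, which is the cheap discard Ben expects in the common case.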
>> >>>>>>>>>>>
>> >>>>>>>>>>> I'd love to generalize this to do more than one materialized view row but thought I'd get feedback first.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Ben
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Sep 17, 7:30 am, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Responses inline.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Thu, Sep 16, 2010 at 17:32, Ben <pondneverfree...@yahoo.com> wrote:
>> >>>>>>>>>>>>> I have a question about Brett Slatkin's talk at I/O 2010 on data pipelines. The question is about slide #67 of his PDF, corresponding to minute 51:30 of his talk:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I am wondering what is supposed to happen in the transactional task (bullet point 2c). Would these updates to the materialized view cause you to write too frequently to the entity group containing the materialized view?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I think there are really two different approaches you can use to insert your work models.
>> >>>>>>>>>>>> 1) The work models get added to the original entity's group. So, inside of the original transaction you do not write to the entity group containing the materialized view -- so no contention on it. Commit the transaction and proceed to step 3.
>> >>>>>>>>>>>> 2) You kick off a transactional task to insert the work model, or fan out more tasks to create work models :). Then you proceed to step 3.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> You can use method 1 if you have only a few aggregates. If you have more aggregates, use the second method.
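Robert's rule of thumb ties the choice between method 1 and method 2 to how many aggregates one update touches. The rollups he describes earlier in the thread show how quickly that number grows: overall, yearly, monthly and daily totals of dollars and quantity, across customer, company, contact and sales-person. The hypothetical sketch below only enumerates the work models a single sale would fan out to. The field names and key format are assumptions, not slagg's API.

```python
from datetime import date
from itertools import product

# Hypothetical field names for the groupings mentioned in the thread.
GROUPINGS = ("customer", "company", "contact", "sales_person")


def period_keys(day):
    """Rollup buckets one date contributes to: overall, year, month, day."""
    return (
        "overall",
        f"{day.year:04d}",
        f"{day.year:04d}-{day.month:02d}",
        day.isoformat(),
    )


def aggregate_keys(record):
    """Every (grouping, grouping value, period) aggregate one record touches."""
    return [
        (grouping, record[grouping], period)
        for grouping, period in product(GROUPINGS, period_keys(record["date"]))
    ]


def work_items(record):
    """One work model per aggregate key, carrying the values to add."""
    return [
        {"key": key, "dollars": record["dollars"], "quantity": record["quantity"]}
        for key in aggregate_keys(record)
    ]


if __name__ == "__main__":
    sale = {"customer": "c1", "company": "acme", "contact": "bob",
            "sales_person": "sp9", "date": date(2010, 11, 3),
            "dollars": 120, "quantity": 2}
    print(len(work_items(sale)))  # 4 groupings x 4 periods = 16
```

One sale therefore produces 16 work models. That is the kind of count where Robert suggests creating them through (possibly fanned-out) tasks rather than inside the original transaction.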
>> >>>>>>>>>>>> I have a "library" I am almost ready to open source that makes method 2 really easy, so you can have lots of aggregates. I'll post to this group when I release it.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> And a related question: what happens if there is a failure just after the transaction in bullet #2, but right before the named task gets inserted in bullet #3? In my current implementation I just left out the transactional task (bullet point 2c), but I think that causes me to lose the eventual consistency.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Failure between steps 2 and 3 just means _that_ particular update will not try to kick off, i.e. insert, the fan-in (aggregation) task. But it might have already been inserted by the previous update, or the next update. However, if nothing else kicks off the fan-in task, you will need some periodic "cleanup" method to catch the update and kick off the fan-in task. Depending on exactly how you implemented step 2, you may not need a transactional task.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Robert
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks!
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
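Robert's answer to Ben's failure question can be simulated in a few lines. An update whose fan-in task never gets inserted is picked up either by a neighbouring update or by a periodic cleanup. The sketch below makes two assumptions. Work items are grouped into fixed time buckets. Each bucket's fan-in task has a deterministic name, so duplicate inserts are rejected, much as the App Engine task queue rejects a task name that has already been used. `NamedTaskQueue`, `BUCKET_SECONDS` and the other names are hypothetical. This is an in-memory model of the idea, not the fork-join queue from Brett Slatkin's talk.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # hypothetical fan-in window


class NamedTaskQueue:
    """In-memory stand-in for a queue that rejects reused task names."""

    def __init__(self):
        self.pending = []        # task names waiting to run
        self.used_names = set()  # every name ever inserted

    def add(self, name):
        if name in self.used_names:
            return False  # duplicate: this fan-in was already kicked off
        self.used_names.add(name)
        self.pending.append(name)
        return True


def fan_in_name(aggregate, bucket):
    return f"{aggregate}-{bucket}"


def record_update(work, queue, aggregate, delta, now, crash_before_enqueue=False):
    """Step 2: write the work item. Step 3: insert the named fan-in task."""
    bucket = int(now // BUCKET_SECONDS)
    work[(aggregate, bucket)].append(delta)
    if crash_before_enqueue:
        return  # simulated failure between steps 2 and 3
    queue.add(fan_in_name(aggregate, bucket))


def run_fan_in(work, views, queue):
    """Run pending fan-in tasks, folding each bucket's work items into its view."""
    while queue.pending:
        name = queue.pending.pop(0)
        aggregate, bucket = name.rsplit("-", 1)
        views[aggregate] += sum(work.pop((aggregate, int(bucket)), []))


def cleanup(work, queue, now):
    """Periodic sweep: try to insert the fan-in task for every closed bucket
    that still has work items; the name check makes this a no-op for
    buckets whose task was already inserted."""
    current = int(now // BUCKET_SECONDS)
    kicked = []
    for aggregate, bucket in list(work):
        if bucket < current and queue.add(fan_in_name(aggregate, bucket)):
            kicked.append((aggregate, bucket))
    return kicked
```

Two updates in the same bucket share one fan-in task, because the second insert is rejected as a duplicate. An update that "crashes" before step 3 is only aggregated once `cleanup` runs after its bucket has closed. This sketch does not handle work items that arrive after their bucket's fan-in task has already run.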