Re: [google-appengine] Re: Fan-in with materialized views: A sketch

Robert Kluin Thu, 04 Nov 2010 20:46:38 -0700

Dmitry,
   I finally got the time to make these changes.  Let me know if they
work for your use case.


   I really appreciate all of your suggestions and help with this.

Robert
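
The change discussed in the quoted thread below can be sketched as follows. This is only an illustration, not the code that was actually committed: it is written for Python 3 (the snippets in the thread are Python 2), and strip_task_name and safe_task_name are hypothetical names. It contrasts the earlier character-stripping cleanup with the "hash only when the name is not already valid" approach, using the two usernames from Dmitry's example.

```python
import hashlib
import re

# Pattern quoted in this thread for names that can be used unchanged.
VALID_TASK_NAME = re.compile(r"^[a-zA-Z0-9-]+$")


def sha1_hash(value):
    """Return the hex SHA-1 digest of value, encoding text as UTF-8 first."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def strip_task_name(task_name):
    """Earlier approach: silently drop every disallowed character."""
    return re.sub(r"[^a-zA-Z0-9-]", "", task_name)[:500]


def safe_task_name(task_name):
    """Proposed approach: keep valid names as-is, hash everything else."""
    if not VALID_TASK_NAME.match(task_name):
        return sha1_hash(task_name)
    return task_name


names = ["мститель2000", "тест2000"]
stripped = [strip_task_name(n) for n in names]  # both collapse to "2000"
hashed = [safe_task_name(n) for n in names]     # two distinct hex digests
```

Because the whole original name is hashed, the two usernames no longer collapse to the same task name, while a name such as "customer-42" passes through unchanged and stays readable. One detail to be aware of: in Python, `$` also matches just before a trailing newline, so `re.fullmatch` may be the stricter choice if names can end in "\n".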






2010/11/3 Dmitry <dmitry.lukas...@gmail.com>:
> Oops, I read the expression the wrong way around. This will definitely work!
>
> On Nov 3, 7:43 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> Dmitry,
>>   Right, I know those will cause problems. So what about my suggested 
>> solution of using:
>>
>>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>>       task_name = sha1_hash(task_name)
>>
>> That should correctly handle your use cases, since the full name will be 
>> hashed.
>>
>> Are there issues with that solution I am not seeing?
>>
>> Robert
>>
>> On Nov 3, 2010, at 3:52, Dmitry <dmitry.lukas...@gmail.com> wrote:
>>
>> > Robert,
>>
>> > You will run into trouble with these aggregations:
>>
>> > urls:
>> > http://правительство.рф/search/?phrase=налог&section=gov_events ->
>> > httpsearchphrase
>> > http://правительство.рф/search/?phrase=президент&section=gov_events ->
>> > httpsearchphrase
>>
>> > or usernames:
>> > мститель2000 -> 2000
>> > тест2000 -> 2000
>>
>> > but anyway in most cases your approach will work well:) You can leave
>> > it up to the user (add some kind of flag "use_hash").
>>
>> > or we can try to url encode strings:
>> > urllib.quote(task_name.encode('utf-8'))
>> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
>> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>>
>> > but this is no better than a hash :-D
>>
>> > thanks
>>
>> > On Nov 3, 7:13 am, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >> Hey Dmitry,
>> >>   I am sure the "fix" in that commit is _not_ a good idea.  Originally
>> >> I stuck it in because I use entity keys as the task-name, and sometimes
>> >> they contain characters not allowed in task-names.  I actually
>> >> debated for several days about pushing that update out;  finally I
>> >> decided to push it and hope someone would notice and offer their thoughts.
>>
>> >>   I like your idea a lot.  But, for many aggregations I like to use
>> >> entity keys; it makes it possible for me to visually see what a task
>> >> is doing.  What do you think about something like the following
>> >> approach:
>>
>> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>> >>       task_name = sha1_hash(task_name)
>>
>> >> That should allow 'valid' names to remain as-is, but it will safely
>> >> encode non-valid task-names.  Do you think that is an acceptable
>> >> method?
>>
>> >> Thanks a lot for your feedback.
>>
>> >> Robert
>>
>> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <dmitry.lukas...@gmail.com> wrote:
>> >>> Hi Robert,
>>
>> >>> Regarding your latest commit:
>>
>> >>> # TODO: find a better solution for cleaning up the name.
>> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>>
>> >>> I don't think this is a good idea :) For example, I have unicode
>> >>> characters in my aggregation values. In that case the regexp can
>> >>> strip everything and return an empty string.
>> >>> I use a sha1 hash now... but there's also a small possibility of a
>> >>> collision
>>
>> >>> sha1_hash(self.agg_name)
>>
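>> >>> import hashlib
>>
>> >>> def utf8encoded(data):
>> >>>  # Encode unicode text as UTF-8; pass None and byte strings through.
>> >>>  if data is None:
>> >>>    return None
>> >>>  if isinstance(data, unicode):
>> >>>    return data.encode('utf-8')
>> >>>  return data
>>
>> >>> def sha1_hash(value):
>> >>>  # The hex digest only contains [0-9a-f], so it matches the name pattern.
>> >>>  return hashlib.sha1(utf8encoded(value)).hexdigest()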
>>
>> >>> On Oct 24, 9:26 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>> Hi Dmitry,
>> >>>>   Glad to hear it was helpful!  Not sure when you checked it out last,
>> >>>> but I made a number of good (I think) improvements in the last couple
>> >>>> days, such as continuations to allow splitting large groups of work
>> >>>> up.
>>
>> >>>> Robert
>>
>> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <dmitry.lukas...@gmail.com> wrote:
>> >>>>> Robert,
>>
>> >>>>> Your grouping_with_date_rollup.py example was extremely helpful. Thanks
>> >>>>> a lot again! :)
>>
>> >>>>> On Oct 14, 8:47 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>>>> Hey Carles,
>> >>>>>>   Glad it seems helpful.  I am hoping to get time today to push out
>> >>>>>> some revisions and sample code.
>>
>> >>>>>> Robert
>>
>> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <carle...@gmail.com> wrote:
>> >>>>>>> Robert, I took a brief look at your code and it seems very cool. It is
>> >>>>>>> exactly what I was looking for for my report generation and such.
>> >>>>>>> I'm looking forward to more examples, but it seems a very valuable
>> >>>>>>> addition to our toolbox.
>> >>>>>>> Thanks a lot!
>>
>> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <carle...@gmail.com> wrote:
>>
>> >>>>>>>> Neat! I'm going to look at this code; hopefully I'll understand
>> >>>>>>>> something :)
>> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin <robert.kl...@gmail.com> wrote:
>> >>>>>>>>> Hey Dmitry,
>> >>>>>>>>>    In case it might help, I pushed some code to bitbucket.  At the
>> >>>>>>>>> moment I would (personally) say the code is not too pretty, but it
>> >>>>>>>>> works well.  :)
>> >>>>>>>>>      http://bitbucket.org/thebobert/slagg
>>
>> >>>>>>>>>   Sorry it does not really have good documentation at the moment,
>> >>>>>>>>> but I think the basic example I threw together will give you a
>> >>>>>>>>> good idea of how to use it.  I need to do another cleanup pass
>> >>>>>>>>> over the API to make a few more refinements.
>>
>> >>>>>>>>>    I pulled this code out of one of my apps, and tried to quickly
>> >>>>>>>>> refactor it to be a bit more generic.  We are currently using
>> >>>>>>>>> basically the same code in three apps to do some really complex
>> >>>>>>>>> calculations.  As soon as I get time I will get an example up
>> >>>>>>>>> showing how to use it for neat stuff, like overall, yearly,
>> >>>>>>>>> monthly, and daily aggregates across multiple values (like total
>> >>>>>>>>> dollars and quantity).  The cool thing is that you can do all of
>> >>>>>>>>> those aggregations across various groupings, like customer,
>> >>>>>>>>> company, contact, and sales-person, at once.  I'll get that code
>> >>>>>>>>> pushed out in the next few days.
>>
>> >>>>>>>>>   Would love to get some feedback on it.
>>
>> >>>>>>>>> Robert
>>
>> >>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry <dmitry.lukas...@gmail.com> wrote:
>> >>>>>>>>>> Ben, thanks for your code! I'm trying to understand all this stuff
>> >>>>>>>>>> too...
>> >>>>>>>>>> Robert, any success with your "library"? Maybe you've already done
>> >>>>>>>>>> all the stuff we are trying to implement...
>>
>> >>>>>>>>>> p.s. where is Brett S.:) would like to hear his comments on this
>>
>> >>>>>>>>>> On Sep 21, 1:49 pm, Ben <pondneverfree...@yahoo.com> wrote:
>> >>>>>>>>>>> Thanks for your insights. I would love feedback on this
>> >>>>>>>>>>> implementation (Brett S. suggested we send in our code for
>> >>>>>>>>>>> this): http://pastebin.com/3pUhFdk8
>>
>> >>>>>>>>>>> This implementation is for just one materialized view row at a
>> >>>>>>>>>>> time (e.g. a simple counter, no presence markers). Hopefully
>> >>>>>>>>>>> putting an ETA on the transactional task will relieve the write
>> >>>>>>>>>>> pressure, since usually it should be an old update with an
>> >>>>>>>>>>> out-of-date sequence number and be discarded (the update having
>> >>>>>>>>>>> already been completed in batch by the fork-join-queue).
>>
>> >>>>>>>>>>> I'd love to generalize this to do more than one materialized view
>> >>>>>>>>>>> row but thought I'd get feedback first.
>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Ben
>>
>> >>>>>>>>>>> On Sep 17, 7:30 am, Robert Kluin <robert.kl...@gmail.com> wrote:
>>
>> >>>>>>>>>>>> Responses inline.
>>
>> >>>>>>>>>>>> On Thu, Sep 16, 2010 at 17:32, Ben <pondneverfree...@yahoo.com> wrote:
>> >>>>>>>>>>>>> I have a question about Brett Slatkin's talk at I/O 2010 on data
>> >>>>>>>>>>>>> pipelines. The question is about slide #67 of his pdf,
>> >>>>>>>>>>>>> corresponding to minute 51:30 of his talk
>>
>> >>>>>>>>>>>>> http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
>>
>> >>>>>>>>>>>>> I am wondering what is supposed to happen in the transactional
>> >>>>>>>>>>>>> task (bullet point 2c). Would these updates to the materialized
>> >>>>>>>>>>>>> view cause you to write too frequently to the entity group
>> >>>>>>>>>>>>> containing the materialized view?
>>
>> >>>>>>>>>>>> I think there are really two different approaches you can use to
>> >>>>>>>>>>>> insert your work models.
>> >>>>>>>>>>>> 1)  The work models get added to the original entity's group.
>> >>>>>>>>>>>> So, inside of the original transaction you do not write to the
>> >>>>>>>>>>>> entity group containing the materialized view -- so no
>> >>>>>>>>>>>> contention on it.  Commit the transaction and proceed to step 3.
>> >>>>>>>>>>>> 2)  You kick off a transactional task to insert the work model,
>> >>>>>>>>>>>> or fan-out more tasks to create work models  :).   Then you
>> >>>>>>>>>>>> proceed to step 3.
>>
>> >>>>>>>>>>>> You can use method 1 if you have only a few aggregates.  If you
>> >>>>>>>>>>>> have more aggregates use the second method.  I have a "library"
>> >>>>>>>>>>>> I am almost ready to open source that makes method 2 really
>> >>>>>>>>>>>> easy, so you can have lots of aggregates.  I'll post to this
>> >>>>>>>>>>>> group when I release it.
>>
>> >>>>>>>>>>>>> And a related question: what happens if there is a failure just
>> >>>>>>>>>>>>> after the transaction in bullet #2, but right before the named
>> >>>>>>>>>>>>> task gets inserted in bullet #3? In my current implementation I
>> >>>>>>>>>>>>> just left out the transactional task (bullet point 2c) but I
>> >>>>>>>>>>>>> think that causes me to lose the eventual consistency.
>>
>> >>>>>>>>>>>> Failure between steps 2 and 3 just means _that_ particular update
>> >>>>>>>>>>>> will not try to kick off, i.e. insert, the fan-in (aggregation)
>> >>>>>>>>>>>> task.  But it might have already been inserted by the previous
>> >>>>>>>>>>>> update, or it may be inserted by the next update.  However, if
>> >>>>>>>>>>>> nothing else kicks off the fan-in task you will need some
>> >>>>>>>>>>>> periodic "cleanup" method to catch the update and kick off the
>> >>>>>>>>>>>> fan-in task.  Depending on exactly how you implemented step 2
>> >>>>>>>>>>>> you may not need a transactional task.
>>
>> >>>>>>>>>>>> Robert
>>
>> >>>>>>>>>>>>> Thanks!
>>
>> >>>>>>> --
>> >>>>>>> You received this message because you are subscribed to the Google 
>> >>>>>>> Groups
>> >>>>>>> "Google App Engine" group.
>>
>> ...
>>
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To post to this group, send email to google-appeng...@googlegroups.com.
> To unsubscribe from this group, send email to 
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=en.
>
>
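
The oldest messages quoted above describe inserting a named fan-in task after each update, and falling back to a periodic "cleanup" pass if that insert is lost. A minimal sketch of that idea follows, under loud assumptions: FakeQueue is an in-memory stand-in rather than the App Engine task queue API, and the naming scheme in fan_in_task_name (aggregate key plus a time-window index) is hypothetical, not taken from slagg or from the talk. All it shows is that when the task name is unique per aggregate and window, duplicate inserts are harmless and a sweep can re-insert a task that was never scheduled.

```python
class TaskAlreadyExists(Exception):
    """Raised by the stand-in queue when a task name is reused."""


class FakeQueue:
    """In-memory stand-in for a task queue that enforces unique task names."""

    def __init__(self):
        self.tasks = {}

    def add(self, name, payload):
        if name in self.tasks:
            raise TaskAlreadyExists(name)
        self.tasks[name] = payload


def fan_in_task_name(agg_key, now, window_seconds=30):
    """Build one task name per aggregate per time window (hypothetical scheme)."""
    return "%s-%d" % (agg_key, int(now // window_seconds))


def kick_off_fan_in(queue, agg_key, now, window_seconds=30):
    """Try to insert the fan-in task; return True if this call inserted it."""
    name = fan_in_task_name(agg_key, now, window_seconds)
    try:
        queue.add(name, {"agg_key": agg_key})
        return True
    except TaskAlreadyExists:
        # Another update in the same window already scheduled the aggregation.
        return False


def cleanup_sweep(queue, pending_agg_keys, now, window_seconds=30):
    """Periodic safety net: schedule fan-in for aggregates with pending work."""
    return [key for key in pending_agg_keys
            if kick_off_fan_in(queue, key, now, window_seconds)]


queue = FakeQueue()
first = kick_off_fan_in(queue, "customer-42", now=100.0)
second = kick_off_fan_in(queue, "customer-42", now=110.0)  # same 30s window
# "customer-7" stands for an update whose fan-in task was never inserted.
swept = cleanup_sweep(queue, ["customer-42", "customer-7"], now=115.0)
```

In this sketch the second call is a no-op because the first already claimed the name for that window, and the sweep only inserts a task for the aggregate that had none. How pending aggregates are discovered (for example, by querying unprocessed work models) is left out on purpose.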

