I am currently in the midst of implementing a fairly non-trivial
recursive algorithm in Perl. The recursion depth is large enough that I
have added "no warnings 'recursion';", since Perl warns of deep
recursion at any depth over 100. That threshold seems pretty small to
me! If the default is to warn at a depth of 100, does that imply that
Perl runs into issues internally somewhere? Quick profiling via top
does indeed confirm what I consider an unusually large memory
allocation when run with inputs that force a depth greater than a few
hundred. Googling did not provide much insight; in fact, only small,
trivial examples seem to be discussed on this topic. Can anyone here
comment on what should be considered?
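
In case it helps, here is a stripped-down sketch of how I am
suppressing the warning (depth_sum is just a stand-in for my real
algorithm):

```perl
use strict;
use warnings;

# Disable only the "Deep recursion" warning, and only in this lexical
# scope.  All other warnings stay enabled.
no warnings 'recursion';

sub depth_sum {
    my ($n) = @_;
    return 0 if $n == 0;
    return $n + depth_sum($n - 1);   # recurses $n frames deep
}

# Depth 500 would normally warn at frame 100; here it runs silently.
print depth_sum(500), "\n";   # prints 125250
```

Each of those frames keeps its own lexicals alive until the calls
unwind, which I assume is at least part of the memory growth I am
seeing.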
Best Regards,
Adam

Sent from my iPhone

On Apr 5, 2013, at 7:46 PM, boston-pm-requ...@mail.pm.org wrote:

> Send Boston-pm mailing list submissions to
>    boston-pm@mail.pm.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>    http://mail.pm.org/mailman/listinfo/boston-pm
> or, via email, send a message with subject or body 'help' to
>    boston-pm-requ...@mail.pm.org
> 
> You can reach the person managing the list at
>    boston-pm-ow...@mail.pm.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Boston-pm digest..."
> 
> 
> Today's Topics:
> 
>   1. Re: Passing large complex data structures between process
>      (John Redford)
>   2. Re: Passing large complex data structures between process
>      (Anthony Caravello)
>   3. Re: Passing large complex data structures between process
>      (Ben Tilly)
>   4. Re: Passing large complex data structures between process
>      (John Redford)
>   5. Re: Passing large complex data structures between process
>      (John Redford)
>   6. Re: Passing large complex data structures between process
>      (Anthony Caravello)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Fri, 5 Apr 2013 15:04:21 -0400
> From: John Redford <eire...@hotmail.com>
> To: "'Ben Tilly'" <bti...@gmail.com>
> Cc: 'L-boston-pm' <boston-pm@mail.pm.org>
> Subject: Re: [Boston.pm] Passing large complex data structures between
>    process
> Message-ID: <blu172-ds717390f3b68dc4f878a8bb8...@phx.gbl>
> Content-Type: text/plain; charset="us-ascii"
> 
> Ben Tilly emitted:
>> 
>> Pro tip.  I've seen both push based systems and pull based systems at
>> work.  The push based systems tend to break whenever the thing that
>> you're pushing to has problems.  Pull-based systems tend to be much
>> more reliable in my experience.
> [...]
>> 
>> If you disregard this tip, then learn from experience and give thought in
>> advance to how you're going to monitor the things that you're pushing to,
>> notice their problems, and fix them when they break.
>> (Rather than 2 weeks later when someone wonders why their data stopped
>> updating.)
> 
> Your writing is FUD.
> 
> Pro tip.  Learn to use a database.  I know that it can be fun to play with
> the latest piece of shiny technofrippery, like Redis, and to imagine that
> because it is new, it somehow is better than anything that came before and
> that it can solve problems that have never been solved before.  It's not.
> There's nothing specifically wrong with it, but it's not a silver bullet and
> parallelism is not a werewolf.
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 5 Apr 2013 16:26:41 -0400
> From: Anthony Caravello <t...@caravello.us>
> To: John Redford <eire...@hotmail.com>
> Cc: L-boston-pm <boston-pm@mail.pm.org>
> Subject: Re: [Boston.pm] Passing large complex data structures between
>    process
> Message-ID:
>    <caglsx2rwg-avd__y+lr++7qry4noiw_7fketiuhp5-slwew...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Queuing systems aren't really new or 'technofrippery'.  In-memory FIFO
> stacks are ridiculously fast compared to transaction safe rdbms' for this
> simple purpose.  Databases incur a lot of overhead for wonderful things
> that don't aid this cause.
> 
> This isn't magic, sometimes it's just the right tool for the job.
> 
> On Fri, Apr 5, 2013 at 3:04 PM, John Redford <eire...@hotmail.com> wrote:
> 
>> Ben Tilly emitted:
>>> 
>>> Pro tip.  I've seen both push based systems and pull based systems
>>> at work.  The push based systems tend to break whenever the thing
>>> that you're pushing to has problems.  Pull-based systems tend to be
>>> much more reliable in my experience.
>> [...]
>>> 
>>> If you disregard this tip, then learn from experience and give thought in
>>> advance to how you're going to monitor the things that you're pushing to,
>>> notice their problems, and fix them when they break.
>>> (Rather than 2 weeks later when someone wonders why their data stopped
>>> updating.)
>> 
>> Your writing is FUD.
>> 
>> Pro tip.  Learn to use a database.  I know that it can be fun to play with
>> the latest piece of shiny technofrippery, like Redis, and to imagine that
>> because it is new, it somehow is better than anything that came before and
>> that it can solve problems that have never been solved before.  It's not.
>> There's nothing specifically wrong with it, but it's not a silver bullet
>> and
>> parallelism is not a werewolf.
>> 
>> 
>> 
>> _______________________________________________
>> Boston-pm mailing list
>> Boston-pm@mail.pm.org
>> http://mail.pm.org/mailman/listinfo/boston-pm
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 5 Apr 2013 14:43:00 -0700
> From: Ben Tilly <bti...@gmail.com>
> To: John Redford <eire...@hotmail.com>
> Cc: L-boston-pm <boston-pm@mail.pm.org>
> Subject: Re: [Boston.pm] Passing large complex data structures between
>    process
> Message-ID:
>    <canoac9w2ywf7xmbda3ogtr+pila4ubofjz2j288jcqwoztm...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Fri, Apr 5, 2013 at 12:04 PM, John Redford <eire...@hotmail.com> wrote:
>> Ben Tilly emitted:
>>> 
>>> Pro tip.  I've seen both push based systems and pull based systems
>>> at work.  The push based systems tend to break whenever the thing
>>> that you're pushing to has problems.  Pull-based systems tend to be
>>> much more reliable in my experience.
>> [...]
>>> 
>>> If you disregard this tip, then learn from experience and give thought in
>>> advance to how you're going to monitor the things that you're pushing to,
>>> notice their problems, and fix them when they break.
>>> (Rather than 2 weeks later when someone wonders why their data stopped
>>> updating.)
>> 
>> Your writing is FUD.
> 
> Are you reading something into what I wrote that wasn't there?
> Because I'm pretty sure that what I wrote isn't FUD.
> 
> A pull-based system relies on having the job that does the work ask
> for work when it's ready.  A push-based system relies on pushing to a
> worker.  If the worker in question is busy on a long job, or has
> crashed for some reason, it is easy for work to get delayed or lost
> with a push-based system while other workers sit idle.  A recent
> well-publicized example of resulting sporadic problems is
> http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics.  A
> pull-based system avoids that failure mode unless all workers crash at
> once.
> 
> For an example of an interesting failure case, consider a request that
> crashes whatever worker tries to do it.  With a push-based system, a
> worker gets it, crashes, might be brought up automatically, tries the
> same request, crashes again, and all requests sent to the unlucky
> worker are permanently lost.  With a pull-based system, the bad
> request will start crashing workers left and right, but progress
> continues to be made on everything.
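
The contrast described above can be made concrete with a toy
simulation.  This is illustrative only, not from the original mail:
'BAD' stands in for a request that crashes any worker that runs it,
and each model is reduced to a simple loop.

```perl
use strict;
use warnings;

my @requests = qw(a BAD b c d);   # 'BAD' crashes whichever worker runs it

# Push model: jobs are dealt out to workers up front, so a crashed
# worker's remaining backlog is stranded along with it.
sub run_push {
    my @queue_for = ( [], [] );
    push @{ $queue_for[ $_ % 2 ] }, $requests[$_] for 0 .. $#requests;
    my @done;
    for my $q (@queue_for) {
        for my $job (@$q) {
            last if $job eq 'BAD';   # worker crashes; rest of its queue is lost
            push @done, $job;
        }
    }
    return @done;
}

# Pull model: workers take jobs from one shared queue, so a crash costs
# only the poison job itself; a fresh worker pulls the next one.
sub run_pull {
    my @shared = @requests;
    my @done;
    while ( my $job = shift @shared ) {
        next if $job eq 'BAD';   # this worker dies; a replacement pulls on
        push @done, $job;
    }
    return @done;
}

print "push completed: @{[ run_push() ]}\n";   # push completed: a b d
print "pull completed: @{[ run_pull() ]}\n";   # pull completed: a b c d
```

Under push, 'c' is stranded behind the poison job in the crashed
worker's queue; under pull, everything except the poison job itself
gets done.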
> 
> This is not to say that push-based systems are always inappropriate.
> HTTP is a push-based system, and a push-based system is often simpler
> to build and design.  But if you have an even choice, prefer the
> pull-based system.  Yes, you will have to poll, but polling tends to
> have better failure modes.
> 
>> Pro tip.  Learn to use a database.  I know that it can be fun to play with
>> the latest piece of shiny technofrippery, like Redis, and to imagine that
>> because it is new, it somehow is better than anything that came before and
>> that it can solve problems that have never been solved before.  It's not.
>> There's nothing specifically wrong with it, but it's not a silver bullet and
>> parallelism is not a werewolf.
> 
> What makes you think that I don't know how to use a database?  (Here
> is a hint: a separate table per downloader is not exactly a best
> practice.)  If you'll note, my first suggestion was to implement
> polling on the database.  That's because I've been there, done that.
> It works and the database gets better throughput than most people
> realize it can.  In fact it probably gets more than sufficient for
> this particular application.
> 
> If the queries are properly designed (often means that someone else
> did the heavy work putting things into the queue), distributing
> hundreds of jobs per second to workers is pretty easy.  (I don't know
> the limit, 100/second was sufficient when I needed to do this last,
> and MySQL didn't break a sweat on that.)  I'll describe how to do that
> in a second.
> 
> But this particular use case isn't a great fit for a database's
> capabilities.  It is like using army tanks for picking up groceries
> from the corner store.  If you've got the tanks, might as well, but
> there are more appropriate tools.  Redis will distribute tens of
> thousands of jobs per second pretty easily.  Scaling farther than that
> requires distributing work in a more sophisticated way, but it sounds
> like they have a long way to go before running into that barrier.
> 
> (NOTE FOR DAVID: here is a blueprint for something that might be easy
> for you to build, to solve your current scaling problem.  It will also
> allow you to trivially distribute downloading across multiple machines
> for better throughput, without introducing new technologies into your
> stack.)
> 
> Now if you're curious how to achieve that throughput with a database
> and polling, here you go.  This is based on a system that I've built
> variations of several times.  Have two tables, let's call them
> job_order and job_pickup.  We insert into job_order when we want work
> done.  A worker inserts into job_pickup when it's ready to do work.
> 
> When a worker wakes up, it checks whether the top id for job_order
> exceeds the top id for job_pickup.  If not, sleep.  If it does, then
> insert a row into job_pickup.  The id of that row is your job.  Start
> polling for that job_order.  When you find it, update the record with
> a new status.  Once you're done, mark it done.  If your job_order had
> been there right away, assume that there is another, and insert into
> job_pickup until the workers have caught up with requests.  Then after
> that job, sleep.
> 
> When I say sleep, I mean something like usleep(rand(0.2)).  The rand
> avoids a "thundering herd problem".  When you're polling, put a
> smaller random sleep between poll requests to avoid overloading the
> system.  You can play with those numbers depending on how many
> workers, requests, etc that you have.  But the excess polling overhead
> can easily be limited.
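
The order/pickup handshake described above can be sketched in Perl.
This is a toy in-memory version for illustration: the two tables are
simulated as arrays, and the real thing would presumably use DBI
against job_order and job_pickup, with a short randomized sleep between
polls as described.

```perl
use strict;
use warnings;

# Simulated tables: each row has an auto-incrementing id.
my @job_order  = map { { id => $_, status => 'new' } } 1 .. 5;
my @job_pickup = ();

# One pickup attempt: claim the next job if any work is outstanding.
sub try_pickup {
    my $top_order  = @job_order  ? $job_order[-1]{id}  : 0;
    my $top_pickup = @job_pickup ? $job_pickup[-1]{id} : 0;
    return undef if $top_pickup >= $top_order;   # caught up: go to sleep

    # Insert into job_pickup; the new row's id names the job we now own.
    my $claimed = $top_pickup + 1;
    push @job_pickup, { id => $claimed };

    # In the real system we would poll job_order for row $claimed, mark
    # it with a new status, and mark it done when finished.
    $job_order[ $claimed - 1 ]{status} = 'done';
    return $claimed;
}

# A worker keeps claiming until pickups catch up with orders.
my @claimed;
while ( defined( my $job = try_pickup() ) ) {
    push @claimed, $job;
}
print "claimed: @claimed\n";   # claimed: 1 2 3 4 5
```

The point of the top-id comparison is that workers never hand out the
same job twice: each pickup row's id maps to exactly one order row.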
> 
> If you want to have multiple types of jobs, and not all workers able
> to handle all kinds of jobs, you won't be able to use the
> autoincrementing ID for synchronization.  But you can use the same
> tables and a pair of sequences per type.  See
> http://www.postgresql.org/docs/8.1/static/sql-createsequence.html for
> information on how to do that with PostgreSQL.
> 
> As a sanity check you can have a monitor that will look for jobs that
> do not seem to be processed, mark them as failed, and resubmit them.
> (If you lock properly, the monitor is safe.  Most developers do not
> understand MVCC well enough to avoid a small race condition, but the
> odds of hitting that are very small.)  Put a concatenated index on
> (status, create_datetime) and the queries that it needs to make will
> be extremely efficient.  Also thanks to row-level locking, there is
> almost no contention between processes.
> 
> That's a design for a generic polling-based job control system using a
> SQL database for a back end.  It works.  It isn't great, but it scales
> much farther than most developers would expect.  And when it maxes
> out, well, that's a perfect use case for Redis.  And when Redis maxes
> out, come back and talk; I know how to do that as well.  I picked up a
> lot of knowledge about how to build reliable and scalable systems when
> I worked at Google.
> 
> (That said, Redis is a good piece of software.  Why are you resistant
> to learning it?)
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 5 Apr 2013 18:05:35 -0400
> From: John Redford <eire...@hotmail.com>
> To: "'Anthony Caravello'" <t...@caravello.us>
> Cc: 'L-boston-pm' <boston-pm@mail.pm.org>
> Subject: Re: [Boston.pm] Passing large complex data structures    between
>    process
> Message-ID: <blu172-ds240d271c5decb830d453cdb8...@phx.gbl>
> Content-Type: text/plain; charset="us-ascii"
> 
> Anthony Caravello writes:
>> 
>> Queuing systems aren't really new or 'technofrippery'.  In-memory
>> FIFO stacks are ridiculously fast compared to transaction safe rdbms'
>> for this simple purpose.  Databases incur a lot of overhead for
>> wonderful things that don't aid this cause.
> 
> No one said queuing systems are new. If you intend to mistake an instance
> for a class, then you shall only address your mistake.
> 
> Indeed, a purely in-memory system is fast.  In the relational database world
> this is generally called a "temporary table", and it can be used to
> efficiently process information between phases that require persistence.
> 
> I am not planning to explain everything about RDBMS technology that people
> might do well to understand.
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Fri, 5 Apr 2013 18:35:58 -0400
> From: John Redford <eire...@hotmail.com>
> To: "'Ben Tilly'" <bti...@gmail.com>
> Cc: 'L-boston-pm' <boston-pm@mail.pm.org>
> Subject: Re: [Boston.pm] Passing large complex data structures between
>    process
> Message-ID: <blu172-ds1625aaf8900e878469cb2fb8...@phx.gbl>
> Content-Type: text/plain; charset="us-ascii"
> 
> Ben Tilly expands:
>> On Fri, Apr 5, 2013 at 12:04 PM, John Redford <eire...@hotmail.com> wrote:
>>> Your writing is FUD.
>> 
>> Are you reading something into what I wrote that wasn't there?
>> Because I'm pretty sure that what I wrote isn't FUD.
> 
> It was. Ask anyone. I'm not your English tutor.
> 
>> A pull-based system relies on having the job that does the work ask
>> for work when it's ready.  A push-based system relies on pushing to a
>> worker.
> 
> So, let's get this straight, pro.
> 
> Any given download client would connect to the database, read its
> table, and do the work it finds there.  When it eventually completes
> the work, it calls back & reports the results.  The client __PULLS__
> its workload.  I am not sure how you missed this.  In no way does the
> database server block on the client.  In no way does Alfie block on
> the client.  Alfie periodically checks each downloader's queue tables
> and ensures that any empty queue gets filled with work.  It doesn't
> wait for anything.
> 
> So, that seems to be pretty pull-like, which you say you like.  I'll leave
> it to you to work out how you misread what I wrote after you work out how
> you misread what you wrote.
> 
>> What makes you think that I don't know how to use a database?  (Here
>> is a hint: a separate table per downloader is not exactly a best
>> practice.)  If you'll note, my first suggestion was to implement
>> polling on the database.  That's because I've been there, done that.
>> It works and the database gets better throughput than most people
>> realize it can.  In fact it probably gets more than sufficient for
>> this particular application.
> 
> The fact that you don't recognize how to use it.  And, here, the fact that
> you appeal to "best practice" as being in conflict with and overriding
> efficient use of the database.
> 
> A lot of people will espouse best practices, like "normalize your data" and
> "have a single-column primary key", which will indeed generally make sense
> most of the time, and which are useful for producing an
> aesthetically-pleasing data model.  But the ultimate "best practice" is to
> solve the problem and to solve it in a way that functions within acceptable
> parameters of cost, performance, support, reliability and so forth. Lots of
> so-called "best practices" -- in databases and programming languages --
> conflict with efficiency.
> 
> For example, consider Duff's Device -- it is the programmatic equivalent of
> denormalized data -- it obviously violates some best practices; it's clearly
> more difficult to read than a simple loop -- and yet even Perl programmers
> have been known to understand the benefits of this technique.
> 
> When you understand how to use a database, you will understand how to use it
> efficiently.
> 
>> (That said, Redis is a good piece of software.  Why are you resistant
>> to learning it?)
> 
> See http://en.wikipedia.org/wiki/Loaded_question.
> 
> 
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Fri, 5 Apr 2013 19:46:13 -0400
> From: Anthony Caravello <t...@caravello.us>
> To: John Redford <eire...@hotmail.com>
> Cc: L-boston-pm <boston-pm@mail.pm.org>
> Subject: Re: [Boston.pm] Passing large complex data structures between
>    process
> Message-ID:
>    <CAGLSX2qZh5KNwrxz4Jc-Bbzok4QWF2=u_mr4s_e401mkwwv...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> I bow to you.  I've been on this list for a long time and figured my 20
> years of development and engineering experience might be of assistance and
> for the first time I offered it.  From now on, you should answer all the
> questions.
> 
> -unsubscribe
> On Apr 5, 2013 6:05 PM, "John Redford" <eire...@hotmail.com> wrote:
> 
>> Anthony Caravello writes:
>>> 
>>> Queuing systems aren't really new or 'technofrippery'.  In-memory
>>> FIFO stacks are ridiculously fast compared to transaction safe
>>> rdbms' for this simple purpose.  Databases incur a lot of overhead
>>> for wonderful things that don't aid this cause.
>> 
>> No one said queuing systems are new. If you intend to mistake an instance
>> for a class, then you shall only address your mistake.
>> 
>> Indeed, a purely in-memory system is fast.  In the relational database
>> world
>> this is generally called a "temporary table", and it can be used to
>> efficiently process information between phases that require persistence.
>> 
>> I am not planning to explain everything about RDBMS technology that people
>> might do well to understand.
> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> 
> _______________________________________________
> Boston-pm mailing list
> Boston-pm@mail.pm.org
> http://mail.pm.org/mailman/listinfo/boston-pm
> 
> ------------------------------
> 
> End of Boston-pm Digest, Vol 118, Issue 5
> *****************************************

_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
