EUREKA!

> -----Original Message-----
> From: Stuart Dallas [mailto:stu...@3ft9.com]
> Sent: Tuesday, September 03, 2013 6:31 AM
> To: Daevid Vincent
> Cc: php-general@lists.php.net
> Subject: Re: [PHP] refernces, arrays, and why does it take up so much
> memory?
> 
> On 3 Sep 2013, at 02:30, Daevid Vincent <dae...@daevid.com> wrote:
> 
> > I think I'm confused about how references work.
> >
> > I have a DB result set in an array I'm looping over. All I want to do is
> > make the array key the "id" of the result set row.
> >
> > This is the basic gist of it:
> >
> >       private function _normalize_result_set()
> >       {
> >              foreach($this->tmp_results as $k => $v)
> >              {
> >                     $id = $v['id'];
> >                     // 2013-08-29 [dv] using a reference here cuts the memory usage in half!
> >                     $new_tmp_results[$id] =& $v;
> 
> You are assigning a reference to $v. In the next iteration, the loop
> writes the next item into $v, and because every entry you stored is a
> reference to that same $v, they all change with it. With this code I'd
> expect $new_tmp_results to be an array where the keys (i.e. the IDs) are
> correct, but the data in each item matches the data in the last item from
> the original array, which appears to be what you describe.
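
A minimal standalone reproduction of the aliasing described above, using three hypothetical rows in place of the real result set. With =& every stored entry is bound to the one loop variable, so all of them end up holding the last row. Plain assignment keeps each row distinct.

```php
<?php
// Hypothetical rows standing in for the DB result set.
$rows = array(
    array('id' => 10, 'title' => 'a'),
    array('id' => 20, 'title' => 'b'),
    array('id' => 30, 'title' => 'c'),
);

// Buggy pattern: each entry becomes a reference to the single loop variable $v,
// so every later iteration overwrites what all earlier entries see.
$byRef = array();
foreach ($rows as $v) {
    $byRef[$v['id']] =& $v;
}
unset($v); // break the reference so the next loop cannot write into $byRef

// Plain assignment: each entry gets its own (copy-on-write) value.
$byVal = array();
foreach ($rows as $v) {
    $byVal[$v['id']] = $v;
}

echo $byRef[10]['title'], $byRef[20]['title'], $byRef[30]['title'], "\n"; // ccc
echo $byVal[10]['title'], $byVal[20]['title'], $byVal[30]['title'], "\n"; // abc
```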
> 
> >                     unset($this->tmp_results[$k]);
> 
> Doing this on every iteration is likely very inefficient. I don't know
> exactly how PHP handles something like this internally, but I wouldn't be
> surprised if it's allocating a new chunk of memory for a version of the
> array without this element. You may find it better not to unset anything
> until the loop has finished, at which point you can just
> unset($this->tmp_results).
> 
> >
> >                     /*
> >                     if ($i++ % 1000 == 0)
> >                     {
> >                           gc_enable(); // Enable Garbage Collector
> >                           var_dump(gc_enabled()); // true
> >                           var_dump(gc_collect_cycles()); // # of elements cleaned up
> >                           gc_disable(); // Disable Garbage Collector
> >                     }
> >                     */
> >              }
> >              $this->tmp_results = $new_tmp_results;
> >              //var_dump($this->tmp_results); exit;
> >              unset($new_tmp_results);
> >       }
> 
> 
> Try this:
> 
> private function _normalize_result_set()
> {
>   // Initialise the temporary variable.
>   $new_tmp_results = array();
> 
>   // Loop around just the keys in the array.
>   foreach (array_keys($this->tmp_results) as $k)
>   {
>     // Store the item in the temporary array with the ID as the key.
>     // Note no pointless variable for the ID, and no use of &!
>     $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
>   }
> 
>   // Assign the temporary variable to the original variable.
>   $this->tmp_results = $new_tmp_results;
> }
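
On PHP 5.5 or later, the built-in array_column() can do the same re-keying in a single call: passing null as the column key keeps each complete row, and the third argument names the column to index by. A sketch with hypothetical rows; as in the loop above, a duplicate id would overwrite the earlier row.

```php
<?php
// Hypothetical result set; only the 'id' column matters here.
$tmp_results = array(
    array('id' => 7, 'genres' => 'drama|'),
    array('id' => 3, 'genres' => 'comedy|horror|'),
);

// null column key = keep the complete row; 'id' = use that column as the array key.
$by_id = array_column($tmp_results, null, 'id');

echo implode(',', array_keys($by_id)), "\n"; // 7,3
```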
> 
> I'd appreciate it if you could plug this in and see what your memory usage
> reports say. In most cases, trying to control garbage collection through
> the use of references is the worst way to go about optimising your code.
> In my code above I'm relying on PHP's copy-on-write behaviour: assigning
> an array to another variable doesn't duplicate it, and the data is only
> copied if one of the copies is later modified. No unsets; the temporary
> variable simply goes out of scope when the function returns, which marks
> it as able to be cleaned up.
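
A rough way to observe copy-on-write, assuming the Zend memory manager is active (the default): assigning a large array should barely move memory_get_usage(), while the first write to the copy forces a full duplicate. The exact byte counts vary by PHP version and build, so no particular output is claimed here.

```php
<?php
// A reasonably large array so the duplication cost is easy to see.
$original = range(1, 100000);

$before = memory_get_usage();
$copy = $original;            // assignment only bumps a refcount; no data is duplicated yet
$after_assign = memory_get_usage();

$copy[] = 0;                  // first write separates $copy from $original (full duplicate)
$after_write = memory_get_usage();

printf("assign: %d bytes, first write: %d bytes\n",
    $after_assign - $before, $after_write - $after_assign);
```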
> 
> Where is this result set coming from? You'd save yourself a lot of
> memory/time by putting the data into this format when you read it from
> the source. For example, if reading it from MySQL, do
> $this->tmp_results[$row['id']] = $row when looping around the result set.
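
A sketch of keying rows as they are read, so that no second full-size array is built afterwards. fetch_rows() is a hypothetical generator (PHP 5.5+) standing in for the real driver's fetch loop, e.g. repeated mysql_fetch_assoc() calls; only the keying line inside the foreach matters.

```php
<?php
// Hypothetical stand-in for a DB cursor: yields one associative row at a time.
function fetch_rows()
{
    yield array('id' => 42, 'title' => 'first');
    yield array('id' => 17, 'title' => 'second');
}

$tmp_results = array();
foreach (fetch_rows() as $row) {
    // Key each row by its id as it arrives instead of re-keying afterwards.
    $tmp_results[$row['id']] = $row;
}

echo implode(',', array_keys($tmp_results)), "\n"; // 42,17
```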
> 
> Also, is there any reason why you need to process this full set of data
> in one go? Can you not break it up into smaller pieces that won't put as
> much strain on resources?
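
One possible shape for processing in smaller pieces, sketched under assumptions: batches_by_id() is a hypothetical generator (PHP 5.5+) that groups any iterable of rows into id-keyed batches of a fixed size, so the caller only holds one batch at a time when the rows themselves are streamed. With a real database the batches might instead come from separate LIMIT queries.

```php
<?php
// Group an iterable of rows into batches of at most $size rows, each keyed by 'id'.
function batches_by_id($rows, $size)
{
    $batch = array();
    foreach ($rows as $row) {
        $batch[$row['id']] = $row;
        if (count($batch) >= $size) {
            yield $batch;
            $batch = array(); // start fresh; the caller decides when the old batch is freed
        }
    }
    if ($batch) {
        yield $batch; // trailing partial batch
    }
}

// Hypothetical source of five rows, processed two at a time.
$rows = array();
for ($i = 1; $i <= 5; $i++) {
    $rows[] = array('id' => $i * 10);
}

$sizes = array();
foreach (batches_by_id($rows, 2) as $batch) {
    $sizes[] = count($batch);
}
echo implode(',', $sizes), "\n"; // 2,2,1
```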
> 
> -Stuart

There were reasons I had the $id variable; I only showed the relevant parts
of the code so as not to overcomplicate what I was trying to illustrate.
There is other processing that has to be done in the loop as well, which is
what the commented-out block below shows.

Here is effectively your version:

        private function _normalize_result_set() //Stuart
        {
                  if (!$this->tmp_results || count($this->tmp_results) < 1) return;

                  $new_tmp_results = array();

                  // Loop around just the keys in the array.
                  $D_start_mem_usage = memory_get_usage();
                  foreach (array_keys($this->tmp_results) as $k)
                  {
                        /*
                        if ($this->tmp_results[$k]['genres'])
                        {
                                // rip through each scene's `genres` and store them as an array since we'll need'em later too
                                $g = explode('|', $this->tmp_results[$k]['genres']);
                                array_pop($g); // there is an extra '' element due to the final | character. :-\
                                $this->tmp_results[$k]['g'] = $g;
                        }
                        */

                        // Store the item in the temporary array with the ID as the key.
                        // Note no pointless variable for the ID, and no use of &!
                        $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
                  }

                  // Assign the temporary variable to the original variable.
                  $this->tmp_results = $new_tmp_results;
                  echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
                  var_dump($this->tmp_results);
                  exit();
        }

Without the genres block:
MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)

With the genres block enabled:
MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)

So a slight improvement over the original, whose peak is 28,573,696 bytes
higher (216,268,800 vs. 187,695,104):
MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)


No matter what I tried, however, it seems that the simple act of adding a
new key to each row of the array causes a significant memory jump. That
really blows! So my solution was to not store $g as ['g'], even though
parsing the genres once and re-using that array over and over would seem to
be the more efficient approach; instead I am forced to explode() the genres
inline in three different places in my code.
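
If the genres really do have to be parsed in three places, a small helper at least keeps that parsing in one spot. This sketch assumes genres is a |-delimited string with a trailing |, as in the commented-out block above; genre_list() is a hypothetical name. Trimming the delimiters before explode() replaces the array_pop() step and also copes with a missing trailing | or an empty value.

```php
<?php
// Parse a '|'-delimited genre string such as "drama|comedy|" into an array.
function genre_list($genres)
{
    $genres = trim((string) $genres, '|'); // drop leading/trailing delimiters
    return $genres === '' ? array() : explode('|', $genres);
}

$row = array('id' => 1, 'genres' => 'drama|comedy|');
echo implode(',', genre_list($row['genres'])), "\n"; // drama,comedy
```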

We get over 30,000 hits per second, so even with lots of caching, 216MB vs.
70-96MB is significant, while the speed hit is only about 1.5 seconds more
per page.

Here are three distinctly different example pages that exercise different
parts of the code path:

PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES

PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES

PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES

Furthermore, I investigated what Jim Giner suggested, and it turns out I
could hook into our Connection class and re-key the results at fetch time,
which is actually a more elegant solution overall since we can re-use it in
many more places going forward.

        /**
         * Execute a database SQL query and return all the results in an associative array
         *
         * @access      public
         * @return      array or false
         * @param       string $sql the SQL code to execute
         * @param       boolean $print (false) Print a color-coded version of the query.
         * @param       boolean $get_first (false) Return the first element only; useful when one row is returned, such as with "LIMIT 1".
         * @param       string $key (null) If a column name such as 'id' is given, that column's value is used as the array key.
         * @author      Daevid Vincent [dae...@sctr.net]
         * @date      2013-09-03
         * @see get_instance(), execute(), fetch_query_pair()
         */
        public function fetch_query($sql = "", $print = false, $get_first=false, $key=null)
        {
                //$D_start_mem_usage = memory_get_usage();
                if (!$this->execute($sql, $print)) return false;

                $tmp = array();

                if (is_null($key))
                        while($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[] = $arr;
                else
                        while($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[$arr[$key]] = $arr;

                $this->free_result(); // freeing result from memory
                //echo "\nMEMORY USED FOR fetch_query(): ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
                return (($get_first) ? array_shift($tmp) : $tmp);
        }



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php