Re: [symfony-users] Re: CLI sf Task, allowed memory issue
Hi,

We've implemented a number of crawlers ourselves. The benefit of using PHP is that crawlers are easier to maintain when they are built in a language a larger range of people can use.

To solve your specific problems, build a queue system. Create a list table that holds the URLs you want to scrape. It may also hold details on how to log in to each site and which method (POST, GET) to use.

Then create a daemon by doing the following:

- create a cron job that runs every minute:
  - set_time_limit(58 + $time_of_curl_request + $a_bit);
  - get the next URL from the list
  - get the contents
  - scrape out the data you need (this may include generating new list URLs)
  - remove the URL from the list table
  - measure the time; if 58 seconds have passed, terminate, otherwise continue with the next URL.

If you have to manage load on the target site and have certain turnaround times for scraped data, you may also need a scheduler that decides which URL gets scheduled for scraping when.

On Thu, Sep 2, 2010 at 2:20 PM, Dennis wrote:
> Also, I listened to a conversation with Justin Wage and some one from
> Digg. They use daemons for various things (kind of what you need to
> do). They kept a counter for each daemon and when the counter was at
> some magic number of times the daemon had been assigned and completed
> a task, it was killed, and a new one started to replace it. This was
> more of a 'top level' garbage collection scheme IN PRODUCTION right
> now.
>
> On Sep 1, 11:17 am, pghoratiu wrote:
>> Hi!
>>
>> My suggestion is to use PHP 5.3.X, it has improved garbage collection
>> and it should help with reclaiming unused memory. Also you should
>> group the code that is leaking inside a separate function(s), this way
>> the PHP runtime knows that it can release the memory for variables
>> within the scope.
>>
>> gabriel
>>
>> On Sep 1, 12:11 pm, "PieR." wrote:
>>
>> > Hi,
>>
>> > I have a sfTask in CLI which use lot of foreach and preg_matches, and
>> > unfortunately PHP return an error "Allowed memory size" in few
>> > minutes.
>>
>> > I read that PHP clear the memory when a script ends, so I tried to run
>> > tasks inside the main task, but the problem still remains.
>>
>> > How to manage this memory issue ? clear memory or launch tasks in
>> > separate processes ?
>>
>> > The final aim is to build a web crawler, which runs many hours per
>> > days.
>>
>> > Thanks in advance for help,
>>
>> > Regards,
>>
>> > Pierre

--
If you want to report a vulnerability issue on symfony, please send it to security at symfony-project.com

You received this message because you are subscribed to the Google Groups "symfony users" group.
To post to this group, send email to symfony-users@googlegroups.com
To unsubscribe from this group, send email to symfony-users+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/symfony-users?hl=en
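For readers who want something concrete, the queue-and-cron approach described at the top of this message could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name runWorker, the in-memory array standing in for the list table, the injected $fetch and $scrape callables (stand-ins for the cURL request and the scraping step), and the $seen bookkeeping that avoids re-queueing a URL are all hypothetical and not from the original post. It assumes PHP 5.3 or later for closures.

```php
<?php
// Hypothetical sketch of the cron-driven queue worker described above.
// A plain array stands in for the "list" table; $fetch and $scrape are
// injected callables standing in for the cURL request and the scraper.

/**
 * Process queued URLs until the queue is empty or the time budget is spent.
 *
 * @param array    $queue  URLs waiting to be scraped
 * @param callable $fetch  function ($url) returning the page body
 * @param callable $scrape function ($body) returning newly found URLs
 * @param float    $budget seconds this run may spend before it stops
 * @return array   URLs still queued when the run ended
 */
function runWorker(array $queue, $fetch, $scrape, $budget = 58.0)
{
    $start = microtime(true);
    $seen = array_flip($queue);   // assumption: never queue the same URL twice

    // Stop when the list is empty or the time budget has been used up.
    while ($queue && (microtime(true) - $start) < $budget) {
        $url = array_shift($queue);             // take the next URL off the list
        $body = call_user_func($fetch, $url);   // get contents
        foreach (call_user_func($scrape, $body) as $newUrl) {
            if (!isset($seen[$newUrl])) {       // scraping may generate new list URLs
                $seen[$newUrl] = true;
                $queue[] = $newUrl;
            }
        }
    }
    return $queue;
}

// Usage with a two-page fake site instead of real HTTP requests.
$site = array(
    'http://example.com/'  => '<a href="http://example.com/a">a</a>',
    'http://example.com/a' => '<a href="http://example.com/">home</a>',
);
$fetch = function ($url) use ($site) {
    return isset($site[$url]) ? $site[$url] : '';
};
$scrape = function ($body) {
    preg_match_all('/href="([^"]+)"/', $body, $m);
    return $m[1];
};

set_time_limit(58 + 10 + 2);   // 58s budget + assumed request time + a bit
$left = runWorker(array('http://example.com/'), $fetch, $scrape);
echo count($left), " URLs left in the queue\n";
```

Because cron starts a fresh PHP process every minute, whatever memory one run leaks is returned to the system when that run ends, which is the point of capping each run at 58 seconds.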
[symfony-users] Re: CLI sf Task, allowed memory issue
Also, I listened to a conversation with Justin Wage and someone from Digg. They use daemons for various things (kind of what you need to do). They kept a counter for each daemon, and when the counter reached some magic number of assigned and completed tasks, the daemon was killed and a new one was started to replace it. This is more of a 'top level' garbage collection scheme, and it is IN PRODUCTION right now.
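As a rough illustration of the counter idea above (not Digg's actual code, which is not shown in this thread), a worker can count the tasks it completes and stop once it reaches a limit, so that a fresh process with fresh memory takes over. The function name runUntilRetired, the $nextTask callable and the limit of 3 are hypothetical choices made for this sketch, which assumes PHP 5.3 or later.

```php
<?php
// Hypothetical sketch: a worker that retires itself after a fixed number
// of completed tasks so a new process can replace it.

/**
 * Run tasks until $maxTasks have been completed or no task is left.
 *
 * @param callable $nextTask returns the next task (a callable) or null when idle
 * @param int      $maxTasks the "magic number" after which the worker retires
 * @return int     number of tasks this worker completed
 */
function runUntilRetired($nextTask, $maxTasks)
{
    $completed = 0;
    while ($completed < $maxTasks) {
        $task = call_user_func($nextTask);
        if ($task === null) {
            break;                  // nothing left to do
        }
        call_user_func($task);      // do one unit of work
        $completed++;               // count only finished tasks
    }
    return $completed;
}

// Usage: five queued tasks, but the worker retires after three.
$tasks = array();
for ($i = 0; $i < 5; $i++) {
    $tasks[] = function () { /* one unit of crawl work would go here */ };
}
$nextTask = function () use (&$tasks) {
    return array_shift($tasks);     // array_shift() returns null on an empty array
};

$done = runUntilRetired($nextTask, 3);
echo "completed $done tasks, stopping so a new worker can start\n";
```

In a real daemon the script would simply end at this point, and whatever supervises it (cron, a shell loop, or a process manager) would launch the replacement.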
[symfony-users] Re: CLI sf Task, allowed memory issue
PHP is the easiest, but not the best for that purpose. If you're crawling web sites, why not use web crawling software? Take a look at the Apache web crawler. The Apache foundation has some amazing software in its 'forge'.

If you're doing web scraping, try dapper.net. Follow a link on the main page to get to the 'old site'.
[symfony-users] Re: CLI sf Task, allowed memory issue
Hi!

My suggestion is to use PHP 5.3.X: it has improved garbage collection, which should help with reclaiming unused memory. Also, you should group the code that is leaking inside one or more separate functions; this way the PHP runtime knows that it can release the memory for variables within that scope once the function returns.

gabriel
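A small sketch of both suggestions above, assuming PHP 5.3 or later: the per-page work lives in its own function so its temporaries go out of scope on return, and the cycle collector is asked to run between iterations. The function name countLinks and the generated test page are hypothetical; memory_get_usage() is only there so you can watch whether memory grows in your own task.

```php
<?php
// Hypothetical sketch: keep per-page work inside a function so its local
// variables are released on return, and let the PHP 5.3+ cycle collector
// reclaim reference cycles between iterations.

/**
 * Count the links in one page body. Every temporary here is local,
 * so $matches can be freed as soon as the function returns.
 */
function countLinks($body)
{
    preg_match_all('/href="([^"]+)"/', $body, $matches);
    return count($matches[1]);
}

gc_enable();   // turn on the cycle collector (available since PHP 5.3)

// A generated page with 1000 links stands in for real crawled content.
$body = str_repeat('<a href="http://example.com/x">x</a>', 1000);

$total = 0;
$before = memory_get_usage();
for ($i = 0; $i < 100; $i++) {
    $total += countLinks($body);   // temporaries die with each call
    gc_collect_cycles();           // optional: force a collection run
}
$after = memory_get_usage();

printf("links: %d, memory growth: %d bytes\n", $total, $after - $before);
```

If the growth figure keeps climbing in a real task, the leak is probably in data held outside the function (for example a static cache or an ever-growing array), which scoping alone will not fix.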