Re: [PHP] Application settings, configuration, and preferences.
I am partial to the filesystem, but I can see scenarios where the db approach might be useful (single point of control) with a good caching strategy using apc or other mechanisms.

One approach I have followed: if the config field names and values are simple key-value pairs, you can store them in a dedicated conf file and have it included in the main apache conf file (Include directive). This way, all the configs are accessible via $_SERVER. The separate conf file can be checked into svn, pushed separately as part of the release process, etc. The same approach also works in standalone php cli scripts via a shell wrapper - e.g.:

#!/bin/bash
...
# list fields directly here or load them separately - e.g:
# . /path/to/some_file.conf
export FIELD1=foo
export FIELD2=bar
# Note: values can have some structure too
export FIELD3=abc,cde,fgh
...
/usr/bin/php some_script.php

You could also use the php ini style confs: http://php.net/manual/en/function.parse-ini-file.php

In the $_SERVER approach above, the parsing is done at start-up, so there is no setup cost at every request. For the ini or xml parsing approach, you may need to cache the result if this parsing cost needs to be avoided on every request.

Ravi

On Wed, Oct 27, 2010 at 10:17 AM, Michael Shadle mike...@gmail.com wrote:

I find json to be the most ideal data exchange format, but using it for configuration files one may edit by hand is horrible. XML, ini or yaml would be better. I still prefer XML. Albeit verbose, it is the easiest to read and easy to validate against.

On Oct 27, 2010, at 10:12 AM, Bob McConnell r...@cbord.com wrote:

From: Ken Guest

On Wed, Oct 27, 2010 at 5:27 PM, mmest...@nagios.com wrote:

Recently we had a discussion about whether our code should be configured using files or a DB back-end. As this topic affects nearly every development team, I was wondering if there shouldn't be a common library that deals with all of these issues.
We came to the conclusion that either should work, but our project must work on systems that would not have an SQL DB installed. Therefore it must be configured using files, as supporting both would be a waste of our development time.

Looking around for a solution, I came across an extension to getopt that read command line parameters from a file, essentially emulating exec $(cat);. As this did allow configuration from either the command line or a file, it's a good start. However, we are specifically looking for something that would accept configuration from a file or a DB; command line options are not important. Though a great solution would take configuration from anywhere.

A full featured solution would also support containing user preferences and administrative settings, allowing any of these to come from almost anywhere.

Here is how an example deployment might work. As this would be a programming tool, the user would be an administrator installing and configuring the software. Some configuration information contained in php should be extensible so that all the configuration could be done there. In this case settings and user preferences would be read-only; configuration information is always read-only. This would usually specify a config file to be located in the same folder or a subfolder. This configuration file would have a default format that is configurable in the php; it would be one of PHP, XML, bind, apache, and several other config file formats. This file would contain information on where settings and preferences could be written to: either another configuration file somewhere in /var, or connection information for a DB.

From an application developer's standpoint, this should all be as difficult as getopt to set up; design decisions like what format the config file is in should be left up to the admin installing the software. The developer need only be concerned with defining the values stored, their type, and other properties.

Does anything like this exist?
This seems like an essential piece of code that is re-invented for every project.

PEAR's Config package sounds like a good match for what you are looking for. It parses and outputs various formats and edits existing config files: http://pear.php.net/package/Config

There's a brief intro to what it can do at http://pear.php.net/manual/en/package.configuration.config.intro.php

I have to admit I am somewhat biased, as I'm currently on the PEAR Group (read 'committee') - but I'd be surprised if there's not a Zend or ezComponents/zetaComponents equivalent. I also have to admit there are some outstanding issues that need to be addressed for PEAR's Config package - the good news is someone has volunteered to resolve these today.

There are nearly as many ways to do this as there are languages to implement them in. I have been using YAML files for a while now, not only for configuration and parameter storage, but also input for data driven testing,
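The ini-file route mentioned earlier in the thread can be sketched with parse_ini_file(). A minimal, self-contained example follows; the file name, section, and keys are invented for illustration, not taken from any poster's config:

```php
<?php
// Minimal sketch of the ini-file configuration approach discussed
// above. File name, section name, and keys are hypothetical.
$iniText = <<<'INI'
[app]
field1 = foo
; values can have some structure; the caller splits them
field3 = "abc,cde,fgh"
INI;

$path = sys_get_temp_dir() . '/app_example.ini';
file_put_contents($path, $iniText);

// parse_ini_file() returns an associative array; with the second
// argument true, [section] headers become nested arrays.
$config = parse_ini_file($path, true);

// Structured values come back as plain strings; split as needed.
$field3 = explode(',', $config['app']['field3']);

unlink($path);
```

As noted in the thread, this parse happens on every request unless the resulting array is cached (e.g. in apc), whereas the $_SERVER/environment approach pays the cost once at server start-up.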
Re: [PHP] PHP Email Question
On Wed, Sep 29, 2010 at 1:37 PM, Joe Jackson priory...@googlemail.com wrote:

Hi

I am trying the following snippet as Bostjan suggested, and an email is getting sent when I submit the form; however, in the body of the email I am getting none of the form data. All I am getting is the letter 'z'? Also, the from field of the email shows my email address and not the email address of the user who sent the form. Any ideas on where I am going wrong with this snippet? Any advice would be much appreciated.

$msgContent = "Name: " . $values['name'] . "\n";
$msgContent .= "Address: " . $values['address'] . "\n";
$msgContent .= "Telephone: " . $values['telephone'] . "\n";
$msgContent .= "Email Address: " . $values['emailaddress'] . "\n";
$msgContent .= "Message: " . $values['message'] . "\n";

function ProcessForm($values)
{
    mail('myemail:domain.com', 'Website Enquiry', $msgContent,
         "From: \"{$values['name']}\" <{$values['emailaddress']}>");
    // Replace with actual page or redirect :P
    echo "<html><head><title>Thank you!</title></head><body>Thank you!</body></html>";
}

Not sure if it is a typo above, but are you actually passing $msgContent into the function above? If it is a global variable, you would need to add a 'global' declaration:

function ProcessForm($values)
{
    global $msgContent;
    mail('myemail:domain.com', 'Website Enquiry', $msgContent,
         "From: \"{$values['name']}\" <{$values['emailaddress']}>\r\n");
    . . .
}

Also try adding a CRLF sequence at the end of the header line as shown above.

Ravi

-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
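The global-variable fix above works, but the same result can be had by passing the body in explicitly, which is easier to test. A hedged sketch follows: the recipient address is a placeholder, field names follow the original post, and the actual mail() call is left behind a flag so the composition logic can be exercised on its own:

```php
<?php
// Sketch: build the body from the form values and pass it as a
// parameter instead of relying on a global. The recipient address
// is a placeholder, not the original poster's address.
function buildMessage(array $values)
{
    $msg  = "Name: " . $values['name'] . "\n";
    $msg .= "Address: " . $values['address'] . "\n";
    $msg .= "Telephone: " . $values['telephone'] . "\n";
    $msg .= "Email Address: " . $values['emailaddress'] . "\n";
    $msg .= "Message: " . $values['message'] . "\n";
    return $msg;
}

function processForm(array $values, $send = true)
{
    $body = buildMessage($values);
    // Header lines are separated by CRLF per RFC 2822.
    $headers = "From: \"{$values['name']}\" <{$values['emailaddress']}>\r\n";
    if ($send) {
        mail('myemail@example.com', 'Website Enquiry', $body, $headers);
    }
    return $body;
}
```

Separating message construction from sending also makes the "only the letter 'z' arrives" class of bug visible immediately: you can echo buildMessage($values) before ever calling mail().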
Re: [PHP] PHP Email Question
Just on this topic, I found the swiftmailer library to be really useful, esp. in dealing with 'template' emails with custom variables per recipient: http://swiftmailer.org/

An example of email template processing: http://swiftmailer.org/docs/decorator-plugin-howto

There are batchSend() functionalities, the ability to compose various mime type emails, etc.

Ravi

On Mon, Sep 20, 2010 at 8:20 AM, chris h chris...@gmail.com wrote:

Ignore the other parameters unless you are very familiar with RFCs 2821, 2822 and their associated RFCs

I would advise against ignoring the other parameters. Doing so will pretty much guarantee having your email end up in SPAM. Instead look up the examples in the docs, or better yet use something like phpmailer as Tom suggested.

Chris.

On Sun, Sep 19, 2010 at 6:37 PM, TR Shaw ts...@oitc.com wrote:

On Sep 19, 2010, at 6:00 PM, Joe Jackson wrote:

Hi

Sorry for the simple question but I am trying to get my head around PHP. I have a sample PHP script that I am trying to use to send a php powered email message. The snippet of code is shown below:

mail('em...@address.com', 'Subject', $values['message'],
     "From: \"{$values['name']}\" <{$values['emailaddress']}>");

This works fine, but how can I add other fields to the email that is received? For example, in the form there are fields called 'emailaddress', 'telephone', 'address' and 'name' which I need to add into the form along with the message field. Also with the formatting, how can I change the format of the email to: Name: $values['name'], Address: etc Message:

Joe

The mail command lets you send mail (an RFC 2821 envelope).
The function is:

bool mail ( string $to , string $subject , string $message [, string $additional_headers [, string $additional_parameters ]] )

$to is where you want it to go; $subject is whatever you want the subject to be; $message is the information you want to send. Ignore the other parameters unless you are very familiar with RFCs 2821, 2822 and their associated RFCs.

So if you want to send info from a form, you might want to roll it up in xml and send it via the message part. When you receive it, you can easily decode it. If you don't want to do that, put it in a format that you can easily decode on the receiving end. Basically, mail is a way to deliver information in the $message body. How you format the information there is up to you. However, depending on your system's config, you are probably constrained to placing only 7-bit ascii in the $message body.

You might also move away from the mail function and look at phpmailer at sf.net if you need more complex capabilities.

Tom
Re: [PHP] php cli question
Thanks Bostjan for the suggestion. I did raise the issue and here is the reply: http://news.php.net/php.internals/49672

Thx, Ravi

On Wed, Sep 15, 2010 at 2:38 AM, Bostjan Skufca bost...@a2o.si wrote:

Here are the results I got when the question of migration from apache to nginx was brought up: http://blog.a2o.si/2009/06/24/apache-mod_php-compared-to-nginx-php-fpm/ (BTW there is some FPM in the main PHP distribution now)

As for resource management, I recommend looking at the php sources (Zend/zend_alloc.c:zend_mm_shutdown() specifically) and building a custom extension that frees discarded memory resources on your request or timer or sth else. Not sure if it is possible like that but this is just a suggestion, don't quote me on that :)

Also, for such questions I recommend you to join the php-internals mailing list, it seems more appropriate.

b.

On 15 September 2010 04:19, J Ravi Menon jravime...@gmail.com wrote:

On Tue, Sep 14, 2010 at 1:15 PM, Per Jessen p...@computer.org wrote:

J Ravi Menon wrote:

Few questions:

1) Does opcode cache really matter in such cli-based daemons? As 'SomeClass' is instantiated at every loop, I am assuming it is only compiled once as it has already been 'seen'.

Yup.

Just to clarify, you mean we don't need the op-code cache here, right?

That is correct.

2) What about garbage collection? In a standard apache-mod-php setup, we rely on the end of a request-cycle to free up resources - close file descriptors, free up memory etc. I am assuming in the aforesaid standalone daemon case, we would have to do this manually?

Yes.

So 'unset($some_big_array)' or 'unset($some_big_object)' etc. is the right way to go for non-resource based items? i.e. it needs to be explicitly done?

It's not quite like C - if you reassign something, the previous contents are automagically freed.
I use unset() if I know it could be a while (hours) before it'll likely be reassigned, but it won't be used in the meantime.

Thanks Per for clarifying this for me. Now on to my follow-up question:

[Note: I think it is related to the issues discussed above, hence keeping it on this thread, but if I am violating any guidelines here, do let me know]

One reason the aforesaid questions got triggered was that in our company right now, there is a big discussion on moving away from the apache+mod_php solution to an nginx+fast-cgi based model for handling all php-based services. The move seems to be based more on some anecdotal observations and possibly not on a typical php-based app (i.e. the php script involved was a trivial one acting as a proxy to another backend service). I have written fast-cgi servers in the past in C++, and I am aware of how apache and fast-cgi servers work together (in unix socket setups).

All our php apps are written with apache+mod_php in mind (no explicit resource mgmt), so this was a concern to me. If the same scripts now need to run 'forever' as a fastcgi server, are we forced to do such manual resource mgmt? Or are there solutions here that work just as in mod_php? This reminded me of the cli daemons that I had written earlier where such manual cleanups were done, and hence my doubts on this nginx+fast-cgi approach.

thx, Ravi

-- Per Jessen, Zürich (14.6°C)
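The unset()/reassignment behavior discussed in this thread can be observed directly with memory_get_usage(). A small illustrative sketch (the array size is arbitrary):

```php
<?php
// Illustration: memory held by a large array is returned to PHP's
// allocator once the variable is unset (or reassigned) - the
// behavior a long-running CLI daemon relies on between iterations.
$before = memory_get_usage();

$big = range(1, 100000);   // allocate a large array
$peak = memory_get_usage();

unset($big);               // explicit free, as discussed above
$after = memory_get_usage();
```

Running this shows $peak well above $before, and $after dropping back down after the unset() - consistent with Per's point that freeing is reference-counted and immediate rather than handled by a collector thread.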
Re: [PHP] php cli question
On Tue, Sep 14, 2010 at 12:43 AM, Per Jessen p...@computer.org wrote:

J Ravi Menon wrote:

Few questions:

1) Does opcode cache really matter in such cli-based daemons? As 'SomeClass' is instantiated at every loop, I am assuming it is only compiled once as it has already been 'seen'.

Yup.

Just to clarify, you mean we don't need the op-code cache here, right?

2) What about garbage collection? In a standard apache-mod-php setup, we rely on the end of a request-cycle to free up resources - close file descriptors, free up memory etc. I am assuming in the aforesaid standalone daemon case, we would have to do this manually?

Yes.

So 'unset($some_big_array)' or 'unset($some_big_object)' etc. is the right way to go for non-resource based items? i.e. it needs to be explicitly done?

thx, Ravi

Note: I have written pre-forker daemons in php directly and successfully deployed them in the past, but never looked at it in depth to understand all the nuances. Anecdotally, I have done 'unset()' at some critical places where large arrays were used, and I think it helped. AFAIK, unlike Java, there is no 'garbage collector' thread that does all the magic?

Correct.

-- Per Jessen, Zürich (12.9°C)
[PHP] Re: php cli question
On Sat, Sep 11, 2010 at 8:50 PM, Shawn McKenzie nos...@mckenzies.net wrote:

On 09/10/2010 11:13 AM, J Ravi Menon wrote:

Hi,

I have some basic questions on running php (5.2.x series on Linux 2.6) as a standalone daemon using posix methods (fork() etc.):

#!/usr/bin/php
<?php

require_once('someclass.php');

// do some initializations
...

// main 'forever' loop - the '$shutdown' will
// be set to true via a signal handler
while (!$shutdown) {
    $a = new SomeClass();
    $a->doSomething();
}

// shutdown logic...

The 'someclass.php' in turn will include other files (via require_once). The above file will be executed directly from the shell. The main loop could be listening for new requests via sockets etc.

Few questions:

1) Does opcode cache really matter in such cli-based daemons? As 'SomeClass' is instantiated at every loop, I am assuming it is only compiled once as it has already been 'seen'. I am not very clear on how apc (or eaccelerator) works in such cases.

2) What about garbage collection? In a standard apache-mod-php setup, we rely on the end of a request-cycle to free up resources - close file descriptors, free up memory etc. I am assuming in the aforesaid standalone daemon case, we would have to do this manually? In the loop above, would it be better to 'unset($a)' explicitly at the end of it before it goes to the next iteration?

Note: I have written pre-forker daemons in php directly and successfully deployed them in the past, but never looked at it in depth to understand all the nuances. Anecdotally, I have done 'unset()' at some critical places where large arrays were used, and I think it helped. AFAIK, unlike Java, there is no 'garbage collector' thread that does all the magic?

Thanks, Ravi

If I have time when you reply I'll answer the questions, but I must ask: Is this purely academic? Why is this a concern? Have you encountered issues? If so, what?

@Tom: I have compiled php with pcntl on and this has never been an issue.
It works well (on a linux setup), and I have deployed standalone daemons without any major problems. I have a home-grown 'preforker' framework (which I hope to share soon) which can be used to exploit multi-core boxes.

@Shawn: It is not academic. There is a follow-up I am planning based on the doubts above. I have deployed such daemons in the past with some assumptions on (2) by doing manual cleanups - e.g. closing curl connections, closing up db handles etc. I really want to understand how php works in such setups outside of apache+mod_php.

thanks, Ravi

-- Thanks! -Shawn http://www.spidean.com
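The daemon skeleton from the original post, with the signal-driven $shutdown flag filled in, might look roughly like the following sketch. The pcntl calls are guarded because the extension may not be compiled in, and the iteration cap only simulates a shutdown signal for demonstration; a real daemon would block on a socket and run until signalled:

```php
<?php
// Sketch of the standalone daemon loop from the original post.
// SomeClass and the iteration cap are illustrative only.
$shutdown = false;

if (function_exists('pcntl_signal')) {
    if (function_exists('pcntl_async_signals')) {
        pcntl_async_signals(true);   // PHP 7.1+; older code used declare(ticks=1)
    }
    pcntl_signal(SIGTERM, function () use (&$shutdown) {
        $shutdown = true;            // handler only flips the flag
    });
}

$iterations = 0;
while (!$shutdown) {
    // $a = new SomeClass();
    // $a->doSomething();
    // unset($a);                    // explicit cleanup, per the thread
    $iterations++;
    if ($iterations >= 3) {          // demo only: pretend SIGTERM arrived
        $shutdown = true;
    }
}
```

Keeping the handler down to a single flag assignment is the usual pattern: the loop body finishes its current unit of work and the shutdown logic after the loop runs exactly once.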
Re: [PHP] Re: Best Practices Book, Document, Web Site?
Other than coding standards, the other good read is this (it seems to cover most topics I have run into while maintaining a high traffic site implemented in php 5): http://phplens.com/lens/php-book/optimizing-debugging-php.php

It is 'best practices' from another angle - use of opcode cache (apc etc.), output buffering and so on.

Coding standards vary a lot, so I would recommend sticking to one style once a consensus is reached among the team, and preferably enforcing it in automated fashion (e.g. http://pear.php.net/package/PHP_CodeSniffer/ as a svn pre-commit hook).

The other easily overlooked part I have seen in many php projects is the code layout and directory structure, including dependency management (library code vs business logic etc.), and more importantly only exposing the main 'entry point' scripts (index.php or controller.php in an MVC model) in an apache doc root. Many times I have seen poorly laid out code that ends up getting deployed with the entire code base exposed in an apache doc root. If care is not taken (e.g. naming some files .inc with no special apache rules to interpret them as a php handler), it is a security nightmare with critical files getting exposed. I have my own layout suggestion which has worked well for us, and once mastered, it makes everyone in the team very productive. Maybe this can be a separate topic in its own right.

Ravi

On Tue, Mar 2, 2010 at 9:51 AM, Hansen, Mike mike.han...@atmel.com wrote:

-----Original Message-----
From: Bob McConnell [mailto:r...@cbord.com]
Sent: Tuesday, March 02, 2010 7:52 AM
To: pan; php-general@lists.php.net
Subject: RE: [PHP] Re: Best Practices Book, Document, Web Site?

From: pan

Hansen, Mike mike.han...@atmel.com wrote in message news:7941b2693f32294aaf16c26b679a258d0efdc...@csomb01.corp.atmel.com...

Is there a PHP Best Practices Book, Document, or web site that has information similar to Perl Best Practices but for PHP?

Yeah, it's hard to find this stuff.
A google search on {+Best Practices +PHP} returned only 4,340,000 hits. Maybe, some day, someone will think to write something up.

The problem with this method is that scanning these results reveals conflicting and contradictory recommendations that are all over the place. Some are so old they may not even be valid PHP any more. Reading even a small subset of these pages is an exercise in frustration. But that makes sense, as there doesn't appear to be any consistency or consensus within the community, or even within some of the larger projects.

Speaking of consensus, based on a recent discussion on the Perl Beginners mailing list, the Perl Best Practices book is now considered to be deprecated among the active Perl community. Many of its recommendations are obsolete and no longer used. It is long past due for a major rewrite.

Bob McConnell

Yep. Perl Best Practices is due for a rewrite/update. I came across this page that attempts to update it: http://www.perlfoundation.org/perl5/index.cgi?pbp_module_recommendation_commentary

For PHP, I'll stick with the PEAR recommendations and do the best I can with whatever is missing. Thanks.
Re: [PHP] memory efficient hash table extension? like lchash ...
PHP does expose Sys V shared-memory apis (shm_* functions): http://us2.php.net/manual/en/book.sem.php

If you already have apc installed, you could also try: http://us2.php.net/manual/en/book.apc.php

APC also allows you to store user specific data (it will be in shared memory). Haven't tried these myself, so I would do some quick tests to ensure they meet your performance requirements. In theory, it should be faster than berkeley-db like solutions (which is also another option, but it seems something similar, like MongoDB, was not good enough?). I am curious to know if someone here has run these tests.

Note that with memcached installed locally (on the same box running php), it can be surprisingly efficient - using pconnect(), caching the handler in a static var for a given request cycle, etc.

Ravi

On Sun, Jan 24, 2010 at 9:39 AM, D. Dante Lorenso da...@lorenso.com wrote:

shiplu wrote:

On Sun, Jan 24, 2010 at 3:11 AM, D. Dante Lorenso da...@lorenso.com wrote:

All,

I'm loading millions of records into a backend PHP cli script that I need to build a hash index from to optimize key lookups for data that I'm importing into a MySQL database. The problem is that storing this data in a PHP array is not very memory efficient and my millions of records are consuming about 4-6 GB of ram.

What are you storing? An array of row objects? In that case, storing only the row id will reduce the memory.

I am querying a MySQL database which contains 40 million records and mapping string columns to numeric ids.

You might consider normalizing the data.

Then, I am importing a new 40 million records and comparing the new values to the old values. Where the value matches, I update records, but where they do not match, I insert new records, and finally I go back and delete old records. So, the net result is that I have a database with 40 million records that I need to sync on a daily basis.

If you are loading full row objects, it will take a lot of memory.
But if you just load the row id values, it will significantly decrease the memory amount.

For what I am trying to do, I just need to map a string value (32 bytes) to a bigint value (8 bytes) in a fast-lookup hash.

Besides, you can load row ids on a chunk by chunk basis. If you have 10 millions of rows to process, load one chunk of rows, process them, then load the next chunk. This will significantly reduce memory usage.

When importing the fresh 40 million records, I need to compare each record with 4 different indexes that will map the record to existing other records, or into a group_id that the record also belongs to. My current solution uses a trigger in MySQL that will do the lookups inside MySQL, but this is extremely slow. Pre-loading the mysql indexes into PHP ram and processing them that way is thousands of times faster. I just need an efficient way to hold my hash tables in PHP ram. PHP arrays are very fast, but like my original post says, they consume way too much ram.

A good algorithm can solve your problem anytime. ;-)

It takes about 5-10 minutes to build my hash indexes in PHP ram currently, which makes up for the 10,000x speedup on key lookups that I get later on. I just want to not use the whole 6 GB of ram to do this. I need an efficient hashing API that supports something like:

$value = (int) fasthash_get((string) $key);
$exists = (bool) fasthash_exists((string) $key);
fasthash_set((string) $key, (int) $value);

Or ... it feels like a memcached api but where the data is stored locally instead of accessed via a network. So this is how my search led me to what appears to be a dead lchash extension.

-- Dante

D. Dante Lorenso da...@lorenso.com 972-333-4139
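One memory-lean alternative, sketched here as an editor's illustration rather than anything proposed in the thread: since the keys are fixed-width (32 bytes, e.g. md5 hex) and the values are 64-bit ints, the records can be packed into a single sorted string and binary-searched, trading PHP's per-element array overhead for fixed-width records. The class name and API are invented; keys must be appended in sorted order before querying:

```php
<?php
// Hypothetical sketch (not an existing extension): a string->int
// map built from fixed-width records packed into one string,
// avoiding per-element zval overhead. Keys must be exactly 32
// bytes (e.g. md5 hex) and appended in ascending byte order.
class PackedIndex
{
    const KEY_LEN = 32;
    const REC_LEN = 40;        // 32-byte key + 8-byte packed value
    private $data = '';

    public function append($key, $value)
    {
        // pack('J') stores an unsigned 64-bit big-endian integer
        $this->data .= $key . pack('J', $value);
    }

    public function get($key)
    {
        // Binary search over the sorted fixed-width records.
        $lo = 0;
        $hi = intdiv(strlen($this->data), self::REC_LEN) - 1;
        while ($lo <= $hi) {
            $mid = intdiv($lo + $hi, 2);
            $off = $mid * self::REC_LEN;
            $cmp = strcmp($key, substr($this->data, $off, self::KEY_LEN));
            if ($cmp === 0) {
                return unpack('J', substr($this->data, $off + self::KEY_LEN, 8))[1];
            }
            if ($cmp < 0) {
                $hi = $mid - 1;
            } else {
                $lo = $mid + 1;
            }
        }
        return null;           // key not present
    }
}
```

At 40 bytes per record, 40 million entries would occupy roughly 1.6 GB of string storage, against the 4-6 GB the thread reports for a native PHP array; lookups are O(log n) rather than O(1), which may or may not be acceptable for this workload.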
Re: [PHP] memory efficient hash table extension? like lchash ...
values were stored, the APC storage began to slow down *dramatically*. I wasn't certain if APC was using only RAM or was possibly also writing to disk. Performance tanked so quickly that I set it aside as an option and moved on.

IIRC, I think it is built over shm and there is no disk backing store.

memcached gives no guarantee about data persistence. I need to have a hash table that will contain all the values I set. They don't need to survive a server shutdown (don't need to be written to disk), but I can not afford for the server to throw away values that don't fit into memory. If there is a way to configure memcached to guarantee storage, that might work.

True, but the lru policy only kicks in lazily. So if you ensure that you never get near the max allowed limit (-m option), and you store your key-val pairs with no expiry, they will be present until the next restart. So essentially you would have to estimate the value for the -m option to be big enough to accommodate all possible key-val pairs (the evictions counter in memcached stats should remain 0). BTW, I have seen this implementation behavior in the 1.2.x series, but I'm not sure it is necessarily guaranteed in future versions.

Ravi

On Mon, Jan 25, 2010 at 3:49 PM, D. Dante Lorenso da...@lorenso.com wrote:

J Ravi Menon wrote:

PHP does expose Sys V shared-memory apis (shm_* functions): http://us2.php.net/manual/en/book.sem.php

I will look into this. I really need a key/value map, though, and would rather not have to write my own on top of SHM.

If you already have apc installed, you could also try: http://us2.php.net/manual/en/book.apc.php

APC also allows you to store user specific data (it will be in shared memory).

I've looked into the apc_store and apc_fetch routines:

http://php.net/manual/en/function.apc-store.php
http://www.php.net/manual/en/function.apc-fetch.php

...
but quickly ran out of memory for APC, and although I figured out how to configure it to use more (adjust the shared memory allotment), there were other problems. I ran into issues with logs complaining about cache slamming and other known bugs with APC version 3.1.3p1. Also, after several million values were stored, the APC storage began to slow down *dramatically*. I wasn't certain if APC was using only RAM or was possibly also writing to disk. Performance tanked so quickly that I set it aside as an option and moved on. Haven't tried these myself, so I would do some quick tests to ensure they meet your performance requirements. In theory, it should be faster than berkeley-db-like solutions (which is also another option, but it seems something similar, like MongoDB, was not good enough?). I will run more tests against MongoDB. Initially I tried to use it to store everything. If I only store my indexes, it might fare better. Certainly, though, running queries and updates against a remote server will always be slower than doing the lookups locally in RAM. I am curious to know if someone here has run these tests. Note that with memcached installed locally (on the same box running PHP), it can be surprisingly efficient - using pconnect(), caching the handler in a static var for a given request cycle, etc... memcached gives no guarantee about data persistence. I need to have a hash table that will contain all the values I set. They don't need to survive a server shutdown (don't need to be written to disk), but I cannot afford for the server to throw away values that don't fit into memory. If there is a way to configure memcached to guarantee storage, that might work. -- Dante On Sun, Jan 24, 2010 at 9:39 AM, D. Dante Lorenso da...@lorenso.com wrote: shiplu wrote: On Sun, Jan 24, 2010 at 3:11 AM, D. 
Dante Lorenso da...@lorenso.com wrote: All, I'm loading millions of records into a backend PHP CLI script that I need to build a hash index from to optimize key lookups for data that I'm importing into a MySQL database. The problem is that storing this data in a PHP array is not very memory efficient and my millions of records are consuming about 4-6 GB of RAM. What are you storing? An array of row objects? In that case, storing only the row ids will reduce the memory. I am querying a MySQL database which contains 40 million records and mapping string columns to numeric ids. You might consider normalizing the data. Then, I am importing a new 40 million records and comparing the new values to the old values. Where the value matches, I update records, but where they do not match, I insert new records, and finally I go back and delete old records. So, the net result is that I have a database with 40 million records that I need to sync on a daily basis. If you are loading full row objects, it will take a lot of memory. But if you just load the row id values, it will significantly decrease the memory amount. For what I am trying to do, I just need to map a string value (32 bytes
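The chunk-by-chunk loading suggested in this thread can be sketched in plain PHP. In production, each chunk would come from a keyset-paginated query (e.g. `SELECT id, string_key FROM records WHERE id > :last ORDER BY id LIMIT :n` - table and column names invented here); in this self-contained sketch an in-memory array stands in for the result set:

```php
<?php
// Illustrative sketch of chunked loading: split a row stream into fixed-size
// chunks so only one chunk's worth of rows is held at a time.
function chunks(iterable $rows, int $size): Generator
{
    $buf = [];
    foreach ($rows as $row) {
        $buf[] = $row;
        if (count($buf) === $size) {
            yield $buf;
            $buf = [];
        }
    }
    if ($buf !== []) {
        yield $buf;  // final partial chunk
    }
}

// Only the (string_key => id) pairs are kept, never whole row objects.
$source  = [[1, 'aaa'], [2, 'bbb'], [3, 'ccc'], [4, 'ddd'], [5, 'eee']];
$map     = [];
$nChunks = 0;
foreach (chunks($source, 2) as $chunk) {
    $nChunks++;
    foreach ($chunk as [$id, $key]) {
        $map[$key] = $id;
    }
    // ...compare this chunk against the incoming import, then discard it...
}
```

Keyset pagination (`WHERE id > :last`) is assumed rather than `LIMIT/OFFSET` because offset scans get progressively slower on a 40-million-row table.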
Re: [PHP] Object Oriented Programming question
Hi Bob, [Couldn't resist jumping into this topic :)] Even if you look at traditional unix (or similar) kernel internals, although they tend to use functional paradigms, they do have an OOP-like flavor. Example: everything in a unix system is a 'file' (well, not really with networking logic, but it is one of the most important abstractions). There is a notion of an 'abstract' base class 'file', and then there are different 'types' of files - regular, directory, devices, etc... So you 'instantiate' a specific 'concrete' object when dealing with a specific file. What are the methods that apply to all files? There is open(), close(), read(), write(), ioctl(), etc... Not all methods are valid for certain kinds of files - e.g. usually you don't write() to a keyboard device. In unix and C, the OOP is modeled using structs (to store various attributes, or data members), and each struct tends to have 'pointer-to-functions' (listed above in the case of files) to the actual implementations of how to deal with such objects in the system. In fact, the device-driver framework in unix can be thought of as an excellent example of polymorphism, where a table stores all the specific functions that operate on the device. Grouping data and its associated operations is one of the hallmarks of OOP. In C, there is no *direct* support to express such groupings, whereas in C++ (and other OOP languages), there is direct support via the notion of 'classes' to express such relationships. I would recommend this book: 'The Design and Evolution of C++' by Bjarne Stroustrup, where such topics are discussed in more depth. Hope this helps. Ravi On Wed, Jan 20, 2010 at 8:31 AM, Bob McConnell r...@cbord.com wrote: From: tedd At 10:26 AM -0500 1/19/10, Bob McConnell wrote: Some problems will fit into it, some don't. I teach OOP thinking at the local college and haven't run into a problem that doesn't fit. For example, in my last class I had a woman who wanted to pick out a blue dress for her upcoming wedding anniversary. 
The class worked out the problem with an OOP solution. Hi Tedd, Here's one you can think about. I have a box, purchased off the shelf, with multiple serial ports and an Ethernet port. It contains a 68EN383 CPU with expandable flash and RAM. The firmware includes a simple driver application to create extended serial ports for MS-Windows, but allows it to be replaced with a custom application. The included SDK consists of the gcc cross-compiler and libraries with a Xinu kernel and default drivers for a variety of standard protocols. I need to build a communications node replacing the default drivers with custom handlers for a variety of devices. It must connect to a server which will send it configuration messages telling it what hardware and protocols will be connected to each port. The Xinu package includes Posix threads. In the past 23 years I have solved this problem six times with five different pieces of hardware. But I still don't see how to apply OOP to it. Some people can look at problems and see objects and some can't. That's for certain -- but in time just about everyone can understand the basic concepts of OOP. Understanding basic concepts and understanding how to map them onto real problems are two entirely different skill sets. I understand the concepts, they just don't make any sense to me. All of the definitions are backwards from the way I learned to evaluate problems. I feel like a carpenter trying to figure out how to use a plumber's toolbox. There are some things in there I think I recognize, but most of it is entirely foreign to me. Cheers, Bob McConnell
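Ravi's unix 'file' analogy translates almost mechanically into class syntax. A minimal PHP sketch of the same idea (class names invented for illustration): one abstract 'file' base, concrete types overriding the shared operations, and an operation that is invalid for one of them, just as write() is for a keyboard:

```php
<?php
// PHP rendering of the unix 'file' abstraction: struct + pointer-to-functions
// in C become an abstract class with overridable methods here.
abstract class UnixFile
{
    abstract public function read(int $n): string;
    abstract public function write(string $data): int;
}

class RegularFile extends UnixFile
{
    private $buf = '';
    private $pos = 0;

    public function read(int $n): string
    {
        $out = substr($this->buf, $this->pos, $n);
        $this->pos += strlen($out);
        return $out;
    }

    public function write(string $data): int
    {
        $this->buf .= $data;
        return strlen($data);
    }
}

class KeyboardDevice extends UnixFile
{
    public function read(int $n): string
    {
        return str_repeat('k', $n);  // stand-in for real keypresses
    }

    public function write(string $data): int
    {
        // Mirrors "usually you don't write() to a keyboard device".
        throw new RuntimeException('EINVAL: keyboard is read-only');
    }
}

// Polymorphism: the caller sees only the base interface, much like the
// kernel dispatching through a per-type table of function pointers.
function copyBytes(UnixFile $src, UnixFile $dst, int $n): int
{
    return $dst->write($src->read($n));
}
```

copyBytes() works for any source/destination pair without knowing the concrete types, which is the dispatch-table behavior the device-driver example describes.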
Re: [PHP] Re: PHP programming strategy; lots of little include files, or a few big ones?
Hi, A note on bytecode caching and include/include_once performance. A while ago when we were profiling our code, we did notice that file includes take a noticeable percentage of overall overhead (enough for us to look into it more deeply). We are using the APC cache on a standard LAMP platform (linux 2.6 series, apache 2.2.x and PHP 5 series). Our includes were using 'relative' paths (e.g. include_once '../common/somefile.inc' or include_once 'lib/somefuncs.inc'), and within the APC cache logic, it resolves such relative paths to absolute paths via realpath() calls. This can be fairly file-system intensive (lots of syscalls like stat() and readlink() to resolve symlinks, etc...). APC uses absolute paths as keys into the opcode cache. This gets worse if it has to find your files via the 'include_path' setting (and most of your library or common code is not in the first path component or so). So from the APC cache perspective, it is most efficient if your include paths are all absolute (the realpath() logic is skipped) - e.g.: include_once $BASE_DIR . '/common/somefile.inc'; include_once $BASE_DIR . '/lib/somefuncs.inc'; and so on, where '$BASE_DIR' could be set via Apache SetEnv directives ($_SERVER['BASE_DIR']) or even hardcoded all over the place. There were other issues with include vs include_once and the APC cache, but I don't recall why there was a performance difference (with include only, even with relative paths, the performance was better, but managing dependencies is too cumbersome). Not sure how other bytecode caches handle relative paths, but I suspect they have to do something similar. From a pure code readability point of view and for more automated dependency management (as close to compiled languages as possible), I do favor an include_once/require_once strategy with absolute paths, but it is not unheard of, to squeeze out maximal performance, for a giant single 'include' to be done. Sometimes this is done on prod. 
systems where a parser goes through and generates this big include file, ensures it is placed at the beginning of the main 'controller.php' (MVC model), and strips off all other includes. Hope this helps in making your decision. Ravi On Fri, Jan 8, 2010 at 8:59 AM, Robert Cummings rob...@interjinn.com wrote: clanc...@cybec.com.au wrote: On Thu, 07 Jan 2010 22:48:59 -0500, rob...@interjinn.com (Robert Cummings) wrote: clanc...@cybec.com.au wrote: Thank you all for your comments. I did not know about bytecode caches. They're an interesting concept, but if I am interpreting the paper http://itst.net/654-php-on-fire-three-opcode-caches-compared correctly, they only double the average speed of operation, which is rather less than I would have anticipated. I strongly advise that you take the time to try a bytecode cache. Within linux environments I am partial to eaccelerator. In IIS environments I now use WinCache from Microsoft. From my own observations with a multitude of different types of PHP web applications, I find that the speed gain is closer to 5 times faster on average. Five times faster is certainly more attractive than twice as fast. But under what circumstances is this achieved? Unfortunately these days it is difficult to find any solid information on how things actually work, but my impression is that caches only work for pages which are frequently accessed. If this is correct, and (as I suspect) somebody looks at my website once an hour, the page will not be in the cache, so it won't help. Also, one of the more popular parts of this website is my photo album, and for this much of the access time will be the download time of the photos. Furthermore, as each visitor will look at a different set of photos, even with heavy access it is unlikely that any given photo would be in a cache. A particular cache of bytecode is usually pushed out of memory when the configured maximum amount of memory for the bytecode cache is about to be exceeded. 
Additionally, the particular cache that gets eliminated is usually the oldest or least used cache. Given this, and your purported usage patterns, your pages will most likely remain in the cache until such time as you update the code or restart the webserver. Despite these comments, the access times for my websites seem to be pretty good -- certainly a lot better than many commercial websites -- but have a look at http://www.corybas.com/, and see what you think. (I am in the process of updating this, and know that the technical notes are not currently working, but there is plenty there to show you what I'm trying to do.) I'm not disputing your 'good enough' statistics. I'm merely asserting that a bytecode cache will resolve your concerns about file access times when your code is strewn across many compartmentalized files. In addition, I am advising that it is good practice to always install a bytecode cache. One of the first things I do when setting up a new system is to ensure I put an
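The absolute-path include pattern Ravi describes can be demonstrated end to end. This self-contained sketch fabricates a throwaway library file under a temp directory (the path and the greet() helper are invented for the demo) so it runs anywhere; in a real deployment $BASE_DIR would come from an Apache SetEnv directive:

```php
<?php
// Demo of absolute-path includes: with a full path, APC can key its opcode
// cache directly and skip the realpath()/include_path resolution described
// above. A throwaway .inc file is generated so the snippet is runnable.
$BASE_DIR = sys_get_temp_dir() . '/inc_demo_' . getmypid();
@mkdir($BASE_DIR . '/lib', 0777, true);

file_put_contents(
    $BASE_DIR . '/lib/somefuncs.inc',
    "<?php function greet() { return 'hi'; }"
);

// Absolute path: no include_path search, no relative-path resolution.
require_once $BASE_DIR . '/lib/somefuncs.inc';

echo greet(), "\n";
```

In production code the only change from the relative style is prefixing every include with the configured base directory, exactly as in the include_once examples above.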
Re: [PHP] Re: PHP programming strategy; lots of little include files, or a few big ones?
Sorry, forgot to mention that we used APC with apc.stat turned off, which will give a little bit more performance gain, but it does mean flushing the cache on every code push (which is trivial). Ravi On Fri, Jan 8, 2010 at 11:30 AM, J Ravi Menon jravime...@gmail.com wrote: [...]
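For reference, the apc.stat tweak is a one-line php.ini change (APC-era naming; modern OPcache has the analogous opcache.validate_timestamps setting):

```ini
; Trust the opcode cache and skip the per-request stat() on source files.
; Remember to flush the cache (e.g. via apc_clear_cache()) on every code push.
apc.stat = 0
```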