php-general Digest 1 Feb 2011 17:59:42 -0000 Issue 7162
Topics (messages 311108 through 311115):
Re: Detecting Multi-Scope Variables
311108 by: Tommy Pham
Pulling from Multiple Databases
311109 by: Jon Hood
311110 by: Jay Blanchard
311111 by: Richard Quadling
311112 by: Jon Hood
311113 by: Adam Richardson
311114 by: Richard Quadling
311115 by: Jon Hood
Administrivia:
To subscribe to the digest, e-mail:
[email protected]
To unsubscribe from the digest, e-mail:
[email protected]
To post to the list, e-mail:
[email protected]
----------------------------------------------------------------------
--- Begin Message ---
> -----Original Message-----
> From: Brad Lorge [mailto:[email protected]]
> Sent: Monday, January 31, 2011 9:53 PM
> To: [email protected]
> Subject: [PHP] Detecting Multi-Scope Variables
>
> Hello All,
>
> I am new to the list so please be gentle :)
>
> I am working on a PHP framework and have run up against a functionality
> hurdle which I keep falling at. Basically, I have three mechanisms which
> all function in a similar way and require this functionality: templating,
> event handling and "action handling". Within the core code of the
> application, as is common with many applications with a plugin
> architecture, I pass a number of parameters to functions which have hooked
> into a particular "event". Part of the mechanism is that parameters can be
> passed by reference to allow the listeners to make modifications.
> $username = "bob"; $account_type = "ISV"; $password = "fishbum";
>
> register_action_listener('process_user', function($username,
> $account_type, $password) { $username .= "." . $account_type; }); // Or
> whatever
>
> call_action('process_user', &$username, &$account_type, &$password);
> // Result: $username == "bob.ISV"
I think you meant to use [1].
>
> Now, what I am trying to do is establish a method to prevent the "hook"
> functions from making changes by reference without reference explicitly
> being passed to them by the calling code.
>
Perhaps you should review [2] and see if your logic works with your
'call_action'.
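Something along these lines might be what you're after - just a rough
sketch (the function names mirror yours, everything else is invented).
Because func_get_args() hands back copies of the values, the listeners
only ever see copies:

<?php
// Rough sketch of a call_action() that gives listeners copies of the
// arguments.  Function names mirror the ones above; the rest is made up.

$listeners = array();

function register_action_listener($action, $callback)
{
    global $listeners;
    $listeners[$action][] = $callback;
}

function call_action($action /* , $arg1, $arg2, ... */)
{
    global $listeners;
    // func_get_args() returns copies of the values, not references.
    $args = array_slice(func_get_args(), 1);

    if (!empty($listeners[$action])) {
        foreach ($listeners[$action] as $callback) {
            call_user_func_array($callback, $args);
        }
    }
}

$username = "bob";
register_action_listener('process_user', function ($username) {
    $username .= ".ISV";   // modifies the local copy only
});
call_action('process_user', $username);
echo $username;            // still "bob"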
> I have thought of a method which simply makes a copy of all the parameters
> for each listener within call_action(), however what I would really love
> is a function which returns whether or not the supplied variable is
> available in multiple scopes or is in the original scope which it was
> initialized in.
> Does anyone know of a way to achieve this?
>
> Regards,
> Brad
Happy coding,
Tommy
[1] http://php.net/call_user_func
[2] http://php.net/references
--- End Message ---
--- Begin Message ---
I have a website that is currently pulling from more than 30 databases,
combining the data, and displaying it to the user. As more and more
databases are added, the script continues to get slower and slower, and
I've realized that I need to find a way to pull these data in parallel.
So, what is the preferred method of pulling data from multiple locations
in parallel?
Thanks,
Jon
--- End Message ---
--- Begin Message ---
[snip]
I have a website that is currently pulling from more than 30 databases,
combining the data, and displaying it to the user. As more and more
databases are added, the script continues to get slower and slower, and
I've realized that I need to find a way to pull these data in parallel.
So, what is the preferred method of pulling data from multiple locations
in parallel?
[/snip]
Stage the data in a view or views.
--- End Message ---
--- Begin Message ---
On 1 February 2011 15:59, Jon Hood <[email protected]> wrote:
> I have a website that is currently pulling from more than 30 databases,
> combining the data, and displaying it to the user. As more and more
> databases are added, the script continues to get slower and slower, and I've
> realized that I need to either find a way to pull these data in parallel. So
> - what is the preferred method of pulling data from multiple locations in
> parallel?
>
> Thanks,
> Jon
>
I use a data warehouse (a semi-denormalized DB) to hold data from
around 200 different data sources (DB, Excel spreadsheets, Web, etc.).
I use multiple scripts to update the DB, each one tuned to a
particular frequency.
My main app's queries are always against the data warehouse.
That way, the live front end isn't worried about getting the source data.
If the source data changes frequently, then you can poll more frequently.
I'm on Windows and the Windows Scheduler works great for me. If the
load on the machine during the polling is high, then offload it to a
backend machine. No need for this machine to be forward facing.
For the straight SQL sources, I'm looking at having the data warehouse
itself (SQL Server 2008 R2) use its own job server to handle the data
retrieval. That way, all the "data processing" stays in the SQL server,
rather than in a load of scripts.
For sources like Excel, there are data conversion tools I can use -
Excel exists in the ODBC space, so I should be able to use it as another
data source (I think - I've not tried this).
But whatever I do, I don't try to get live data to the app on every
request. It simply takes way too much time.
If the data is needed live, then I'd be looking to see if I can get a
live data feed from the source system. Essentially a push of the data
to my data warehouse - or a staging structure to allow locally defined
triggers to clean/process the data upon arrival.
Automation is the key for me here. Rather than trying to do everything
for the request, respond to the changes in the data or live with the
fact that the data may be stale.
Can you give us any clues as to the sort of app you are building? The
sort of data you are working on? Are you running your own servers?
--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
--- End Message ---
--- Begin Message ---
(comments in-line)
On Tue, Feb 1, 2011 at 10:34 AM, Richard Quadling <[email protected]> wrote:
> I use a data warehouse (a semi denormalized db) to hold data from
> around 200 different data sources (DB, Excel spreadsheets, Web, etc.)
>
> I use multiple scripts to update the DB, each one tuned to a
> particular frequency.
>
>
A different script for each database is a possibility. It's a little extra
load on the app server, but it should be able to handle it. Maybe with
pcntl_fork? I haven't explored this option much.
> My main app's queries are always against the data warehouse.
>
> That way, the live front end isn't worried about getting the source data.
>
> If the data is needed live, then I'd be looking to see if I can get a
> live data feed from the source system. Essentially a push of the data
> to my data warehouse - or a staging structure to allow locally defined
> triggers to clean/process the data upon arrival.
>
> Automation is the key for me here. Rather than trying to do everything
> for the request, respond to the changes in the data or live with the
> fact that the data may be stale.
>
> Can you give us any clues as to the sort of app you are building? The
> sort of data you are working on? Are you running your own servers?
>
Data are needed live. 3 of the databases are MySQL. 14 are XML files that
change frequently. 3 are JSON. 1 is Microsoft SQL Server 2005. The main app
is running on Linux (the distribution doesn't matter - currently Debian, but
I can change it to whatever if there's a reason). Most of it is financial
data that needs to be ordered.
I'm going to explore the pcntl_fork option some more...
Thanks!
Jon
--- End Message ---
--- Begin Message ---
On Tue, Feb 1, 2011 at 10:59 AM, Jon Hood <[email protected]> wrote:
> I have a website that is currently pulling from more than 30 databases,
> combining the data, and displaying it to the user. As more and more
> databases are added, the script continues to get slower and slower, and
> I've
> realized that I need to either find a way to pull these data in parallel.
> So
> - what is the preferred method of pulling data from multiple locations in
> parallel?
>
> Thanks,
> Jon
>
Well, you could turn the calls into REST-based web requests that produce
JSON. Then you could use curl_multi to grab the results in parallel and
quickly decode the JSON.
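A rough sketch of that pattern (the endpoint URLs are placeholders and
error handling is left out):

<?php
// Fire all the requests at once with curl_multi, then decode each JSON body.
$urls = array(
    'quotes' => 'https://source1.example.com/data.json',
    'trades' => 'https://source2.example.com/data.json',
);

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture the body
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[$key] = $ch;
}

// Run every transfer in parallel until all of them have finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$results = array();
foreach ($handles as $key => $ch) {
    $results[$key] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
// $results now holds one decoded array per source, ready to be merged.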
Adam
P.S. - Sorry for the duplicate, Jon, I forgot to copy the list.
--
Nephtali: A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
--- End Message ---
--- Begin Message ---
On 1 February 2011 16:39, Jon Hood <[email protected]> wrote:
> (comments in-line)
>
> On Tue, Feb 1, 2011 at 10:34 AM, Richard Quadling <[email protected]>
> wrote:
>>
>> I use a data warehouse (a semi denormalized db) to hold data from
>> around 200 different data sources (DB, Excel spreadsheets, Web, etc.)
>>
>> I use multiple scripts to update the DB, each one tuned to a
>> particular frequency.
>>
>
> A different script for each database is a possibility. It's a little extra
> load on the app server, but it should be able to handle it. Maybe with
> pcntl_fork? I haven't explored this option much.
>
>>
>> My main app's queries are always against the data warehouse.
>>
>> That way, the live front end isn't worried about getting the source data.
>>
>> If the data is needed live, then I'd be looking to see if I can get a
>> live data feed from the source system. Essentially a push of the data
>> to my data warehouse - or a staging structure to allow locally defined
>> triggers to clean/process the data upon arrival.
>>
>> Automation is the key for me here. Rather than trying to do everything
>> for the request, respond to the changes in the data or live with the
>> fact that the data may be stale.
>>
>> Can you give us any clues as to the sort of app you are building? The
>> sort of data you are working on? Are you running your own servers?
>
> Data are needed live. 3 of the databases are MySQL. 14 are XML files that
> change frequently. 3 are JSON. 1 is Microsoft SQL Server 2005. The main app
> is running on Linux (the distribution doesn't matter - currently Debian, but
> I can change it to whatever if there's a reason). Most of it is financial
> data that needs to be ordered.
>
> I'm going to explore the pcntl_fork option some more...
>
> Thanks!
> Jon
>
If you are in control of the data, there are some things that may be useful.
1 - For tables that require syncing, I've added a timestamp column.
This column is updated automatically whenever the data changes. In
the code handling the sync, I know that I don't need to retrieve any
data if the most recent timestamp is the same as the one I last got
(see the sketch after this list).
2 - For physical files, assuming the last-modified timestamp is
maintained, you again have an indicator of whether you actually need to
process any data.
3 - For JSON ... if it is coming to you over the web, check the headers
to see if the server is indicating that the content is unchanged. You
may also be able to save yourself the processing time if you know you've
got the same response as last time.
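For point 1, the check can be as simple as comparing a stored high-water
mark against MAX(timestamp) before pulling anything. A rough sketch (the
table, column, DSN and file names are all invented):

<?php
// Skip the sync entirely when the source's newest timestamp matches the
// one recorded after the previous run.
$source = new PDO('mysql:host=source-db;dbname=finance', 'user', 'pass');

$markerFile = '/tmp/prices.last';
$lastSeen   = file_exists($markerFile) ? trim(file_get_contents($markerFile)) : '';

$latest = $source->query('SELECT MAX(updated_at) FROM prices')->fetchColumn();

if ($latest === $lastSeen) {
    exit; // nothing has changed since the last poll
}

// Pull only rows newer than the last run (or everything on the very
// first run), then stage them in the warehouse.
if ($lastSeen === '') {
    $stmt = $source->query('SELECT * FROM prices');
} else {
    $stmt = $source->prepare('SELECT * FROM prices WHERE updated_at > ?');
    $stmt->execute(array($lastSeen));
}
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// ... insert/update $rows in the warehouse here ...

// Remember the high-water mark for the next run.
file_put_contents($markerFile, $latest);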
In the best case, you poll all the sources, realize that none of
them have any new data, and you serve the data you already have
(caching the data is pretty much essential).
In the worst case, you have to wait until all the data is
polled and stored. Probably no worse than where you are now.
Richard.
--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
--- End Message ---
--- Begin Message ---
Bah, forgot to reply-all
On Tue, Feb 1, 2011 at 11:59 AM, Jon Hood <[email protected]> wrote:
> Using pcntl_fork mostly accomplished what I wanted (I had to go back and
> create the actual connection in each of the forked processes, otherwise, the
> first process that ended would destroy the connection to the main DB). It
> seems to be working, and while it's not the most elegant solution, and
> certainly not the best performance, it's all I can do for now.
>
> Any chance I can bring up the topic of native multithreading support? What
> are the chances of feature request
> http://bugs.php.net/bug.php?id=46919 getting added to a roadmap for, say, PHP
> version 7?
>
> -Jon
>
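For the archives, a rough sketch of that fork-then-reconnect pattern (the
source names, DSNs and queries are made up; each child hands its rows back
to the parent through a temp file):

<?php
$sources = array(
    'quotes' => array('mysql:host=db1;dbname=quotes', 'SELECT * FROM prices'),
    'trades' => array('mysql:host=db2;dbname=trades', 'SELECT * FROM fills'),
);

$pids = array();
foreach ($sources as $name => $info) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed for $name\n");
    }
    if ($pid === 0) {
        // Child: open a fresh connection *after* the fork, so closing it
        // on exit cannot take down the parent's (or a sibling's) handle.
        list($dsn, $sql) = $info;
        $db   = new PDO($dsn, 'user', 'pass');
        $rows = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
        file_put_contents("/tmp/pull_$name.json", json_encode($rows));
        exit(0);
    }
    $pids[] = $pid; // parent keeps track of its children
}

// Parent: wait for every child, then merge the per-source results.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}

$combined = array();
foreach (array_keys($sources) as $name) {
    $combined[$name] = json_decode(file_get_contents("/tmp/pull_$name.json"), true);
}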
--- End Message ---