hi...

i have a project and i'm trying to figure out the best approach to
architecting a solution to the issues i'm facing. i'm open to whatever might
be the 'best' solution. keep in mind, this is a 'project' of my own, kind
of a garage project!!

i'm creating a distributed web parsing/crawling app. it will consist of a
number of nodes in the network whose function is to crawl a site, extract
information from it, and return that information to the app's db/tbls.

in an effort to speed up this whole process, i'm gearing up to run 100s of
crawling apps simultaneously. this would obviously swamp a single instance
of mysql, given the limit on the number of open connections you can have.

i've started to look at the idea of having a mysql instance on each crawling
node within the network. this would allow me to have a kind of round robin
approach, so that each crawling/parsing script could write to whatever
'local' mysql db that it finds. this kind of makes sense.
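a minimal sketch of that round-robin idea (the host list and function name here are just hypothetical placeholders, not anything from my actual setup):

```python
from itertools import cycle

# hypothetical list of 'local' mysql instances, one per crawling node
LOCAL_DBS = ["node1.local", "node2.local", "node3.local"]

# cycle() hands out hosts in round-robin order, wrapping around forever
_db_cycle = cycle(LOCAL_DBS)

def next_local_db():
    """return the next 'local' db host for a crawler to write to."""
    return next(_db_cycle)
```

each crawling/parsing script would call next_local_db() (or get a host assigned at launch) instead of hard-coding one mysql instance.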

i can then import/pull the information from the local dbs to the master db.
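the import/pull step could be as simple as a loop over the local dbs that copies rows up and then clears them. a sketch, using python's sqlite3 as a stand-in for mysql (with mysql you'd use a driver like MySQLdb and the same SELECT/INSERT/DELETE pattern; the 'pages' table and its columns are hypothetical):

```python
import sqlite3

def pull_to_master(master, local_dbs):
    """copy rows from each local db into the master, then flush the local copy."""
    for local in local_dbs:
        rows = local.execute("SELECT url, content FROM pages").fetchall()
        master.executemany(
            "INSERT INTO pages (url, content) VALUES (?, ?)", rows)
        # clear what was just imported so the local db stays small
        local.execute("DELETE FROM pages")
        local.commit()
    master.commit()
```

run periodically (cron or similar), this keeps the master db as the single consolidated copy while the crawlers only ever touch their local instances.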

however, i'm also running into a situation where i might need to
delete/flush data written to a local db/tbl by one of the crawling apps in
the event the app fails. in this case, i'd essentially have to search each of
the 'local' mysql dbs in order to do the flush/delete, as i wouldn't know
which db the crawling app that i've killed had been writing to...

which is a less than elegant solution. i've looked at docs that talk about
master/slave replication/etc...
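one way around the search problem might be to stamp every row with the id of the crawler that wrote it. then flushing a dead crawler's work is just the same DELETE broadcast to every local db -- it's a no-op on the dbs that crawler never touched, so you don't need to know which one it used. a sketch (sqlite3 stands in for mysql again; the 'pages' table and 'crawler_id' column are hypothetical):

```python
import sqlite3

def flush_crawler(crawler_id, local_dbs):
    """remove everything a given crawler wrote, wherever it landed."""
    for db in local_dbs:
        # harmless no-op on any db this crawler never wrote to
        db.execute("DELETE FROM pages WHERE crawler_id = ?", (crawler_id,))
        db.commit()
```

an alternative along the same lines: have each crawler register which db it was assigned (in a small table on the master) when it starts, so the flush can go straight to the right instance instead of being broadcast.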

so.. i'm open to a discussion of the potential solutions to this kind of
scenario. keep in mind, i'm not a mysql dba/guru, just trying to solve this
issue.

thanks

-bruce
[EMAIL PROTECTED]



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
