hi... i have a project and i'm trying to figure out the best way to architect a solution to the issues i'm facing. i'm open to whatever might be the 'best' solution. keep in mind, this is a 'project' of my own, kind of a garage effort!!
i'm creating a distributed web parsing/crawling app. it will consist of a number of nodes in the network whose function is to crawl a site, extract information from it, and return that information to the db/tables for the app. to speed the whole process up, i'm gearing up to run 100s of crawling apps simultaneously. this would obviously swamp a single mysql instance, given the limit on the number of open connections you can have.

i've started to look at the idea of having a mysql instance on each crawling node in the network. this would allow a kind of round-robin approach, where each crawling/parsing script writes to whatever 'local' mysql db it finds. this kind of makes sense. i can then import/pull the information from the local dbs into the master db.

however, i'm also running into a situation where i might need to delete/flush data written to a local db/table by one of the crawling apps in the event the app fails. in that case, i'd essentially have to search each of the 'local' mysql dbs in order to do the flush/delete, as i wouldn't know which db the crawling app i've killed had been writing to... which is a less than elegant solution.

i've looked at docs that talk about master/slave replication/etc... so i'm open to a discussion on the potential solutions to this kind of scenario. keep in mind, i'm not a mysql dba/guru, just trying to solve this issue.

thanks
-bruce
[EMAIL PROTECTED]
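p.s. to make the flush/delete question concrete, here's a rough sketch of the kind of thing i'm imagining (python with the mysql-connector-python driver): each crawler registers which local db it's assigned to in a table on the master, and tags every row it writes with its own id, so a dead crawler's rows can be flushed from just the one db instead of searching all of them. the table names (crawl_jobs, pages), the host list, and the credentials below are all made-up placeholders, not a working config.

import itertools
import mysql.connector

# placeholder node list -- in practice this would come from config
LOCAL_DB_HOSTS = ["node1", "node2", "node3"]
_next_host = itertools.cycle(LOCAL_DB_HOSTS)  # round-robin within this dispatcher

def register_job(master_conn, crawler_id):
    # pick a local db and record the assignment in the master db,
    # so we always know where a given crawler has been writing
    host = next(_next_host)
    cur = master_conn.cursor()
    cur.execute("INSERT INTO crawl_jobs (crawler_id, db_host) VALUES (%s, %s)",
                (crawler_id, host))
    master_conn.commit()
    return host

def write_result(local_conn, crawler_id, url, content):
    # every row is tagged with the crawler that produced it
    cur = local_conn.cursor()
    cur.execute("INSERT INTO pages (crawler_id, url, content) VALUES (%s, %s, %s)",
                (crawler_id, url, content))
    local_conn.commit()

def flush_failed_job(master_conn, crawler_id):
    # look up which local db the dead crawler was assigned to,
    # then delete just its rows from that one db
    cur = master_conn.cursor()
    cur.execute("SELECT db_host FROM crawl_jobs WHERE crawler_id = %s",
                (crawler_id,))
    (host,) = cur.fetchone()  # assumes the job was registered at startup
    local = mysql.connector.connect(host=host, user="crawler",
                                    password="secret", database="crawl")
    lcur = local.cursor()
    lcur.execute("DELETE FROM pages WHERE crawler_id = %s", (crawler_id,))
    local.commit()
    local.close()

that way, killing a crawler means one lookup in the master and one delete against a known db, rather than sweeping every node. does that seem like a sane direction, or is there a better pattern?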