Hi,

Has anyone built a parsing plugin that decides on a per-host basis how the
content of a document should be parsed?

For example, if the title of a document is in the first <h1> tag of a page on
host1, but the title of a document on host2 is in the third <h2> tag, the
plugin would extract the title differently depending on the host.
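
To make the example concrete, here is a minimal Python sketch, assuming each
host's rule can be written as a (tag, occurrence) pair. The names TITLE_RULES,
NthTagText and extract_title are hypothetical, and the standard-library
html.parser only stands in for whatever parser the crawler really uses.

```python
from html.parser import HTMLParser

# Hypothetical per-host rules: (tag name, 1-based occurrence) holding the title.
TITLE_RULES = {
    "host1": ("h1", 1),  # title is in the first <h1>
    "host2": ("h2", 3),  # title is in the third <h2>
}


class NthTagText(HTMLParser):
    """Collects the text inside the n-th occurrence of a given tag."""

    def __init__(self, tag, n):
        super().__init__()
        self.tag, self.n = tag, n
        self.seen = 0           # opening tags of this kind seen so far
        self.capturing = False  # True while inside the n-th occurrence
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.seen += 1
            self.capturing = self.seen == self.n

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.parts.append(data)


def extract_title(host, html):
    """Return the title for a known host, or None if no rule or no match."""
    rule = TITLE_RULES.get(host)
    if rule is None:
        return None
    parser = NthTagText(*rule)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None
```

Keeping the rules as plain data rather than code is one possible design: a new
host would then only need a new entry, not a new plugin.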

In my opinion something like a dispatcher plugin would be needed:

- Identify the host of a document
- Read and cache the instructions on how to extract the information for that
  host (from a database or config file)
- Execute the host-specific plugin
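
The three steps above could be sketched roughly as follows. This is only an
illustration in Python, not tied to any particular crawler's plugin API:
RULE_STORE stands in for the database or config file, and HANDLERS, host_of,
handler_name_for and dispatch are hypothetical names with placeholder bodies.

```python
from functools import lru_cache
from urllib.parse import urlsplit

# Stand-in for a database or config file (hypothetical contents).
RULE_STORE = {
    "host1": {"handler": "first_h1"},
    "host2": {"handler": "third_h2"},
}

# Registry of host-specific handlers. Each takes the raw document and
# returns a dict of extracted fields; the bodies here are placeholders.
HANDLERS = {
    "first_h1": lambda doc: {"title_source": "first <h1>"},
    "third_h2": lambda doc: {"title_source": "third <h2>"},
    "default": lambda doc: {"title_source": "<title>"},
}


def host_of(url):
    """Step 1: identify the host of a document from its URL."""
    return (urlsplit(url).hostname or "").lower()


@lru_cache(maxsize=1024)
def handler_name_for(host):
    """Step 2: look up the instructions once per host and cache the result."""
    return RULE_STORE.get(host, {}).get("handler", "default")


def dispatch(url, doc):
    """Step 3: run the host-specific handler, falling back to a default."""
    return HANDLERS[handler_name_for(host_of(url))](doc)
```

With a bounded in-memory cache like this, the rule store would be queried at
most once per host while the entry stays cached, so the per-document cost is
a dictionary lookup. Whether that holds up depends on how many distinct hosts
are crawled and how often the rules change.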

Do you have any suggestions on how to implement such a scenario efficiently?
Has anyone implemented something similar who can point out possible
performance problems or other critical issues to consider?

Thanks in advance.

Kind regards,
Martina
