On 12/17/2010 12:54 AM, Paul Cowan wrote:
> Hi,
>
> I am looking at training a couple of models from the same data and I would
> like some advice on how to tag the training data.
>
> Here is an example of some data and the tags I would use:
>
> <div class="details">
> <address><strong class="context"><START:organisation>THALES LAND AND
> JOINT SYSTEMS<END></strong><br />Total Signature Management<br />
> <START:address>Wookey Hole Road<br />
> Wells<br />
> Somerset<br />
> BA5 1AA<END></address>
> <p class="tel"><strong>Tel:</strong> +44 (0)1749 682384</p>
> <p class="fax"><strong>Fax:</strong> +44 (0)1749 682235</p>
> <p><strong>Website:</strong> <a target="_blank"
> href="http://www.thalesgroup.com/landjoint/">www.thalesgroup.com/landjoint/</a></p>
> <p><strong>Email:</strong> <a
> href="mailto:julian.barber@uk.thalesgroup.com?subject=Enquiry%20from%20Defence%20Suppliers%20Directory&cc=defenceenquiries@armedforces.co.uk">julian.barber@uk.thalesgroup.com</a></p>
> </div>
>
> I have the following questions that I would appreciate an answer for:
>
> 1. Can I have the different name finding tags in the same data?
> 2. Does the <START:address> <END> make sense over multiple lines or should I
> break this up further?
> 3. I want to use 200 or 300 different examples, do I need to create separate
> files for each example or can I merge them all into 1 and if it is only 1,
> do I need to mark up the start and end of a file?
>
> Cheers
>
> Paul Cowan
>
> Cutting-Edge Solutions (Scotland)
>
> http://thesoftwaresimpleton.blogspot.com/

Paul,
The current models support multiple tag types in the same data, as long as the tagged spans do not overlap.

The current algorithms expect the items they are finding to sit within a single sentence, which in the training file means a single line, so a span that crosses a line break is a problem. You could get around this by presenting each entire HTML document to the model as one sentence, on one line.

You can merge all the examples into one file; it's what we do with our documents. You could use the <HTML> and </HTML> tags to signal the start and end of each document. No need to create a lot of work.

James
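A minimal Python sketch of the "one document per line, all in one file" approach above. It assumes each annotated example is already held as a string, and that the trainer wants whitespace around the <START:type> and <END> markers so they stand as separate tokens; the function names are hypothetical and not part of any OpenNLP API. It also rejects nested or unclosed markers, since overlapping spans are not supported.

```python
import re

# Matches the name-finder markup: "<START:type>" (type captured) or "<END>".
TAG_RE = re.compile(r"<START:([^>\s]+)>|<END>")


def check_tags(text):
    """Raise ValueError if START/END markers are nested, unmatched, or unclosed."""
    open_type = None
    for m in TAG_RE.finditer(text):
        if m.group(1) is not None:  # a <START:type> marker
            if open_type is not None:
                raise ValueError(f"<START:{m.group(1)}> opened inside <START:{open_type}>")
            open_type = m.group(1)
        else:  # an <END> marker
            if open_type is None:
                raise ValueError("<END> without a matching <START:...>")
            open_type = None
    if open_type is not None:
        raise ValueError(f"<START:{open_type}> is never closed")


def to_single_line(doc):
    """Collapse one annotated HTML document onto a single line (one 'sentence')."""
    check_tags(doc)
    # Assumption: markers must be whitespace-separated tokens, so pad them.
    spaced = TAG_RE.sub(lambda m: f" {m.group(0)} ", doc)
    # split()/join() folds newlines and runs of spaces into single spaces.
    return " ".join(spaced.split())


def write_training_file(docs, path):
    """Write every document into one training file, one document per line."""
    with open(path, "w", encoding="utf-8") as out:
        for doc in docs:
            out.write(to_single_line(doc) + "\n")
```

With this, the multi-line address in Paul's example ends up on the same line as its <START:address> and <END> markers, so the span no longer crosses a line break.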
