Hey,

I'm not sure if this is possible, and if it's not, I'll explore a better
way to do this.

I would like to write a script that decides whether a line of text is
(likely) a broken natural-language sentence, i.e., a fragment of a
sentence whose start or end is missing. That is in contrast to a line that
is a "complete" linguistic unit on its own, such as a section header: it
has no period at the end and is not really a sentence, yet it is complete
and unbroken.
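
To make the goal concrete, here is a very crude sketch of the kind of check
I mean, done without any real parsing. The function name and the word list
are made up for illustration, and I expect it to misclassify plenty of
lines:

```python
import re

# Hypothetical word list: a line ending in one of these is probably cut off.
DANGLING_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "with", "that"}


def looks_like_fragment(line: str) -> bool:
    """Guess whether a line is a broken piece of a sentence (crude heuristic)."""
    text = line.strip()
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return False  # empty or non-alphabetic lines are not sentence fragments

    starts_lowercase = text[0].islower()                   # continues an earlier line
    ends_mid_clause = text[-1] in ",;:"                    # punctuation that expects more
    ends_dangling = words[-1].lower() in DANGLING_WORDS    # e.g. "... part of a"
    return starts_lowercase or ends_mid_clause or ends_dangling


print(looks_like_fragment("Installation Guide"))                      # False
print(looks_like_fragment("a sentence, even if the start or end is"))  # True
```

A header like "Installation Guide" passes as complete here only because it
starts with a capital and does not end in a function word, which is exactly
the kind of guesswork I would like to replace with something sounder.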

I'm pretty sure that, in principle, this will require some kind of syntax
parsing. I think I read somewhere that regular expressions, for some
mathematical reason, cannot parse tree-like / nested structures such as HTML.
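
As far as I understand it, the problem is that a classical regular
expression has no memory of how deeply nested it currently is, whereas a
few lines of code with a counter do. A minimal sketch of that idea
(is_balanced is a made-up name, and it only handles one kind of bracket):

```python
def is_balanced(text: str, opener: str = "(", closer: str = ")") -> bool:
    """Return True if every opener in text has a matching closer, properly nested."""
    depth = 0  # how many openers are currently unclosed
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                return False  # a closer appeared before its opener
    return depth == 0  # nothing may be left open at the end


print(is_balanced("(a (b) c)"))  # True
print(is_balanced("(a (b c)"))   # False
```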

Does anyone know what the next most ubiquitous, standard tool is for
analyzing nested linguistic structures? Is that an XML parser?

Thanks very much,
Julius
