Hey, I'm not sure whether this is possible; if it isn't, I'll look for another approach.
I would like to write a script that decides whether a line of text is (likely) a broken natural-language sentence, i.e., a fragment of a sentence whose start or end is missing, as opposed to a complete linguistic unit. A section header is an example of the latter: it has no period at the end and is not really a sentence, yet it is complete and unbroken.

I'm fairly sure that, in principle, this will require some kind of syntactic parsing. I recall reading somewhere that regular expressions, for some mathematical reason, cannot parse tree-like or nested structures such as HTML.

Does anyone know what the next most ubiquitous, standard tool (after regular expressions) is for analyzing nested linguistic structures? Is that an XML parser?

Thanks very much,
Julius
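
P.S. To make the point about regular expressions and nesting concrete, here is a minimal Python sketch of what I understand the limitation to be. The pattern and the function names (`DEPTH_2`, `balanced_regex`, `balanced_counter`) are made up for this illustration. As far as I can tell, a pattern written for Python's `re` module can only handle a fixed nesting depth, while a few lines of ordinary code with a counter handle any depth.

```python
import re

# Hypothetical pattern written only for this illustration: it accepts
# text whose parentheses are balanced, but only up to nesting depth 2.
DEPTH_2 = re.compile(r"^(?:[^()]|\((?:[^()]|\([^()]*\))*\))*$")

def balanced_regex(text):
    """Check balance with the fixed-depth regular expression."""
    return DEPTH_2.match(text) is not None

def balanced_counter(text):
    """Check balance by tracking the current nesting depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no opener
                return False
    return depth == 0

if __name__ == "__main__":
    for sample in ["(a (b) c)", "(a (b (c)))", "(a (b"]:
        print(repr(sample), balanced_regex(sample), balanced_counter(sample))
```

The second sample is balanced but nested three deep, so the regex rejects it while the counter accepts it. Supporting one more level means writing a longer pattern, which I take to be the reason a real parser is needed for nested structures.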