This is a follow-up to a question I raised on Stack Overflow. According to @Peter Kluegl, there is a lot of room for improvement in the code, so I am eager to learn how I can improve this script: https://stackoverflow.com/questions/44351051/uima-ruta-out-of-memory-issue-in-spark-context
=========================================================
TYPESYSTEM EDMTypeSystem;
WORDLIST EnglishStopWordList = 'en/anchor/en_stopWords.txt';
WORDLIST FiltersList = 'en/anchor/AnchorFilters.txt';
DECLARE Filters, EnglishStopWords;
DECLARE Anchors, SpanStart, SpanClose;

DocumentAnnotation{-> ADDRETAINTYPE(MARKUP)};
DocumentAnnotation{-> MARKFAST(Filters, FiltersList)};
STRING MixCharacterRegex = "[0-9]+[a-zA-Z]+";
DocumentAnnotation{-> MARKFAST(EnglishStopWords, EnglishStopWordList,true)};

(SW | CW | CAP ) { -> MARK(Anchors, 1, 2)};
Anchors{CONTAINS(EnglishStopWords) -> UNMARK(Anchors)};

(SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 4)};
(SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM)? (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 4)};
(SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM)? EnglishStopWords? { -> MARK(Anchors, 1, 4)};
(SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 3)};

Anchors{CONTAINS(MARKUP) -> UNMARK(Anchors)};
MixCharacterRegex -> Anchors;

"<Value>" -> SpanStart;
"</Value>" -> SpanClose;

Anchors{-> CREATE(ExtractedData, "type" = "ANCHOR", "value" = Anchors)};
SpanStart Filters? SPACE? ExtractedData SPACE? Filters? SpanClose{-> GATHER(Data, 2, 6, "ExtractedData" = 4)};
=========================================================