Re: Extract text from line below/above annotated keyword using RUTA

2019-11-06 Thread Peter Klügl
Hi,


Here are some quick rules. The task could be solved with fewer rules, and
also with better or faster ones. Essentially, you need one rule for
detecting the structure and one rule for assigning the semantics. The
rules would also work for a plain-text table with more rows.


Let me know if you have questions about some parts.


Best,

Peter

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

DECLARE Header;
DECLARE ColumnDelimiter;
DECLARE Cell(INT column);

DECLARE Keyword (STRING label);
DECLARE Keyword UnderWriterNameKeyword, AppraiserNameLicenseKeyword, AppraisalCompanyNameKeyword;

"Underwriter's Name" -> UnderWriterNameKeyword ("label" = "UnderWriter Name");
"Appraiser's Name/License" -> AppraiserNameLicenseKeyword ("label" = "Appraiser Name");
"Appraisal Company Name" -> AppraisalCompanyNameKeyword ("label" = "Appraisal Company Name");

DECLARE Entry(Keyword keyword);

EXEC(PlainTextAnnotator, {Line,Paragraph});

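// Make whitespace visible so that lines and paragraphs can be trimmed
// and the gaps between columns can be matched.
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};

// A run of 3 to 100 spaces separates two columns.
SPACE[3,100]{-PARTOF(ColumnDelimiter) -> ColumnDelimiter};
// Everything in a line that is not a delimiter becomes a cell.
Line -> {ANY+{-PARTOF(Cell),-PARTOF(ColumnDelimiter) -> Cell};};
REMOVERETAINTYPE(WS);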

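INT index = 0;
// For each line, number its cells from left to right.
BLOCK(structure) Line{}{
    ASSIGN(index, 0);
    // A line that starts a paragraph is the header row.
    Line{STARTSWITH(Paragraph) -> Header};
    c:Cell{-> c.column = index, index = index + 1};
}

// Create an Entry for each non-header cell that follows a header whose
// cell in the same column contains a keyword, and link it to that keyword.
Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
    # c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};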

DECLARE Entity (STRING label, STRING value);
DECLARE Entity UnderWriterName, AppraiserNameLicense, AppraisalCompanyName;

FOREACH(entry) Entry{}{
    entry{ -> CREATE(UnderWriterName, "label" = k.label, "value" = entry.ct)}
        <-{k:entry.keyword{PARTOF(UnderWriterNameKeyword)};};
    entry{ -> CREATE(AppraiserNameLicense, "label" = k.label, "value" = entry.ct)}
        <-{k:entry.keyword{PARTOF(AppraiserNameLicenseKeyword)};};
    entry{ -> CREATE(AppraisalCompanyName, "label" = k.label, "value" = entry.ct)}
        <-{k:entry.keyword{PARTOF(AppraisalCompanyNameKeyword)};};
}
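
For readers less familiar with Ruta, the two steps described above (detect
the structure, then assign the semantics) can be sketched in plain Python.
This is only an illustration and not part of the Ruta solution. The function
names, the sample input, and its column spacing are assumptions made for
this sketch; the labels are copied from the keyword rules.

```python
import re

# Assumption: columns are separated by runs of three or more spaces,
# loosely mirroring the SPACE[3,100] column delimiter in the rules above.
COLUMN_DELIMITER = re.compile(r" {3,}")

# Keyword text -> label, copied from the keyword rules above.
KEYWORD_LABELS = {
    "Underwriter's Name": "UnderWriter Name",
    "Appraiser's Name/License": "Appraiser Name",
    "Appraisal Company Name": "Appraisal Company Name",
}


def split_cells(line):
    """Step 1 (structure): split one line into trimmed, non-empty cells."""
    return [cell for cell in COLUMN_DELIMITER.split(line.strip()) if cell]


def extract_entities(table_text):
    """Step 2 (semantics): pair each data cell with the header keyword in its column."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = split_cells(lines[0]), lines[1:]
    entities = []
    for row in rows:
        for column, value in enumerate(split_cells(row)):
            # Skip cells that have no header cell or no known keyword above them.
            if column < len(header) and header[column] in KEYWORD_LABELS:
                entities.append(
                    {"label": KEYWORD_LABELS[header[column]], "value": value}
                )
    return entities


# Hypothetical sample input; the column spacing is assumed because the
# archive collapsed the original whitespace.
SAMPLE = (
    "Underwriter's Name     Appraiser's Name/License     Appraisal Company Name\n"
    "Alice Wheaton          Bruce Banner                 Stark Industries\n"
)

if __name__ == "__main__":
    for entity in extract_entities(SAMPLE):
        print(entity["label"], "=", entity["value"])
```

As in the Ruta rules, the column index is what links a value to its keyword,
so additional data rows are handled without any change.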



On 06.11.2019 at 12:45, Shashank Pathak wrote:
> Hi Peter,
>
> I am trying to get information from an indented text file.
>
> Input file text:
> Underwriter's Name  Appraiser's Name/License  Appraisal Company Name
> Alice Wheaton   Bruce Banner  Stark Industries
>
> Approach:
>    I am trying to annotate fixed keywords such as "Underwriter's Name" and
> then move to the line below the annotated keyword.
>    However, I am not able to fetch only the underwriter's name. The rule
> returns every matching instance (Alice Wheaton Bruce, Wheaton Bruce Banner,
> etc.).
>
>
> Code :
>
> TYPESYSTEM utils.PlainTextTypeSystem;
> ENGINE utils.PlainTextAnnotator;
>
> EXEC(PlainTextAnnotator, {Line});
> ADDRETAINTYPE(WS);
> Line{->TRIM(WS)};
> REMOVERETAINTYPE(WS);
> Document{->FILTERTYPE(SPECIAL)};
>
> DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
> DECLARE UnderWriterName(String label, String value);
>
> CW{REGEXP("\\bUnderwriter") -> UnderWriterKeyword};
> CW{REGEXP("Name")->NameKeyword};
> (UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
> Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
>    n:CW[1,3]{-> CREATE(UnderWriterName, "label"="UnderWriter Name", "value"=n.ct)};
> };
>
> Please let me know whether this can be achieved with RUTA. Please also
> share the steps to get the Underwriter's Name, Appraiser's Name/License,
> and Appraisal Company Name.
> I have already posted a similar question on Stack Overflow:
> https://stackoverflow.com/questions/58726610/using-ruta-get-a-data-present-in-next-line-of-annotated-keyword/58728364#58728364
>
> Thanks,
>
> Shashank Pathak
>
-- 
Dr. Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH


Extract text from line below/above annotated keyword using RUTA

2019-11-06 Thread Shashank Pathak
Hi Peter,

I am trying to get information from an indented text file.

Input file text:
Underwriter's Name  Appraiser's Name/License  Appraisal Company Name
Alice Wheaton   Bruce Banner  Stark Industries

Approach:
   I am trying to annotate fixed keywords such as "Underwriter's Name" and
then move to the line below the annotated keyword.
   However, I am not able to fetch only the underwriter's name. The rule
returns every matching instance (Alice Wheaton Bruce, Wheaton Bruce Banner,
etc.).


Code :

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

EXEC(PlainTextAnnotator, {Line});
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
REMOVERETAINTYPE(WS);
Document{->FILTERTYPE(SPECIAL)};

DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
DECLARE UnderWriterName(String label, String value);

CW{REGEXP("\\bUnderwriter") -> UnderWriterKeyword};
CW{REGEXP("Name")->NameKeyword};
(UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
   n:CW[1,3]{-> CREATE(UnderWriterName, "label"="UnderWriter Name", "value"=n.ct)};
};

Please let me know whether this can be achieved with RUTA. Please also
share the steps to get the Underwriter's Name, Appraiser's Name/License,
and Appraisal Company Name.
I have already posted a similar question on Stack Overflow:
https://stackoverflow.com/questions/58726610/using-ruta-get-a-data-present-in-next-line-of-annotated-keyword/58728364#58728364

Thanks,

Shashank Pathak