Hi,

 

Yes, the FOREACH block will skip the second line because it starts with
whitespace. All normal matching conditions will ignore it, too.


Ruta applies a coverage-based (in)visibility concept, and this
implementation needs to be able to handle overlapping annotations in a
symmetric way.

In practice, this means that every annotation that starts or ends with
something invisible is itself invisible and will be ignored. In your
example, whitespace is still invisible (filtered), so the second line is
invisible to the matching condition. This may sound unreasonable, but it
is really important.

There are many options for you to avoid this problem depending on the
overall application and use case.

- you could make whitespace visible if it matters

Something like (not tested):

DECLARE Test;

ADDRETAINTYPE(WS);

BLOCK(ForEach) Line{}{
    W+{->Test};
}

REMOVERETAINTYPE(WS);

- you could adapt your regex
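
For example, the Line regex could require a non-whitespace character at
both ends, so that the annotation never starts or ends with something
filtered. A rough sketch (not tested; it assumes every line you care
about contains at least one non-whitespace character, because
whitespace-only lines would no longer get a Line annotation at all):

DECLARE Line;
// starts and ends with a non-whitespace character, never crosses a line break
"\\S(?:[^\\r\\n]*\\S)?" -> Line;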

- you could trim the line annotations

Something like (not tested):

DECLARE Line;
"[^\\r\\n]+" -> Line;

ADDRETAINTYPE(WS);
Line{-> TRIM(WS)};
REMOVERETAINTYPE(WS);

Line{->SHIFT(Line,1,2)} BR;

 

 

Best

 

Peter

On 04.05.2021 at 14:26, Michael Bach wrote:
> Hi!
>
> I’m struggling a bit with the usage of ForEach blocks. I’ve written a few 
> rules for structure looking like this:
>
> Document{-> RETAINTYPE(BREAK)};
>
> DECLARE BR;
> "\\r?\\n" -> BR;
>
> DECLARE Line;
> "[^\\r\\n]+" -> Line;
> Line{->SHIFT(Line,1,2)} BR;
>
> DECLARE Empty_Line;
> "\\r?\\n[ ]*(\\r?\\n)" -> 1=Empty_Line;
>
> DECLARE After_Empty,Before_Empty;
> Line{->Before_Empty} Empty_Line;
> Empty_Line Line{->After_Empty};
>
> DECLARE Paragraph;
> Line+{-PARTOF(Paragraph)->Paragraph};
>
>
> Seems to do exactly what I want, but it seems that for some reason, 
> ForEach-Blocks „skip“ some of the Lines. For instance, when a line starts 
> with a leading SPACE, it is being skipped.
>
> For instance, given this script:
>
> DECLARE Test;
> BLOCK(ForEach) Line{}{
>     W+{->Test};
> }
>
> And this input:
>
> This will match
>   This won’t
> But this will
> This too
>
>
> Any hints why the ForEach block might be skipping the second line?
>
> Cheers,
> Michael

-- 
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.klu...@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
