First, in your Sort Lines dialog screenshot, you need to select the 
"Specific sub-patterns:" option instead of "Entire match" in order for the 
lines to be sorted by your column sorting criteria (MSGNO, ADRC_COUNTRY, 
ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and 
ADRC_HOUSE_NUM1). Since the sort lines grep pattern:

\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*

will match every data line in your example, using the "Entire match" option 
reduces the sort to a simple whole-line string sort, which puts the MSGNO 
(i.e. \8 in the example) column contents last instead of first in the sort 
order. (See the "Sort Lines" section in Chapter 5 of the BBEdit User 
Manual for details on sub-pattern sort ordering.)

With the "Entire match" option, the left part of every data line (the ones 
starting with 200;) is identical up to the ADRC_ADDRNUMBER characters, so 
the differences in that part of the string are what Sort Lines' "Entire 
match" uses to determine the ordering of the whole-line strings.

The "Specific sub-patterns:" option is what lets you specify which 
substring(s) of a line are used, and in what order they are concatenated, 
to determine the sort ordering between whole lines.
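
For what it's worth, here is a rough Python sketch of the difference between 
the two options. It is not what BBEdit runs internally, just an illustration 
under a few assumptions: the helper names are made up, Python compares 
strings by code point (BBEdit's comparison rules may differ), and the first 
sample record was hard-wrapped in the email, so it has been rejoined here.

```python
import re

# The Sort Lines grep pattern from this thread, split across three string
# literals only to keep the source lines short.
PATTERN = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;"
    r"([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;"
    r"\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*"
)

# Three records from the example data. The first one is rejoined from the
# wrapped email text (an assumption about the original line).
LINES = [
    "200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;",
    "200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;",
    "200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;",
]


def sub_pattern_key(line):
    # Hypothetical helper: concatenate capture groups 8, 1, 2, ..., 7, which is
    # the ordering requested by the sub-pattern string from this thread.
    m = PATTERN.match(line)
    return "".join(m.group(8, 1, 2, 3, 4, 5, 6, 7))


def addr_numbers(lines):
    # ADRC_ADDRNUMBER is the fifth field; used only to label the output.
    return [line.split(";")[4] for line in lines]


# "Entire match": the key is the whole matched line.
by_whole_line = sorted(LINES)
# "Specific sub-patterns:": the key is the concatenated capture groups.
by_sub_patterns = sorted(LINES, key=sub_pattern_key)

print(addr_numbers(by_whole_line))
# ['0007723592', '0007723657', '0007723674']
print(addr_numbers(by_sub_patterns))
# ['0007723674', '0007723657', '0007723592']
```

The whole-line sort is decided by ADRC_ADDRNUMBER because everything to its 
left is identical, while the sub-pattern sort orders the same three lines by 
post code, city and street (MSGNO is empty in all of them).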

To see what's going on with Sort Lines' "Specific sub-patterns:" option, you 
can use BBEdit's Pattern Playground to view the concatenated substring that 
is used for each line to determine the sort ordering. For "Search 
pattern:" put:

\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*

and for "Replace pattern" put:

\8\1\2\3\4\5\6\7

and for "Contents of" choose an open example file.

As you step through each grep pattern match (using the Next button), the 
"Replacement text:" field will show you the concatenated string composed 
of the captured substrings of the matched line, in the order given by the 
replace pattern. It is that "Replacement text:" string that Sort Lines uses 
for sort evaluation with the "Specific sub-patterns:" option.
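
If you'd rather check a line outside of BBEdit, the same replacement can be 
approximated in Python. This is only a sketch: Python's re module is not 
BBEdit's grep engine, although it accepts this particular pattern and the 
\8\1... replacement syntax as written, and the variable names are mine.

```python
import re

# Same search pattern as above, split across string literals for readability.
PATTERN = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;"
    r"([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;"
    r"\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*"
)

# The replace pattern: MSGNO first, then the seven ADRC_* capture groups.
REPLACE = r"\8\1\2\3\4\5\6\7"

# One unwrapped record from the example data.
line = (
    "200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;"
    "DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;"
    "00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;"
)

match = PATTERN.match(line)
# Match.expand() substitutes the capture groups into the replacement template,
# which mirrors what the "Replacement text:" field displays for this match.
replacement_text = match.expand(REPLACE)

print(repr(replacement_text))
# 'DE0985655AyingKapsKaps'
```

MSGNO (\8) is empty in this record, so the string starts with the country 
code; ADRC_HOUSE_NUM1 (\7) is empty as well.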

P.S. If an explanation of what each part of a grep regular expression 
specifies would help, https://regex101.com has a pretty good explanation 
panel that describes what each bit of a regular expression is doing.

On Wednesday, March 26, 2025 at 6:24:57 AM UTC-7 Vlad Ghitulescu wrote:

> Hey GP
>
>
> And thanks for the suggestion!
>
> I tried the sort-solution before trying to understand the regex itself 😶
>
> I pasted into Text —> Sort Lines… like this
>
> [image: CleanShot 2025-03-26 at 08.24.24.png]
>
> but after Sort it doesn’t look like the postal code column was considered
>
> [image: CleanShot 2025-03-26 at 08.25.19.png]
>
> Did I miss something?
>
> Thanks again!
>
>
> Regards,
> Vlad
>
> On 25.03.2025 at 22:32, GP <[email protected]> wrote:
>
> As a follow up...
>
> BBEdit's Pattern Playground is a great help in constructing tedious grep 
> patterns like you'll need for your filtering and sorting needs. The really 
> tedious part is getting the field position(s) you want to filter or sort on 
> so you can modify that field's match pattern to conform to the desired 
> filter or sorting criteria.
>
> For example, for your "Filter all lines that have ADR_CHK_KZ = 1" task, using 
> Text -> Process Lines Containing ... with the grep pattern:
>
>
>
> \d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*
>
> will do the trick. For filtering you don't need the group capturing on the 
> 1 but it is useful with Pattern Playground to verify you're getting the 
> right field position and field contents matched.
>
> For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, 
> ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1" 
> using Text -> Sort Lines ... with a grep pattern of:
>
>
> \d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
>
> with "Specific sub-patterns" selected with \8\1\2\3\4\5\6\7 in the fill in 
> field will sort your example text using your desired field ordering.
> On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote:
>
>> For filtering, look at Text -> Process Lines Containing ... and for 
>> sorting Text -> Sort Lines ... using grep patterns to identify what you 
>> want to match for filtering and what subpattern field or fields you want to 
>> sort ordered on.
>>
>> If the number of fields in your sample is representative of the real CSV 
>> files you're working with, it is going to be something of a pain in the 
>> rear coming up with the grep patterns needed to accomplish the desired 
>> filtering and sorting.
>>
>> On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote:
>>
>>> Hey, 
>>>
>>>
>>> I use BBEdit very often while working with big CSV-files (300 - 500 MB, 
>>> up to 4 million rows) looking like this: 
>>>
>>> MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
>>>
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b Rosenheim, Oberbay;Patting;00000037;Pattinger Straße;910003809300;;;0;0;0;0;1;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
>>>
>>> Once in a while I’d like to filter or sort such huge files by one or 
>>> more columns, like: 
>>>
>>> 1. Filter all lines that have ADR_CHK_KZ = 1 or 
>>> 2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, 
>>> ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1. 
>>>
>>> Is there a way to do this sort of tasks with BBEdit? 
>>>
>>> Thanks! 
>>>
>>>
>>> Regards, 
>>> Vlad 
>>>
>>>
>>>
>>>
> -- 
> This is the BBEdit Talk public discussion group. If you have a feature 
> request or believe that the application isn't working correctly, please 
> email "[email protected]" rather than posting here. Follow @bbedit on 
> Mastodon: <https://mastodon.social/@bbedit>
> --- 
> You received this message because you are subscribed to the Google Groups 
> "BBEdit Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
>
> To view this discussion visit 
> https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
>

