Oh, that’s great, GP: with this I’ll better understand how this (for me) huge regex works! Thanks for the tip! I’ll try it and report back.
> On 28.03.2025 at 19:16, GP wrote:
>
> Your Pattern Playground results are perplexing. Using the example CSV data from your first post, the grep pattern:
>
> \d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
>
> matches every line except the first (column labels) line.
>
> To figure out what the problem might be on your system with your local language configuration, using either BBEdit's Pattern Playground or regex101, start by rebuilding the grep pattern from scratch, left to right, one semicolon-delimited field part at a time. First enter \d{3};, which should find/highlight 7 matches in each line of the example CSV data. Second, add \w{3}; for a total grep of \d{3};\w{3};, which should highlight the leading 200;BAG; of each line in the example. Continue like that until a newly added field part fails to match the left side of one or more lines in the example data. The problem will then be something in that line's (or those lines') field/column that doesn't meet the matching criteria of the part you just added.
>
> In addition to sorting, another use of a working grep pattern is with BBEdit's Text -> Process Lines Containing... to find all lines that do NOT match the pattern, which will help you find malformed CSV data in the large CSV files you're working with.
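GP's prefix-by-prefix check can also be scripted outside BBEdit when a file is too large to eyeball. Below is a minimal Python sketch (not part of BBEdit or the thread) that splits the sort pattern into its 36 semicolon-delimited field parts, re-tests ever-longer prefixes from the start of a line, and reports the first field part that stops matching; a second helper lists every line the full pattern misses, which is the "lines that do NOT match" idea. The helper names are hypothetical, and Python's `re` module is not BBEdit's grep engine, so character classes such as `\w` may not behave identically.

```python
import re

# Sort pattern from the thread: one field part per CSV column, joined by ';'.
SORT_PATTERN = (
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;"
    r"([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;"
    r"[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*"
)


def field_parts(pattern):
    """Split a pattern into its field parts on ';', ignoring ';' inside [...]."""
    # A part is a run of bracket expressions (e.g. [^;]) or other non-';' characters.
    return re.findall(r"(?:\[[^\]]*\]|[^;\[])+", pattern)


def first_failing_field(line, parts):
    """Return the 1-based number of the first field part at which the growing
    prefix pattern stops matching at the start of `line`, or None if the
    whole pattern matches."""
    for n in range(1, len(parts) + 1):
        prefix = ";".join(parts[:n])
        if n < len(parts):
            prefix += ";"  # every part except the last is followed by a separator
        if re.match(prefix, line) is None:
            return n
    return None


def unmatched_lines(text, pattern):
    """Return (line_number, line) pairs for lines the full pattern does not match."""
    rx = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if rx.match(line) is None
    ]
```

A plausible workflow under these assumptions: run `unmatched_lines` over the file contents first, then call `first_failing_field(line, field_parts(SORT_PATTERN))` only on the few lines it reports. `re.match` anchors each test at the start of the line, which corresponds to GP's "left side of each line" check. For the Kaps record from the first post it returns `None`; for a copy of the Mering record cut off after "Sankt" it returns 11, the ADRC_CITY2 part.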
> On Friday, March 28, 2025 at 7:12:03 AM UTC-7 Vlad Ghitulescu wrote:
> Hey GP
>
> I corrected the error regarding “Specific sub-patterns:”, but this didn’t seem to change anything: ADRC_POST_CODE1 is still not sorted.
>
> <CleanShot 2025-03-28 at 10.02.07.png>
>
> The command also gave no recognizable sign that it had finished, so I’m not sure it didn’t also have problems with line 25816, where a CRLF follows a house number (see previous emails).
>
> However, BBEdit’s Pattern Playground shows no result after searching with the regex:
>
> <CleanShot 2025-03-28 at 10.09.51.png>
>
> I’ll take the regex to regex101 (thanks for the hint!) and see if I can spot an error.
>
> Regards,
> Vlad
>
>> On 26.03.2025 at 19:42, GP wrote:
>>
>> First, in your Sort Lines dialog screenshot, you need to select the "Specific sub-patterns:" option instead of "Entire match" for the lines to be sorted by your column sorting criteria (MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1). Since the Sort Lines grep pattern:
>>
>> \d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
>>
>> will match every line in your example, using the "Entire match" option reduces the sort to a simple whole-line string sort, which would put the MSGNO column contents (i.e. \8 in the example) last instead of first in the sort order. (See the "Sort Lines" section in Chapter 5 of the BBEdit User Manual for details on sub-pattern sort ordering.)
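To see why the choice between "Entire match" and "Specific sub-patterns:" matters, GP's description can be mimicked in Python: each matching line is ranked by the concatenation of capture groups 8, 1, 2, ..., 7 instead of by the whole line. This is a sketch of the idea as GP explains it, not BBEdit's implementation: Python compares strings by Unicode code point, which may differ from BBEdit's sort options, and setting non-matching lines aside is this sketch's own choice. `sort_key` and `sort_lines` are hypothetical names.

```python
import re

SORT_RX = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;"
    r"([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;"
    r"[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*"
)

# Group order from the "Specific sub-patterns:" field: \8\1\2\3\4\5\6\7
# (MSGNO, COUNTRY, REGION, POST_CODE1, CITY1, CITY2, STREET, HOUSE_NUM1).
KEY_GROUPS = (8, 1, 2, 3, 4, 5, 6, 7)


def sort_key(line):
    """Concatenate the captured fields in KEY_GROUPS order, like the
    "Replacement text:" GP describes; return None if the line doesn't match."""
    m = SORT_RX.match(line)
    if m is None:
        return None
    return "".join(m.group(g) for g in KEY_GROUPS)


def sort_lines(lines):
    """Return (matching lines sorted by sort_key, non-matching lines in input order)."""
    matched = [line for line in lines if sort_key(line) is not None]
    unmatched = [line for line in lines if sort_key(line) is None]
    return sorted(matched, key=sort_key), unmatched
```

For the Kaps record from the first post, `sort_key` gives `DE0985655AyingKapsKaps` (MSGNO and ADRC_HOUSE_NUM1 are empty there). Because the fields are joined without a separator, the result can differ from a strict column-by-column comparison when values have different lengths (for example, CITY1 "Ab" + CITY2 "z" sorts after CITY1 "Abc" + an empty CITY2); using `tuple(m.group(g) for g in KEY_GROUPS)` as the key is one way to compare column by column instead.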
>>
>> With the "Entire match" option, if you compare the lines, the left part of each line is the same until you get to the ADRC_ADDRNUMBER characters, so the differences in that part of the string are what Sort Lines' "Entire match" option uses to determine the ordering of the whole lines.
>>
>> The "Specific sub-patterns:" option is what lets you specify which substring(s) of each line, concatenated in which order, determine the sort order between lines.
>>
>> To see what's going on with Sort Lines' "Specific sub-patterns:" option, you can use BBEdit's Pattern Playground to see which concatenated substring is used for each line to determine the sort order. For "Search pattern:" put:
>>
>> \d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
>>
>> and for "Replace pattern:" put:
>>
>> \8\1\2\3\4\5\6\7
>>
>> and for "Contents of" choose an open example file.
>>
>> As you step through each grep pattern match (using the Next button), the "Replacement text:" field will show you the concatenated string composed from the captured substrings of the whole matched line, in the order given. It is that "Replacement text:" string that Sort Lines evaluates when sorting with the "Specific sub-patterns:" option.
>>
>> P.S. If an explanation of what each part of a grep regular expression specifies would help, https://regex101.com has a pretty good explanation panel that describes what each bit of a regular expression is doing.
>> On Wednesday, March 26, 2025 at 6:24:57 AM UTC-7 Vlad Ghitulescu wrote:
>> Hey GP
>>
>> And thanks for the suggestion!
>>
>> I tried the sort solution before trying to understand the regex itself 😶
>>
>> I pasted it into Text -> Sort Lines… like this:
>>
>>
>>
>> but after sorting, it doesn’t look like the postal code column was taken into account.
>>
>>
>>
>> Did I miss something?
>>
>> Thanks again!
>>
>> Regards,
>> Vlad
>>
>>> On 25.03.2025 at 22:32, GP <gp-bbed...@hotmail.com> wrote:
>>>
>>> As a follow-up...
>>>
>>> BBEdit's Pattern Playground is a great help in constructing tedious grep patterns like the ones you'll need for your filtering and sorting. The really tedious part is finding the field position(s) you want to filter or sort on, so you can modify that field's match pattern to conform to the desired filter or sorting criteria.
>>>
>>> For example, for your "Filter all lines that have ADR_CHK_KZ = 1", using Text -> Process Lines Containing... with the grep pattern:
>>>
>>> \d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*
>>>
>>> will do the trick. For filtering you don't need the capture group around the 1, but it is useful with Pattern Playground to verify you're matching the right field position and field contents.
>>>
>>> For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1", using Text -> Sort Lines... with a grep pattern of:
>>>
>>> \d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
>>>
>>> with "Specific sub-patterns" selected and \8\1\2\3\4\5\6\7 in the fill-in field will sort your example text using your desired field ordering.
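For comparison outside BBEdit, here is a minimal Python sketch that applies GP's ADR_CHK_KZ filter pattern unchanged and keeps only the matching lines, roughly the effect GP describes for Process Lines Containing. The function name `adr_chk_lines` is hypothetical, and the capture group is kept only so you can check, as in Pattern Playground, that it really captured the ADR_CHK_KZ `1`.

```python
import re

# GP's filter pattern: the eighth single-digit flag (ADR_CHK_KZ) must be 1.
FILTER_RX = re.compile(
    r"\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;"
    r"[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};"
    r"[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*"
)


def adr_chk_lines(lines):
    """Yield only the lines whose ADR_CHK_KZ field is 1."""
    for line in lines:
        m = FILTER_RX.match(line)
        if m is not None:
            # m.group(1) is the captured ADR_CHK_KZ value, always "1" here.
            yield line
```

Because `adr_chk_lines` is a generator, it can be fed an open file object (e.g. `open(path, encoding="utf-8")`, where the path and the UTF-8 encoding are assumptions) so that a 300-500 MB file is processed line by line rather than loaded whole.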
>>> On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote:
>>> For filtering, look at Text -> Process Lines Containing..., and for sorting, Text -> Sort Lines..., using grep patterns to identify what you want to match for filtering and which sub-pattern field(s) you want the sort ordered on.
>>>
>>> If the number of fields in your sample is representative of the real CSV files you're working with, it is going to be something of a pain in the rear coming up with the grep patterns needed for the desired filtering and sorting.
>>>
>>> On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote:
>>> Hey,
>>>
>>> I use BBEdit very often while working with big CSV files (300-500 MB, up to 4 million rows) that look like this:
>>>
>>> MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
>>>
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b Rosenheim, Oberbay;Patting;00000037;Pattinger Straße;910003809300;;;0;0;0;0;1;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
>>>
>>> Once in a while I’d like to filter or sort such huge files by one or more columns, like:
>>>
>>> 1. Filter all lines that have ADR_CHK_KZ = 1, or
>>> 2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1.
>>>
>>> Is there a way to do these sorts of tasks with BBEdit?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Vlad
>>>
>>> --
>>> This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
>>> ---
>>> You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
>>> To view this discussion visit https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com.
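GP calls finding the field positions the really tedious part. One way to take the counting out of it, sketched here in Python under the assumption that the first line is a semicolon-separated header like the one above, is to generate the pattern from the header: every column becomes `[^;]*` (the last one `[^\n]*`), the sort columns become capture groups numbered left to right, and the "Specific sub-patterns:" string is derived from the requested sort order. The function names are hypothetical, and the generated pattern is looser than GP's hand-written one (it checks only the field count, not contents like `\d{10}`), so it is less useful for spotting malformed lines.

```python
import re

# Header line from the first post (36 columns).
HEADER = (
    "MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;"
    "ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;"
    "ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;"
    "LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;"
    "LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;"
    "LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;"
    "CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE"
)


def field_positions(header, names):
    """Map each column name to its 1-based field position in the header."""
    columns = header.split(";")
    return {name: columns.index(name) + 1 for name in names}


def build_sort_pattern(header, sort_columns):
    """Build (pattern, replacement) from the header.

    The pattern has one capture group per sort column, numbered in file order;
    the replacement lists those groups in the requested sort order, in the form
    typed into the "Specific sub-patterns:" field.
    """
    columns = header.split(";")
    missing = [c for c in sort_columns if c not in columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    wanted = set(sort_columns)
    group_of = {}
    parts = []
    for i, column in enumerate(columns):
        body = r"[^\n]*" if i == len(columns) - 1 else r"[^;]*"
        if column in wanted:
            group_of[column] = len(group_of) + 1  # groups are numbered left to right
            parts.append("(" + body + ")")
        else:
            parts.append(body)
    pattern = ";".join(parts)
    re.compile(pattern)  # fail early if the generated pattern were invalid
    replacement = "".join("\\" + str(group_of[c]) for c in sort_columns)
    return pattern, replacement
```

With the eight columns in Vlad's order (MSGNO first), `build_sort_pattern(HEADER, ...)` returns `\8\1\2\3\4\5\6\7` as the replacement string, the same one GP arrived at by hand, because MSGNO is the eighth captured column in file order. `field_positions` likewise reports ADR_CHK_KZ as field 34, which is why GP's filter pattern has seven `\d;` parts before `(1);`.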
