Hey Vlad,

A BBEdit feature that might help is its ability to detect and manipulate *columns*. You’ll find it under the Edit menu. As far as I can tell, the columns it detects are tab or comma separated. A look at your sample showed no quote marks, but it did have a few commas; if you replace the semi-colon (‘;’) separators with tabs, then BBEdit will recognize your columns. How could that help? Well, one idea would be to move the column(s) you care about for search or sorting to the left-most position, greatly simplifying any pattern matching you want to try.

Always work on a copy of your file, of course.

HTH

------ Original Message ------
From "GP" <[email protected]>
To "BBEdit Talk" <[email protected]>
Date 3/25/2025 2:32:00 PM
Subject Re: How to filter / sort CSV-files by certain columns?

As a follow up...

BBEdit's Pattern Playground is a great help in constructing tedious grep patterns like you'll need for your filtering and sorting needs. The really tedious part is getting the field position(s) you want to filter or sort on so you can modify that field's match pattern to conform to the desired filter or sorting criteria.

For example... For your " Filter all lines that have ADR_CHK_KZ = 1" using Text -> Process Lines Containing ... with the grep pattern:


\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*



will do the trick. For filtering you don't need the group capturing on the 1 but it is useful with Pattern Playground to verify you're getting the right field position and field contents matched.



For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1" using Text -> Sort Lines ... with a grep pattern of:



\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*

with "Specific sub-patterns" selected with \8\1\2\3\4\5\6\7 in the fill in field will sort your example text using your desired field ordering.

On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote:
For filtering, look at Text -> Process Lines Containing ... and for sorting Text -> Sort Lines ... using grep patterns to identify what you want to match for filtering and what subpattern field or fields you want to sort ordered on.

If the number of fields in your sample is representative of the real CSV files you're working with, it is going to be something of a pain in the rear coming up with the grep patterns needed to accomplish the desired filtering and sorting.

On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote:
Hey,


I use BBEdit very often while working with big CSV-files (300 - 500 MB, up to 4 million rows) looking like this:

MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b Rosenheim, Oberbay;Patting;00000037;Pattinger Straße;910003809300;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;

Once in a while I’d like to filter or sort such huge files by one or more columns, like:

1. Filter all lines that have ADR_CHK_KZ = 1 or
2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1.

Is there a way to do this sort of tasks with BBEdit?

Thanks!


Regards,
Vlad




--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com <https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com?utm_medium=email&utm_source=footer>.


Thanks,

    - Bruce

_bruce__van_allen__santa_cruz__ca_


--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that 
the application isn't working correctly, please email "[email protected]" rather 
than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
--- You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/bbedit/em0f3661a9-5cf2-410e-bf67-3da2c28d5975%40c8f72f7e.com.

Reply via email to