Hey Vlad,
A BBEdit feature that might help is its ability to detect and manipulate
*columns*. You’ll find it under the Edit menu. As far as I can tell, the
columns it detects are tab or comma separated. A look at your sample
showed no quote marks, but it did have a few commas; if you replace the
semi-colon (‘;’) separators with tabs, then BBEdit will recognize your
columns. How could that help? Well, one idea would be to move the
column(s) you care about for search or sorting to the left-most
position, greatly simplifying any pattern matching you want to try.
Always work on a copy of your file, of course.
HTH
------ Original Message ------
From "GP" <[email protected]>
To "BBEdit Talk" <[email protected]>
Date 3/25/2025 2:32:00 PM
Subject Re: How to filter / sort CSV-files by certain columns?
As a follow up...
BBEdit's Pattern Playground is a great help in constructing tedious
grep patterns like you'll need for your filtering and sorting needs.
The really tedious part is getting the field position(s) you want to
filter or sort on so you can modify that field's match pattern to
conform to the desired filter or sorting criteria.
For example... For your " Filter all lines that have ADR_CHK_KZ = 1"
using Text -> Process Lines Containing ... with the grep pattern:
\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*
will do the trick. For filtering you don't need the group capturing on
the 1 but it is useful with Pattern Playground to verify you're getting
the right field position and field contents matched.
For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION,
ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and
ADRC_HOUSE_NUM1" using Text -> Sort Lines ... with a grep pattern of:
\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*
with "Specific sub-patterns" selected with \8\1\2\3\4\5\6\7 in the fill
in field will sort your example text using your desired field ordering.
On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote:
For filtering, look at Text -> Process Lines Containing ... and for
sorting Text -> Sort Lines ... using grep patterns to identify what
you want to match for filtering and what subpattern field or fields
you want to sort ordered on.
If the number of fields in your sample is representative of the real
CSV files you're working with, it is going to be something of a pain
in the rear coming up with the grep patterns needed to accomplish the
desired filtering and sorting.
On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote:
Hey,
I use BBEdit very often while working with big CSV-files (300 - 500
MB, up to 4 million rows) looking like this:
MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt
Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;,
Schwab;Sankt
Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b
Rosenheim, Oberbay;Patting;00000037;Pattinger
Straße;910003809300;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner
Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner
Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner
Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner
Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner
Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner
Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner
Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner
Straße;910001339100;;;0;0;0;0;3;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach
a.Wald;;Schwarzenbach a
Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach
a.Wald;500000011836;;Schwarzenbach
a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach
a.Wald;;Schwarzenbach a
Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach
a.Wald;500000011836;;Schwarzenbach
a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;
Once in a while I’d like to filter or sort such huge files by one or
more columns, like:
1. Filter all lines that have ADR_CHK_KZ = 1 or
2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION,
ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and
ADRC_HOUSE_NUM1.
Is there a way to do this sort of tasks with BBEdit?
Thanks!
Regards,
Vlad
--
This is the BBEdit Talk public discussion group. If you have a feature
request or believe that the application isn't working correctly, please
email "[email protected]" rather than posting here. Follow @bbedit
on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google
Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com
<https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com?utm_medium=email&utm_source=footer>.
Thanks,
- Bruce
_bruce__van_allen__santa_cruz__ca_
--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that
the application isn't working correctly, please email "[email protected]" rather
than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/bbedit/em0f3661a9-5cf2-410e-bf67-3da2c28d5975%40c8f72f7e.com.