Properly formatted .csv files may have CR, LF, and CRLF end of line constructs in records. However, this would be where the best approach would be to use tools designed for the .csv protocol, rather than BBEdit’s manifold text editor tools.

In Perl, we use the Text::CSV family of modules. You can specify the column separator as the semi-colon in your files, and read in the records as simple arrays, with null or blank values handled. There’s also the database route if you wanted to go a bit further with your Perl - DBI plus DBD::CSV.

Other modern languages have similar capabilities.

You could probably also open that file in Excel, if you want the GUI path. Sometimes I do that just to take a look at a file’s structure. But beware that Excel will convert some data such as dates to its own format, and make other “helpful” changes like dropping leading zeroes - a problem with postal codes, e.g. - unless you first designate all cells in the sheet as “text”.

HTH

------ Original Message ------
From "Vlad Ghitulescu" <[email protected]>
To [email protected]
Date 3/26/2025 6:47:17 AM
Subject Re: How to filter / sort CSV-files by certain columns?

The „disaster" was most probably caused by errors in the CSV-file: There is a " in the line just after the tragedy happens and 3 CRLF in the middle of some rows 😣



Am 26.03.2025 um 09:29 schrieb Vlad Ghitulescu <[email protected]>:

The rearrange columns worked only on the first 25,816 lines

--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/bbedit/D78F9924-03B3-4117-9B3D-BC6614B5D1D5%40Ghitulescu.de.
<CleanShot 2025-03-26 at 09.21.19.png>

The rest of the file was more or less mixed (see above). I’ll talk with the support about it.





Am 26.03.2025 um 09:08 schrieb Vlad Ghitulescu <[email protected]>:

Hello again,


I changed all the semicolons to ",", changed then the grep ^ and $ into " and then voilá! the columns magically appeared:

<CleanShot 2025-03-26 at 08.57.09.png>

After this I could successfully rearrange the columns

<CleanShot 2025-03-26 at 09.01.19.png>

I then sorted all but the first line… and this totally destroyed the file

<CleanShot 2025-03-26 at 09.03.44.png>

(The copy of the file, that is - naturally! 😉)

I’ll start analyzing the new problem…


Regards,
Vlad






Am 26.03.2025 um 08:15 schrieb Vlad Ghitulescu <[email protected]>:

Hey Bruce,


Thanks for the idea!

I changed the semicolon into tab

<CleanShot 2025-03-26 at 08.11.14.png>

but unfortunately that didn’t unlock the possibility to rearrange the columns

<CleanShot 2025-03-26 at 08.11.50.png>

Any idea where I went wrong?

Thanks again!


Regards,
Vlad




Am 25.03.2025 um 23:49 schrieb Bruce Van Allen <[email protected]>:

Hey Vlad,

A BBEdit feature that might help is its ability to detect and manipulate *columns*. You’ll find it under the Edit menu. As far as I can tell, the columns it detects are tab or comma separated. A look at your sample showed no quote marks, but it did have a few commas; if you replace the semi-colon (‘;’) separators with tabs, then BBEdit will recognize your columns. How could that help? Well, one idea would be to move the column(s) you care about for search or sorting to the left-most position, greatly simplifying any pattern matching you want to try.

Always work on a copy of your file, of course.

HTH

------ Original Message ------
From "GP" <[email protected]>
To "BBEdit Talk" <[email protected]>
Date 3/25/2025 2:32:00 PM
Subject Re: How to filter / sort CSV-files by certain columns?

As a follow up...

BBEdit's Pattern Playground is a great help in constructing tedious grep patterns like you'll need for your filtering and sorting needs. The really tedious part is getting the field position(s) you want to filter or sort on so you can modify that field's match pattern to conform to the desired filter or sorting criteria.

For example... For your " Filter all lines that have ADR_CHK_KZ = 1" using Text -> Process Lines Containing ... with the grep pattern:


\d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]*



will do the trick. For filtering you don't need the group capturing on the 1 but it is useful with Pattern Playground to verify you're getting the right field position and field contents matched.



For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1" using Text -> Sort Lines ... with a grep pattern of:



\d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]*

with "Specific sub-patterns" selected with \8\1\2\3\4\5\6\7 in the fill in field will sort your example text using your desired field ordering.

On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote:
For filtering, look at Text -> Process Lines Containing ... and for sorting Text -> Sort Lines ... using grep patterns to identify what you want to match for filtering and what subpattern field or fields you want to sort ordered on.

If the number of fields in your sample is representative of the real CSV files you're working with, it is going to be something of a pain in the rear coming up with the grep patterns needed to accomplish the desired filtering and sorting.

On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote:
Hey,


I use BBEdit very often while working with big CSV-files (300 - 500 MB, up to 4 million rows) looking like this:

MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, Schwab;Sankt Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b Rosenheim, Oberbay;Patting;00000037;Pattinger Straße;910003809300;;;0;0;0;0;1;0;1;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;;
200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner Straße;910001339100;;;0;0;0;0;3;0;1;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;; 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach a.Wald;;Schwarzenbach a Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach a.Wald;500000011836;;Schwarzenbach a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;;

Once in a while I’d like to filter or sort such huge files by one or more columns, like:

1. Filter all lines that have ADR_CHK_KZ = 1 or
2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and ADRC_HOUSE_NUM1.

Is there a way to do this sort of tasks with BBEdit?

Thanks!


Regards,
Vlad




--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com<https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com?utm_medium=email&utm_source=footer>.


Thanks,

- Bruce

_bruce__van_allen__santa_cruz__ca_


--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit> --- You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/bbedit/em0f3661a9-5cf2-410e-bf67-3da2c28d5975%40c8f72f7e.com.



--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/bbedit/D78F9924-03B3-4117-9B3D-BC6614B5D1D5%40Ghitulescu.de.



Thanks,

    - Bruce

_bruce__van_allen__santa_cruz__ca_


--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that 
the application isn't working correctly, please email "[email protected]" rather 
than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
--- You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/bbedit/ema35f173d-e03b-43d0-84b2-ae4ce1b51f9d%40181c06df.com.

Reply via email to