Regex for dealing with commas inside of quotes

2019-10-11 Thread Russ Pixels
I use a Text Factory with grep to process downloaded bank statements, batch 
converting them from CSV to tab-delimited .txt using a simple comma find and 
replace, but commas in the comments field break the conversion. Can someone 
supply a regex example that converts only the commas found between the 
quotes to some other character or a space? In the factory it will run 
first, then I can convert the remaining commas to tabs. Thanks

Example of issue: 

28 MAR 2018,27 MAR 2018,"Comments for credited amount Ref: TH 1,2,3,4",,1025.00,2453.85

-- 
This is the BBEdit Talk public discussion group. If you have a 
feature request or need technical support, please email
"supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: 
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to bbedit+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/bbedit/785cde9a-f164-436b-8b33-035466f2495c%40googlegroups.com.


Re: Regex for dealing with commas inside of quotes

2019-10-12 Thread ThePorgie
I'm assuming the comments can contain a varying number of commas, and that 
the values between them might not be numbers?
If this were a problem I had to tackle, I think it would be easier to import 
the data into Excel (or your spreadsheet software of choice) and do a Find 
& Replace on just the one column. I'm coming up empty for a grep solution 
if the data varies like I think it would.
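
For anyone who would rather script the spreadsheet-style route than open 
Excel, here is a minimal sketch in Python, assuming the statement is plain 
CSV with double-quoted comment fields as in the example. It uses the standard 
csv module, which keeps quoted commas inside their field; the function name 
csv_to_tsv is only an illustration, not something from BBEdit.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Re-emit CSV text as tab-separated text (hypothetical helper name)."""
    # csv.reader treats commas inside double quotes as part of the field.
    rows = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO()
    # Write the parsed fields back out with tabs between them.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


sample = (
    '28 MAR 2018,27 MAR 2018,'
    '"Comments for credited amount Ref: TH 1,2,3,4",,1025.00,2453.85\n'
)
print(csv_to_tsv(sample), end="")
```

Note that the writer drops the surrounding quotes when a field no longer 
needs them, so the comment comes out unquoted with its commas intact.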




Re: Regex for dealing with commas inside of quotes

2019-10-13 Thread GP
You don't need to do it in two steps.  The following pattern captures each 
field in its own group, leaving out only the commas that separate the fields:

^(\d{2}\s[A-Z]{3}\s\d{4}),(\d{2}\s[A-Z]{3}\s\d{4}),("[^"]*"),([^,]*),(\d{1,}\.\d{2}),(\d{1,}\.\d{2})$

then do the substitution with the captured groups separated by tab 
characters.

$1\t$2\t$3\t$4\t$5\t$6

This assumes each entry is formatted like your example.

The ("[^"]*") pattern is what captures the quoted string with embedded 
commas.
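
If you still want the two-step approach from the original question, the same 
quoted-string pattern can be used on its own. This is a rough sketch in 
Python rather than a Text Factory step, assuming the quoted fields never 
contain an escaped double quote; the function name blank_commas_in_quotes is 
made up for the illustration.

```python
import re

# Matches one double-quoted field, including any commas inside it.
QUOTED = re.compile(r'"[^"]*"')


def blank_commas_in_quotes(line: str, replacement: str = " ") -> str:
    """Replace commas only inside quoted fields (hypothetical helper)."""
    return QUOTED.sub(lambda m: m.group(0).replace(",", replacement), line)


line = (
    '28 MAR 2018,27 MAR 2018,'
    '"Comments for credited amount Ref: TH 1,2,3,4",,1025.00,2453.85'
)
step1 = blank_commas_in_quotes(line)  # commas inside the quotes become spaces
step2 = step1.replace(",", "\t")      # the remaining commas are separators
print(step2)
```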

The ([^,]*) pattern handles your example's empty field but can handle a 
non-empty field in that position as long as it doesn't have an embedded 
comma.

You may have to do some tweaking on the date and dollar capturing patterns 
if your statements have some variations in their formats.
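
For testing the pattern outside BBEdit, here is a small sketch of the same 
one-step search and replace in Python. The names ROW and csv_row_to_tabs are 
illustrative only. Python's re module writes group references differently 
from the $1 form above, so the sketch joins the captured groups with tabs 
directly instead of using a replacement string.

```python
import re

# Same pattern as above; MULTILINE lets ^ and $ match at each line of a file.
ROW = re.compile(
    r'^(\d{2}\s[A-Z]{3}\s\d{4}),(\d{2}\s[A-Z]{3}\s\d{4}),'
    r'("[^"]*"),([^,]*),(\d{1,}\.\d{2}),(\d{1,}\.\d{2})$',
    re.MULTILINE,
)


def csv_row_to_tabs(text: str) -> str:
    """Join the six captured fields with tabs (hypothetical helper)."""
    return ROW.sub(lambda m: "\t".join(m.groups()), text)


line = (
    '28 MAR 2018,27 MAR 2018,'
    '"Comments for credited amount Ref: TH 1,2,3,4",,1025.00,2453.85'
)
print(csv_row_to_tabs(line))
```

Lines that do not fit the pattern are left untouched, which makes it easy to 
spot the entries that need the date or amount patterns tweaked.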



Re: Regex for dealing with commas inside of quotes

2019-10-13 Thread Russ Pixels
Wow - thanks for that - a bit more complex than I expected, but it does the 
trick, and I appreciate your time in constructing it. It's also a good 
example for me of how to construct a complex regex, as I'm often scratching 
my head (mainly because I only use it now and again). I can scavenge bits in 
future, as I can see common patterns I need in there as well.





Re: Regex for dealing with commas inside of quotes

2019-10-14 Thread ThePorgie
("[^"]*")

Finding little snippets like the one above is why I like to look through 
these grep threads. They show me how I'm thinking about the problem all 
wrong. Brilliant!
Thanks!

