Re: Question about GREP search in XML files with weird CDATA fields

2020-02-24 Thread Miguel Perez
This could be an option too. Thank you.
Excel ended up not working in this case: it reads the file, but its output 
is oddly formatted too, so it is no easier to work with.
The formula posted above works much better for me and is what I was looking 
for.

On Monday, February 24, 2020, 12:12:16 (UTC-6), ThePorgie wrote:
>
> One other thing about XML tools: the latest version of Mac Excel will now 
> open XML. Just an FYI, in case that would work to get the names you're 
> looking for.
>
>
>
> On Monday, February 24, 2020 at 11:44:36 AM UTC-5, Miguel Perez wrote:
>>
>> Hi,
>>
>> I'm fairly new to RegEx and I need your help.
>>
>> I process many XML files in my job. Most of them are formatted correctly, 
>> that is:
>> Value
>> Value
>>
>> For those I search for values using:
>>
>> .*?
>> And it works like a charm.
>>
>> But then I have this one source that formats its XML files with CDATA 
>> fields like this:
>> 
>> 
>> 
>> 
>> In this example they are trying to say that the value *NAME* is *John 
>> Appleseed*. Rather than putting it as a key/value pair, they do that 
>> weird syntax.
>>
>> What GREP pattern can I use to extract all the names for this formatting?
>>
>> I am open to other solutions, like BASH scripts and Applescript. I'm 
>> desperate.
>>
>> Thank you for your help, friends.
>>
>> 
>>
>

-- 
This is the BBEdit Talk public discussion group. If you have a feature request 
or need technical support, please email "supp...@barebones.com" rather than 
posting here. Follow @bbedit on Twitter: 
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to bbedit+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/bbedit/4bab3829-303a-4b88-ab22-fca5baa527f9%40googlegroups.com.


Re: Question about GREP search in XML files with weird CDATA fields

2020-02-24 Thread GP


On Monday, February 24, 2020 at 9:54:43 AM UTC-8, Miguel Perez wrote:
>
> Thank you, ThePorgie.
>
> Unfortunately it doesn't work for me.
>
> I should've said that there are many values using this syntax, like this:
> 
> 
> 
> 
> 
> 
> 
> 
>
>
> As you can see, there are two keys, but the very next line says *value* 
> for both of them. That is my main concern.
>
> I want *value1* for each item on the list, but its defining *key* is in 
> the line above with that CDATA formatting.
>
> Any ideas?
>

As others have noted, expand your match pattern to reject the portions of 
the XML data you don't want to match.

Using your examples, the following regular expression will handle both 
types of match cases:

(?:\s*<\/key1>\s+<\/value>)|(?:(.*)<\/key1>)

The (?: ... ) constructs are non-capturing groups that organize the 
alternative match cases.

Note that you want the longest match case (i.e., the CDATA pattern) as the 
first alternative. That alternative is then tried first, so it correctly 
matches the desired CDATA pattern instead of letting the "(.*)<\/key1>" 
expression wrongly find a match in the CDATA text.

The second alternative is the expression that matches your "formatted 
correctly" non-CDATA XML case. (This alternative is only tried after the 
first alternative fails to find a match.)

Also note there are two capture fields in the regular expression. $1 
captures the name text in the first CDATA case alternative and $2 captures 
the name text in the second alternative.
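
To make the ordering point concrete, here is a minimal Python sketch of the same idea. The XML samples in this thread did not survive archiving, so the element names (key1, value), the CDATA layout, and the sample names below are assumptions for illustration, not the original data. Python's re module tries alternatives left to right, like the grep engine described above.

```python
import re

# Hypothetical sample: the thread's real XML was lost, so these element
# names and the CDATA layout are assumptions for illustration only.
SAMPLE = """\
<record>
  <key1>Jane Doe</key1>
</record>
<record>
  <key1><![CDATA[NAME]]></key1>
  <value><![CDATA[John Appleseed]]></value>
</record>
"""

# CDATA alternative first, plain key/value alternative second.
CDATA_FIRST = re.compile(
    r"(?:<key1><!\[CDATA\[NAME\]\]></key1>\s+<value><!\[CDATA\[(.*?)\]\]></value>)"
    r"|(?:<key1>(.*?)</key1>)"
)

# Same two alternatives in the opposite order, to show what goes wrong.
PLAIN_FIRST = re.compile(
    r"(?:<key1>(.*?)</key1>)"
    r"|(?:<key1><!\[CDATA\[NAME\]\]></key1>\s+<value><!\[CDATA\[(.*?)\]\]></value>)"
)

def extract_names(pattern, text):
    """Return the captured text of whichever alternative matched."""
    names = []
    for m in pattern.finditer(text):
        # Only one of the two capture groups participates in each match.
        names.append(m.group(1) if m.group(1) is not None else m.group(2))
    return names

print(extract_names(CDATA_FIRST, SAMPLE))
print(extract_names(PLAIN_FIRST, SAMPLE))
```

With the CDATA alternative first, the sketch yields both names. With the order reversed, the plain alternative grabs the raw CDATA marker of the key line instead of the name, which is the mismatch described above.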



Re: Question about GREP search in XML files with weird CDATA fields

2020-02-24 Thread Sam Hathaway
Can you give us a real-world example? I’m not clear on whether 
“key1” and “key2” literally appear in your document or if they 
are placeholders.


In any case, you should probably use a tool that is designed to work 
with XML. Such a tool would take care of the CDATA sections for you and 
let you search for things in a hierarchical way.


You might be able to “get the job done” with text-oriented tools, 
but it will eventually drive you insane.
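
As a rough illustration of the XML-aware approach, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The document structure (records, record, key, value) and the sample values are assumptions, since no real sample was posted. The parser unwraps CDATA sections on its own, so the code only deals with plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and layout are assumptions.
SAMPLE = """\
<records>
  <record>
    <key><![CDATA[NAME]]></key>
    <value><![CDATA[John Appleseed]]></value>
    <key><![CDATA[CITY]]></key>
    <value><![CDATA[Cupertino]]></value>
  </record>
</records>
"""

def values_for(xml_text, wanted_key):
    """Collect the text of each <value> that directly follows a matching <key>."""
    root = ET.fromstring(xml_text)
    found = []
    for record in root.iter("record"):
        children = list(record)
        # Pair every child element with the sibling that follows it.
        for current, following in zip(children, children[1:]):
            if (current.tag == "key"
                    and current.text == wanted_key
                    and following.tag == "value"):
                found.append(following.text)
    return found

print(values_for(SAMPLE, "NAME"))
```

Because the pairing is done on parsed elements rather than on lines of text, it does not depend on whitespace or on how the source chose to wrap its values.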

-sam







Re: Question about GREP search in XML files with weird CDATA fields

2020-02-24 Thread ThePorgie
I would then include the line above in the pattern, so that the match must contain "Name":
\n\s+
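
The full pattern from this message was lost in the archive, so the following is only a sketch of the idea in Python: anchor the match on the key line that contains NAME, cross the line break with \n\s+, and capture the value on the next line. The element names and sample data are assumptions.

```python
import re

# Hypothetical sample: element names and values are assumptions.
SAMPLE = """\
<key><![CDATA[NAME]]></key>
    <value><![CDATA[John Appleseed]]></value>
<key><![CDATA[CITY]]></key>
    <value><![CDATA[Cupertino]]></value>
"""

# The key line must say NAME; \n\s+ crosses the line break and the
# indentation; group 1 captures the text inside the value's CDATA section.
NAME_VALUE = re.compile(
    r"<key><!\[CDATA\[NAME\]\]></key>\n\s+<value><!\[CDATA\[(.*?)\]\]></value>"
)

# With a single capture group, findall returns just the captured text.
print(NAME_VALUE.findall(SAMPLE))
```

Group 1 here plays the role of \1 in the replace field: only the captured name is extracted, and the CITY pair is skipped because its key line does not match.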



Re: Question about GREP search in XML files with weird CDATA fields

2020-02-24 Thread Miguel Perez
Thank you, ThePorgie.

Unfortunately it doesn't work for me.

I should've said that there are many values using this syntax, like this:










As you can see, there are two keys, but the very next line says *value* for 
both of them. That is my main concern.

I want *value1* for each item on the list, but its defining *key* is in the 
line above with that CDATA formatting.

Any ideas?

On Monday, February 24, 2020, 11:01:22 (UTC-6), ThePorgie wrote:
>
> Put "\1" (no quotes) in the replace field and then Extract with
> 
> Will that work for ya?



Re: Question about GREP search in XML files with weird CDATA fields

2020-02-24 Thread ThePorgie
Put "\1" (no quotes) in the replace field and then Extract with

