Re: Sorting multiple records in a text file

Roland Küffner Sun, 10 Nov 2024 16:50:59 -0800

Sorry, I missed a question mark in step 2. It should be:
"Created at: (.+?)<br>Author: .+?\s(.+?)<br>", Specific sub-patterns = "\2\1"
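
To see what the missing question mark changes, here is a small Python sketch. It uses Python's `re` module rather than BBEdit's grep engine, and the record is made up for illustration, so treat it as a demonstration of greedy versus lazy matching only.

```python
import re

# Made-up record, already collapsed onto one line with <br> placeholders.
record = (
    "Annotation 1:<br>Created at: 2024-03-02<br>Author: Bob Young<br>"
    "Type: Note<br>Comment: first line<br>second line<br>"
)

# Step 2 as first posted (greedy .+) and as corrected (lazy .+?).
greedy = re.compile(r"Created at: (.+?)<br>Author: .+\s(.+?)<br>")
lazy = re.compile(r"Created at: (.+?)<br>Author: .+?\s(.+?)<br>")

# Group 2 is the part meant to capture the author's last name.
greedy_key = greedy.search(record).group(2)
lazy_key = lazy.search(record).group(2)

print("greedy:", greedy_key)
print("lazy:  ", lazy_key)
```

With the greedy `.+`, the second group runs on into the Comment field and picks up the last word before the final `<br>` on the line; with `.+?` it stops at the author's last name.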



On Mon, Nov 11, 2024 at 1:39 AM Roland Küffner <[email protected]>
wrote:

> Hi, although this is solved, here is another possible way using a text
> factory.
>
> The recipe (text factory steps) would be this:
> 1) Replace every line break with a placeholder (e.g. <br>):
> "\n(?!Annotation)" => "<br>" (this uses "Positional Assertions" <= RTM)
> => effect: every record is on its own line
> 2) Sort lines (using "Sort using pattern") – Searching pattern =
> "Created at: (.+?)<br>Author: .+\s(.+?)<br>", Specific sub-patterns = "\2\1"
> 3) Re-Replace your placeholder with real line breaks: TF-step Replace all:
> "<br>" => "\n"
>
> Caveat: this assumes: 1) "Annotation" will always be the first field in a
> record; 2) "Created at" will always occur before "Author"; 3) the sequence
> "<br>" will not occur in the original text (change the placeholder if so)
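
For anyone who wants to try the same three steps outside BBEdit, a rough Python equivalent could look like this. The sample records and their date format are invented (the thread's real data is not reproduced here), the pattern is the corrected one with `.+?`, and the handling of lines that do not match the pattern is my assumption, not documented BBEdit behaviour.

```python
import re

PLACEHOLDER = "<br>"  # assumed not to occur in the original text
KEY = re.compile(r"Created at: (.+?)<br>Author: .+?\s(.+?)<br>")


def sort_key(line):
    # Sub-patterns \2\1: author's last name first, then the creation date.
    match = KEY.search(line)
    # Assumption: a line without both fields gets an empty key and sorts first.
    return match.group(2) + match.group(1) if match else ""


def sort_records(text):
    # Set the final newline aside so it is not turned into a placeholder.
    body = text.rstrip("\n")
    # Step 1: join each record onto one line.
    collapsed = re.sub(r"\n(?!Annotation)", PLACEHOLDER, body)
    # Step 2: sort the one-line records by the captured sub-patterns.
    ordered = sorted(collapsed.split("\n"), key=sort_key)
    # Step 3: turn the placeholders back into real line breaks.
    return "\n".join(ordered).replace(PLACEHOLDER, "\n") + "\n"


# Invented sample data in the assumed record layout.
sample = """\
Annotation 1:
Created at: 2024-03-02
Author: Bob Young
Type: Note
Comment: first line
second line
Annotation 2:
Created at: 2024-01-15
Author: Alice Adams
Type: Highlight
Comment: short
Annotation 3:
Created at: 2024-01-01
Author: Bob Young
Type: Note
Comment: earlier note
"""

result = sort_records(sample)
print(result)
```

Because the key is plain text, the "Created at" part only sorts chronologically if the dates are written in a form that sorts correctly as text, as the invented ones here do.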
>
> It worked on your sample data,
>
> Regards
> Roland
>
>
>
> On Wed, Nov 6, 2024 at 1:45 PM Howard <[email protected]> wrote:
>
>> Thanks everyone for your responses. They enabled me to solve the problem.
>> Howard
>> On Tuesday 5 November 2024 at 1:13:46 pm UTC-5 jj wrote:
>>
>>> 1. here is the find/replace dialog:
>>> [image: Screenshot 2024-11-05 at 18.50.03.png]
>>> 2. Here is what the file should look like once the replacement has been
>>> done:
>>> (the text options are displayed by clicking on the ⚙️ on the left of the
>>> navigation bar)
>>> [image: Screenshot 2024-11-05 at 18.49.22.png]
>>>
>>> Notice that the <TAB> tags were replaced by tab characters (the little
>>> triangles Δ only visible when Show invisibles > Show tabs is checked).
>>>
>>> The conversion should be immediate.
>>>
>>> Transforming your example to this Tab-Separated-Values result:
>>>
>>> [image: Screenshot 2024-11-05 at 19.00.50.png]
>>>  (Notice that the invisible tab characters are visible here because Show
>>> invisibles > Show tabs is checked for this file too )
>>>
>>> HTH
>>>
>>> Jean Jourdain
>>>
>>> On Tuesday, November 5, 2024 at 12:37:59 PM UTC+1 Howard wrote:
>>>
>>>> Jean, I tried to apply your canonize_lines_to_columns.txt file to the
>>>> data shown earlier in this post, following your directions; however, after
>>>> letting it run for a few minutes, nothing happened. I had to Force Quit.
>>>>
>>>> I had to manually change every <TAB> to `\t` and the BBEdit
>>>> Find/Replace wouldn't do it. In BBEdit Settings, the "Auto-expand tabs"
>>>> option was off. How do I deselect that option?
>>>>
>>>> How long should it take for the data to be converted? What could I be
>>>> doing wrong?
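
If the Find/Replace dialog refuses to cooperate, the placeholder swap can also be done outside BBEdit. A minimal Python sketch follows; the function name is made up, and the sample rule is one line taken from jj's canonize file.

```python
def expand_tab_placeholders(text, placeholder="<TAB>"):
    """Return text with every literal placeholder replaced by a real tab."""
    return text.replace(placeholder, "\t")


# One rule from the canonize file, still containing the <TAB> placeholder.
rule = r"\x20{2,}<TAB>\x20"
fixed = expand_tab_placeholders(rule)

# repr() makes the real tab character visible as \t.
print(repr(fixed))
```

The same function could be run over the whole contents of canonize_lines_to_columns.txt before saving it.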
>>>>
>>>> Howard
>>>>
>>>> On Monday 4 November 2024 at 5:15:56 pm UTC-5 jj wrote:
>>>>
>>>>> Hi Howard,
>>>>>
>>>>> You could do that with a canonize file and a few regular expressions.
>>>>>
>>>>>  1. Create a new file named canonize_lines_to_columns.txt with this
>>>>> content:
>>>>>
>>>>> # -*- x-bbedit-canon-case-sensitive: 1; x-bbedit-canon-match-words: 0; x-bbedit-canon-grep: 1; -*-
>>>>> # End:
>>>>> # Local Variables:
>>>>> # coding: utf-8
>>>>> # indent_style: tab
>>>>> #===
>>>>> # Save the annotation number in a <<< >>> bracket that we will need later.
>>>>> ^Annotation (\d+):<TAB><<<\1>>>
>>>>> # Replace all the column titles by tabs.
>>>>> \n(Created at|Author|Type|Comment):\h*<TAB>\t
>>>>> # Replace all the newlines by a space in case there are some in the
>>>>> # contents of Comment fields.
>>>>> \n<TAB>\x20
>>>>> # Put a newline before each annotation number and remove the <<< >>> bracket.
>>>>> <<<(\d+)>>><TAB>\n\1
>>>>> # Put the column names in the first line.
>>>>> \A\h*$<TAB>Annotation\tCreated At\tAuthor\tType\tComment
>>>>> # Put a single space where there is more than one.
>>>>> \x20{2,}<TAB>\x20
>>>>> # Reorder the columns to Author, Created At, Annotation, Type, Comment.
>>>>> ^(.+?)\t(.+?)\t(.+?)\t<TAB>\3\t\2\t\1\t
>>>>> # Now the file could be sorted in BBEdit by Author, Created At
>>>>> # Or imported into a Spreadsheet as Tab Separated Values.
>>>>>
>>>>>  2. Once you have created this file replace in it all the <TAB> by
>>>>> real tabs. Take care to deselect the "Auto-expand tabs" option on the file
>>>>> before you save it otherwise they will be replaced by spaces and we need
>>>>> them as separators.
>>>>>
>>>>> Find: <TAB>
>>>>> Replace: \t
>>>>>
>>>>> 3. Go to your data file and use the menu Text > Canonize… with the
>>>>> saved canonize file and apply it to your data.
>>>>>
>>>>> 4. Your data should be converted to Tab-Separated-Values with the
>>>>> columns reordered as to be sorted in this order: Author, Created At,
>>>>> Annotation, Type, Comment.
>>>>>
>>>>> 5. Use the menu Text > Sort Lines… or import the resulting TSV into a
>>>>> spreadsheet.
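
As a cross-check of what the rules are meant to do, here is a rough Python rendering of the same replacements applied in order. Assumptions on my part: each rule is a replace-all over the whole text, run top to bottom; `\h` is written as `[ \t]` because Python's `re` has no `\h`; and the two sample records are invented.

```python
import re

# (search, replace) pairs mirroring the canonize rules, in the same order.
RULES = [
    (r"^Annotation (\d+):", r"<<<\1>>>"),                  # remember the number
    (r"\n(Created at|Author|Type|Comment):[ \t]*", r"\t"),  # titles -> tabs
    (r"\n", " "),                                          # stray newlines -> space
    (r"<<<(\d+)>>>", r"\n\1"),                             # one record per line
    (r"\A[ \t]*$", r"Annotation\tCreated At\tAuthor\tType\tComment"),  # header
    (r"\x20{2,}", " "),                                    # squeeze repeated spaces
    (r"^(.+?)\t(.+?)\t(.+?)\t", r"\3\t\2\t\1\t"),          # Author, Created At first
]


def to_tsv(text):
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


def sort_rows(tsv):
    # Keep the header on top; a plain line sort then orders by Author, Created At.
    header, *rows = tsv.split("\n")
    return "\n".join([header] + sorted(rows))


# Invented sample data in the assumed record layout.
sample = """\
Annotation 1:
Created at: 2024-03-02
Author: Bob Young
Type: Note
Comment: first line
second line
Annotation 2:
Created at: 2024-01-15
Author: Alice Adams
Type: Highlight
Comment: short
"""

tsv = to_tsv(sample)
print(sort_rows(tsv))
```

Note that a plain line sort orders by the full Author field, first name included, which is not the same key as a sort on the last name only.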
>>>>>
>>>>> HTH
>>>>>
>>>>> Jean Jourdain
>>>>>
>>>>> On Monday, November 4, 2024 at 6:10:04 PM UTC+1 Howard wrote:
>>>>>
>>>>>> I think I can write the GREP code that matches the first four lines,
>>>>>> but I am not sure how to do that for the *Comment* lines. Also, once
>>>>>> I do that, how do I "write a regular expression that recognizes the sort
>>>>>> keys within the line"?
>>>>>>
>>>>>> I've also never used text factory. (Is it easier to use in BBEdit 15
>>>>>> than in BBEdit 14?)
>>>>>> Howard
>>>>>>
>>>>>> On Monday 4 November 2024 at 11:53:49 am UTC-5 Neil Faiman wrote:
>>>>>>
>>>>>>> As far as I know, BBEdit simply supports sorting *lines* — not
>>>>>>> arbitrary records represented by batches of text lines. But do not
>>>>>>> despair. All is not lost. BBEdit has really robust support for sorting
>>>>>>> lines.
>>>>>>>
>>>>>>> I would start with a GREP that could match across multiple lines and
>>>>>>> collapse them into a single line, with some arbitrary separator character
>>>>>>> representing where the original line breaks were. (You might need two
>>>>>>> patterns, one to collapse the multi-line Comments into a single line, and
>>>>>>> then a second one to collapse all the lines in the record into a single
>>>>>>> line.)
>>>>>>>
>>>>>>> Now that each record is represented by a single line, you can write
>>>>>>> a regular expression that recognizes the sort keys within the line.
>>>>>>> Then you can use the “Sort using pattern” feature of the Text > Sort Lines…
>>>>>>> command to sort the records on those keys.
>>>>>>>
>>>>>>> Finally, you can reverse the process from the first step and split
>>>>>>> the records back into multiple lines.
>>>>>>>
>>>>>>> Once you’ve got each of the steps perfected, you can create a text
>>>>>>> factory that will apply them to a file automatically, and you should be
>>>>>>> good to go.
>>>>>>>
>>>>>>> Good luck,
>>>>>>> Neil Faiman
>>>>>>>
>>>>>>> On Nov 4, 2024, at 11:35 AM, Howard <[email protected]> wrote:
>>>>>>>
>>>>>>> I have multiple records in a text file in the format below (seven
>>>>>>> sample records shown). I want to sort all of them by *Author* and
>>>>>>> then, within *Author*, by *Created At*. In a record, the first four
>>>>>>> fields are always one line each; however, the fifth field (*Comment*)
>>>>>>> can run to 30-40 lines, possibly more.
>>>>>>>
>>>>>>> Is this something that BBEdit can do? If it is, how can I do it?
>>>>>>>
>>>>>>>
>>>>>>> --
>> This is the BBEdit Talk public discussion group. If you have a feature
>> request or believe that the application isn't working correctly, please
>> email "[email protected]" rather than posting here. Follow @bbedit
>> on Mastodon: <https://mastodon.social/@bbedit>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "BBEdit Talk" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/bbedit/0d492688-dffa-43c0-989d-98d1961bb999n%40googlegroups.com
>> <https://groups.google.com/d/msgid/bbedit/0d492688-dffa-43c0-989d-98d1961bb999n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>


test.textfactory
Description: Binary data
