On Fri, Dec 31, 2021, at 15:59, Gregory Pittman wrote:
> On 12/31/21 14:19, Matt Miller wrote:
>> On Fri, Dec 31, 2021, at 07:11, Gregory Pittman wrote:
>>> On 12/30/21 20:49, Matt Miller wrote:
>>>> I'm loading a text frame from a utf-8 encoded text file, and within my
>>>> Scribus Python code I want to search for the standard newline character,
>>>> ascii value 10. When I see an ascii 10 as the line separator I want to
>>>> apply a special paragraph style to the following paragraph. Most
>>>> paragraphs end with the Unicode paragraph separator character, \u2029, and
>>>> in those cases the default paragraph style is fine.
>>>>
>>>> My problem is that both these types of characters are matching '\r' when I
>>>> use re.search in Python. Also, if I select either line separator
>>>> character, then do getText(), I get a '\r' no matter what. I've confirmed
>>>> that my file encoding is utf-8. What am I missing? How can I search for a
>>>> simple '\n' character?
>>>
>>> Hi Matt,
>>>
>>> You don't say what OS you're using.
>>
>> Linux
>>
>>> Maybe running dos2unix on the text would help.
>>
>> Well, my ideal workflow is one where the contents of the text file instruct
>> Scribus what paragraph and character styles to use throughout the document.
>> So, I'm careful to put exactly the characters I want into the text file.
>> I've been largely successful, but now I've run into a case where it seems
>> Scribus (or Python) is losing information when I load the file. From inside
>> Scribus I can't distinguish between a Unicode paragraph separator, \u2029,
>> and a simple line feed, \u000A.
>>
>> I'm able to open my text file from the Python console and dump it out to see
>> that newline is displayed as "\n" and a paragraph separator is displayed as
>> "\u2029". So, I suspect the problem is with Scribus, or with how I'm using
>> it. I'm loading the file using insertHtmlText(), but I get the same bad
>> behavior from the GUI when I do "Content | Get Text..." and load the file
>> manually.
>>
>> I've attached a text file that shows the problem. If you run the following
>> in the scripter console from a document with a text frame named "Text1" you
>> should see the problem:
>>
>
> When I save this file then open it with kwrite, then when I do a
> Replace operation trying to switch \n to <p> (an arbitrary choice), I
> see the replacement happen at the end of the 2nd, 3rd, 4th, and 6th
> sentences.
I reproduced that behavior in kwrite. Only the 2nd and 6th sentences should
have been replaced, since only those end with an actual \u000A. I guess each
application interprets '\n' in its own way. That is probably similar to what
Scribus does when it presents both \u2029 and \u000A as '\r' in the Python
string.
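
For the record, here is a small sketch of what I mean. It runs in plain
Python, outside Scribus. The function simulate_frame_text() is a hypothetical
stand-in for the collapsing I seem to observe in the frame text; it is not a
Scribus call, and the sample string is made up for illustration.

```python
import re

# Made-up sample: sentence 1 ends with U+2029, sentence 2 with a line feed.
raw = "First.\u2029Second.\nThird.\u2029"

def find_offsets(text, char):
    """Return the index of every occurrence of `char` in `text`."""
    return [m.start() for m in re.finditer(re.escape(char), text)]

def simulate_frame_text(text):
    """Hypothetical stand-in (an assumption, not Scribus code): collapse
    both separators to a carriage return, as the frame text appears to."""
    return text.replace("\u2029", "\r").replace("\n", "\r")

# In the raw file contents the two separators are distinguishable.
line_feeds = find_offsets(raw, "\n")
para_seps = find_offsets(raw, "\u2029")

# After collapsing, every separator matches '\r' and the distinction is gone.
collapsed = find_offsets(simulate_frame_text(raw), "\r")

print(line_feeds, para_seps, collapsed)
```

So the information is present in the file and only disappears once both
characters have been mapped to the same value.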
So, I changed the special marker character I look for in my text files from a
line feed to a zero-width space (U+200B). That character isn't as semantically
meaningful to me for this purpose as a line feed, but it works.
Thanks for the input.
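
In case it helps anyone else, a rough sketch of the marker approach follows.
It assumes the zero-width space sits at the start of each paragraph that needs
the special style and that paragraphs in the frame text are separated by '\r'.
The function name marked_paragraphs() and the sample text are mine, for
illustration only.

```python
ZWSP = "\u200b"  # zero-width space, used here as the style marker

def marked_paragraphs(frame_text, marker=ZWSP, sep="\r"):
    """Return (start, length) for each paragraph that begins with `marker`.

    Assumes paragraphs are separated by `sep`, as the frame text seems to be.
    Offsets are character positions within `frame_text`.
    """
    spans = []
    offset = 0
    for para in frame_text.split(sep):
        if para.startswith(marker):
            spans.append((offset, len(para)))
        # Advance past this paragraph and its separator.
        offset += len(para) + len(sep)
    return spans

# Made-up sample: only the second paragraph carries the marker.
sample = "Plain one.\r\u200bSpecial two.\rPlain three."
spans = marked_paragraphs(sample)
print(spans)
```

Inside Scribus, each (start, length) pair could then be passed to selectText()
on the frame before applying the paragraph style. The name of the
style-setting call differs between Scribus versions, so check the Scripter
documentation for the one you are running.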
> Greg
>
>
> ___
> Scribus Mailing List: [email protected]
> Edit your options or unsubscribe:
> http://lists.scribus.net/mailman/listinfo/scribus
> See also:
> http://wiki.scribus.net
> http://forums.scribus.net
--
Matt Miller
mailto:[email protected]