On Fri, Dec 31, 2021, at 15:59, Gregory Pittman wrote:
> On 12/31/21 14:19, Matt Miller wrote:
>> On Fri, Dec 31, 2021, at 07:11, Gregory Pittman wrote:
>>> On 12/30/21 20:49, Matt Miller wrote:
>>>> I'm loading a text frame from a utf-8 encoded text file, and within my 
>>>> Scribus Python code I want to search for the standard newline character, 
>>>> ASCII value 10. When I see an ASCII 10 as the line separator I want to 
>>>> apply a special paragraph style to the following paragraph. Most 
>>>> paragraphs end with the Unicode paragraph separator character, \u2029, and 
>>>> in those cases the default paragraph style is fine.
>>>>
>>>> My problem is that both these types of characters are matching '\r' when I 
>>>> use re.search in Python. Also, if I select either line separator 
>>>> character, then do getText(), I get a '\r' no matter what. I've confirmed 
>>>> that my file encoding is utf-8. What am I missing? How can I search for a 
>>>> simple '\n' character?
>>>
>>> Hi Matt,
>>>
>>> You don't say what OS you're using.
>> 
>> Linux
>> 
>>> Maybe running dos2unix on the text would help.
>> 
>> Well, my ideal workflow is one where the contents of the text file instruct 
>> Scribus what paragraph and character styles to use throughout the document. 
>> So, I'm careful to put exactly the characters I want into the text file. 
>> I've been largely successful, but now I've run into a case where it seems 
>> Scribus (or Python) is losing information when I load the file. From inside 
>> Scribus I can't distinguish between a Unicode paragraph separator, \u2029, 
>> and a simple line feed, \u000A.
>> 
>> I'm able to open my text file from the Python console and dump it out to see 
>> that newline is displayed as "\n" and a paragraph separator is displayed as 
>> "\u2029". So I suspect the problem is with Scribus, or how I'm using 
>> it. I'm loading the file using insertHtmlText(), but I get the same bad 
>> behavior from the GUI when I do "Content | Get Text..." and load the file 
>> manually.
>> 
>> I've attached a text file that shows the problem. If you run the following 
>> in the scripter console from a document with a text frame named "Text1" you 
>> should see the problem:
>> 
>
> When I save this file, open it with kwrite, and do a Replace operation 
> switching \n to <p> (an arbitrary choice), I see the replacement happen 
> at the end of the 2nd, 3rd, 4th, and 6th sentences.

I duplicated that behavior with kwrite. It seems only the 2nd and 6th sentences 
should have been replaced, since only those end with an actual \u000A. But I guess 
each app does its own interpretation of what '\n' means. I suppose 
that's similar to what Scribus is doing when it decides that \u2029 and \u000A 
should both appear as '\r' in the Python string.
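
For comparison, here is a minimal sketch in plain Python (outside Scribus) 
showing that the two separators stay distinct in an ordinary string and only 
become indistinguishable once both are mapped to '\r'. The sample text and 
both function names are hypothetical, and normalize_like_scribus() only 
imitates the behavior reported in this thread; it is not Scribus code.

```python
import re

# Hypothetical sample: sentences 2 and 6 end with a line feed (U+000A),
# the others with a Unicode paragraph separator (U+2029).
SAMPLE = "One.\u2029Two.\nThree.\u2029Four.\u2029Five.\u2029Six.\n"


def separator_kinds(text):
    """Return the separator character that ends each paragraph, in order."""
    return re.findall(r"[\n\u2029]", text)


def normalize_like_scribus(text):
    """Map both separators to '\\r'.

    Assumption: this imitates what the thread observes coming back from
    getText(); the real conversion inside Scribus may differ.
    """
    return re.sub(r"[\n\u2029]", "\r", text)


kinds = separator_kinds(SAMPLE)
line_feed_sentences = [i + 1 for i, sep in enumerate(kinds) if sep == "\n"]
flattened = normalize_like_scribus(SAMPLE)
```

Before the mapping, line_feed_sentences identifies which sentences ended with 
a line feed; after it, flattened contains only '\r' and that information is 
gone.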

So, I changed my special marker character that I expect in my text files from a 
line feed to a zero-width space. That character isn't as semantically 
meaningful to me for this purpose as a line feed, but it works.
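
A rough sketch of how such a marker could be detected, assuming the text 
arrives with '\r' between paragraphs (as described above) and that the 
zero-width space (U+200B) sits at the very end of the paragraph that precedes 
the one needing the special style. The function name and the marker placement 
are assumptions for illustration, not details of my actual script.

```python
ZWSP = "\u200b"  # zero-width space, used here as the style marker


def find_marked_paragraphs(text, sep="\r"):
    """Split text into paragraphs and locate the ones to restyle.

    Returns (cleaned, special): cleaned is the list of paragraphs with the
    marker removed, special is the list of indexes of paragraphs that
    follow a marker.
    """
    paragraphs = text.split(sep)
    cleaned = []
    special = []
    for i, para in enumerate(paragraphs):
        if para.endswith(ZWSP):
            para = para[:-1]  # drop the marker from the visible text
            if i + 1 < len(paragraphs):
                special.append(i + 1)  # the following paragraph is special
        cleaned.append(para)
    return cleaned, special


cleaned, special = find_marked_paragraphs("One.\rTwo.\u200b\rThree.\rFour.")
```

The indexes in special could then be used to select each paragraph in the 
text frame and apply the special style, while all other paragraphs keep the 
default one.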

Thanks for the input.

> Greg
>
>
> ___
> Scribus Mailing List: [email protected]
> Edit your options or unsubscribe:
> http://lists.scribus.net/mailman/listinfo/scribus
> See also:
> http://wiki.scribus.net
> http://forums.scribus.net

-- 

  Matt Miller
  mailto:[email protected]
