[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-27 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Heiko Tietze  changed:

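What     | Removed                                     | Added
---------+---------------------------------------------+---------------------------------------
Keywords | needsUXEval                                 |
CC       | libreoffice-ux-advise@lists.freedesktop.org | olivier.hallot@documentfoundation.org,
         |                                             | tietze.he...@gmail.com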

--- Comment #15 from Heiko Tietze  ---
Putting all comments together, UX recommends implementing an option for this
/ActualText feature. I suggest the caption "Improve non-latin text export"
(off by default, so nothing changes for Western users), with the details
explained in the help pages.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Libreoffice-ux-advise mailing list
Libreoffice-ux-advise@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-ux-advise


[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-21 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428
Bug 117428 depends on bug 117533, which changed state.

Bug 117533 Summary: Problems with copying text from generated PDF (for Graphite font)
https://bugs.documentfoundation.org/show_bug.cgi?id=117533

What       | Removed  | Added
-----------+----------+---------
Status     | NEEDINFO | RESOLVED
Resolution | ---      | INVALID



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-20 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #14 from Shree Devi Kumar  ---
(In reply to Khaled Hosny from comment #12)
> (In reply to Shree Devi Kumar from comment #10)
> > (In reply to Khaled Hosny from comment #9)
> > > The keyword for the
> > > proposed changes is “per word”: the new option would skip the algorithm
> > > and tag the glyphs of each word with its text, as a complete unit.
> > 
> > @Khaled Any update on this? Can you create a patch for this option so that
> > it can be tested?
> 
> I don’t currently have time to work on this, unfortunately.

OK. Thank you for your work on /ActualText; it is a step in the right direction
toward getting fully copyable text from PDFs.



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-20 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #13 from Shree Devi Kumar  ---
(In reply to V Stuart Foote from comment #11)
> I don't believe Khaled has volunteered to tackle the needed refactoring to
> the PDF export filter and GUI.  Check History--clearly not assigned as
> Khaled removed himself, back to NEW

OK. Since he had suggested opening a new bug for this, I had incorrectly
assumed that he was planning to work on it.

> 
> Otherwise, is there any objection that implementing an /ActualText flag "per
> word" will mean string selection to copy from PDF will be limited to word
> bounds? Personally I think we need the tagging more than the partial string
> copy. 
> 
> Ensuring correct handling of combining glyphs and Unicode scripts--and
> presumably OTF font features when implemented (as for bug 58941)--is the
> desired outcome.
> 
> Justified from a11y perspective, and needed for accuracy supporting CTL
> scripts. 
> 
> Is that the UX consensus?

As a user, the ability to copy text from PDF is important. Currently, except for
XeLaTeX, I am not aware of any other method of doing so for Devanagari and
other Indic scripts.

Please see
https://www.wikihow.com/index.php?title=Create-a-Searchable-Hindi-PDF-Using-Lyx-with-Xetex
which is a workaround for users who are not comfortable with XeLaTeX to create
these searchable/copyable PDFs.

It will be a great benefit to users if this option can be implemented in
LibreOffice.

Thank You!



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #12 from Khaled Hosny  ---
(In reply to Shree Devi Kumar from comment #10)
> (In reply to Khaled Hosny from comment #9)
> > The keyword for the
> > proposed changes is “per word”: the new option would skip the algorithm and
> > tag the glyphs of each word with its text, as a complete unit.
> 
> @Khaled Any update on this? Can you create a patch for this option so that
> it can be tested?

I don’t currently have time to work on this, unfortunately.



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

V Stuart Foote  changed:

What     | Removed  | Added
---------+----------+----------------------------------------------------------
Status   | ASSIGNED | NEW
See Also |          | https://bugs.documentfoundation.org/show_bug.cgi?id=58941

--- Comment #11 from V Stuart Foote  ---
I don't believe Khaled has volunteered to tackle the needed refactoring to the
PDF export filter and GUI.  Check History--clearly not assigned as Khaled
removed himself, back to NEW

Otherwise, is there any objection that implementing an /ActualText flag "per
word" will mean string selection to copy from PDF will be limited to word
bounds? Personally I think we need the tagging more than the partial string
copy. 

Ensuring correct handling of combining glyphs and Unicode scripts--and presumably
OTF font features when implemented (as for bug 58941)--is the desired outcome.

Justified from a11y perspective, and needed for accuracy supporting CTL
scripts. 

Is that the UX consensus?



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Shree Devi Kumar  changed:

What   | Removed | Added
-------+---------+---------
Status | NEW     | ASSIGNED

--- Comment #10 from Shree Devi Kumar  ---
(In reply to Khaled Hosny from comment #9)
>
> We do export the text already, but using a clever algorithm that minimizes
> file size impact and keeps individual characters selectable (as much as
> possible), but it fails in minor ways with some readers second guessing us
> and inserting random spaces in the middle of the word. 

For Indic languages this was happening in ALL readers that I tested. 

> The keyword for the
> proposed changes is “per word”: the new option would skip the algorithm and
> tag the glyphs of each word with its text, as a complete unit.

@Khaled Any update on this? Can you create a patch for this option so that it
can be tested?



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-09 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Volga  changed:

What       | Removed | Added
-----------+---------+-------
Depends on |         | 117533


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=117533
[Bug 117533] Problems with copying text from generated PDF (for Graphite font)


[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-08 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Khaled Hosny  changed:

What     | Removed               | Added
---------+-----------------------+---------------------------------------
Status   | ASSIGNED              | NEW
Assignee | khaledho...@eglug.org | libreoffice-b...@lists.freedesktop.org



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-08 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #8 from Heiko Tietze  ---
(In reply to V Stuart Foote from comment #3)
> 1. done only for CTL?  => NO (see 6.)
No

> 2. toggled active by default? => NO (all PDFs would balloon in size)
We have a direct export command with just the file dialog. So considering
Tools > Options > Print also makes sense. But I agree with No because of KISS.

> 3. receive its own check box control on the PDF export dialog?  => NO
> 4. alternatively, be merged into the generate "Tagged PDF (add document
> structure)" checkbox?  => YES (pending export dialog work needed for bug
> 45636) 
We have many options in this dialog and one more doesn't spoil the party. The
problem with Tagged PDF is that this option is formally used for the structure.
=> Maybe ("[ ] Export raw text" underneath "[ ] Export comments")

> 5. perform ICU lib only recognition of intended language/script? => NO
> (insufficient granularity as to /Lang tagging for non-CTL scripts)
ACK

> 6. or, recognize /ActualText as a component of supporting a11y--and that
> eventual support of ISO 14289-1 PDF/UA (bug 45636) will require accurate
> /Lang tagging--so coordination of ICU lib Unicode block detection with the
> locale/language (BCP 47/ISO 639 [1][2][3]) as set by locale or by Paragraph
> from the GUI must be implemented for fidelity of non-CTL scripts.  => YES
Sounds to me like a checkbox is set on or off by default.

(In reply to Khaled Hosny from comment #4)
> 2) What exact wording to use, /ActualText is jargon
"Export raw text", "Export actual text", "Export source"...



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-07 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Khaled Hosny  changed:

What     | Removed                                                   | Added
---------+-----------------------------------------------------------+------
See Also | https://bugs.documentfoundation.org/show_bug.cgi?id=66597 |



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-07 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #7 from Shree Devi Kumar  ---
> "Tagged PDF" would simply include actual text tagging by default.

That would be great!



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-07 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Xisco Faulí  changed:

What   | Removed | Added
-------+---------+---------------------------
Status | NEW     | ASSIGNED
CC     |         | xiscofa...@libreoffice.org



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #6 from V Stuart Foote  ---
(In reply to Khaled Hosny from comment #4)
> 1) Do we want this option or not

IIUC,
https://cgit.freedesktop.org/libreoffice/core/commit/?id=c688b01d9102832226251fc84045408afe392459
gets us /ActualText tags in the PDF at the Unicode glyph cluster level where
needed.

This additional work would be to expand PDF export to include generation of the
/ActualText at Unicode Word boundaries for text in all scripts/fonts. Helpful
for fidelity of CTL script content by word, but also for extending our Tagged
PDF content in general to include tagged words for entire text. Good for a11y
and AT tools that can parse the tags.

So I think it is worth doing.
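
To make the per-word proposal concrete, here is a minimal sketch of what one tagged word could look like in a PDF content stream. It is only an illustration, not the LibreOffice export filter's code: the helper names `actual_text_hex` and `tag_word` and the glyph IDs are hypothetical. It relies on two things from the PDF specification: /ActualText can be given in the property list of a marked-content sequence (`BDC` ... `EMC`), and a text string can be written as a hex string in UTF-16BE with a leading byte-order mark.

```python
from __future__ import annotations


def actual_text_hex(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE with a leading byte-order mark."""
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def tag_word(word: str, glyph_ids: list[int]) -> str:
    """Wrap the glyphs of one word in a marked-content span carrying its text.

    glyph_ids are font-specific glyph indices (hypothetical values below),
    written as 2-byte codes as they would be for an Identity-H encoded font.
    """
    glyphs = "".join(f"{gid:04X}" for gid in glyph_ids)
    return f"/Span <</ActualText {actual_text_hex(word)}>> BDC <{glyphs}> Tj EMC"


# A Devanagari word of five code points that a font might shape into four
# glyphs, in visual order. The glyph IDs are made up for the example.
word = "\u0939\u093f\u0902\u0926\u0940"
print(tag_word(word, [0x0145, 0x0098, 0x0201, 0x0087]))
```

A reader that honours /ActualText should return the whole word when any part of the span is copied, which is why selection would then be limited to word bounds.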


> 2) What exact wording to use, /ActualText is jargon (it refers to a specific
> PDF construct) that I’m not sure should be exposed in user UI.


True "Actual Text", a counterpoint to "Alternate Text" or "Extended Text"
commenting for accessibility, could include other lexical aspects of rendering
a document--e.g. exposing in PDF the meaning of an Emoji, from its Unicode
point name or drawn from a substitution table. 

But here if we were to enable/disable Unicode Word boundary tags by simply
adding it to the "Tagged PDF (add document structure)" check box the specific
PDF /Lang & /ActualText tags would not be needed in the UI.  "Tagged PDF" would
simply include actual text tagging by default.

The Help item for the checkbox would include details regarding the tagging of
text, with /Lang tags and /ActualText tags mentioned for completeness--but
with no need to refer to them in the GUI otherwise.



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #5 from Shree Devi Kumar  ---
Regarding expected ballooning in pdf size, please see

http://tug.org/pipermail/xetex/2016-February/026445.html

On 23/2/16 02:54, Andrew Cunningham wrote:
> It would probably more than double, I was under the impression that
> ActualText was a tag attribute, so extensive tagging would be needed,
> and actual text added to the tags.

The ActualText tagging is highly compressible, so in practice the 
increase in overall PDF size is not all that great.
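
The compressibility point can be checked roughly with a synthetic experiment. The sketch below is an assumption-laden illustration, not a measurement of real LibreOffice output: it builds a made-up content stream from a random vocabulary of Devanagari-range words with invented glyph IDs, once with plain `Tj` operators and once with a per-word /ActualText span, and compares raw and zlib-compressed sizes. Real documents have different word statistics, fonts and stream layout, so the numbers it prints are only indicative.

```python
import random
import zlib


def build_streams(n_words=5000, vocab_size=500, seed=0):
    """Return (plain, tagged) synthetic content streams as bytes."""
    rng = random.Random(seed)
    vocab = []
    for _ in range(vocab_size):
        length = rng.randint(3, 6)
        # Random Devanagari letters and made-up glyph IDs, one glyph per letter.
        text = "".join(chr(rng.randint(0x0905, 0x0939)) for _ in range(length))
        glyphs = "".join(f"{rng.randint(1, 999):04X}" for _ in range(length))
        vocab.append((text, glyphs))

    plain, tagged = [], []
    for _ in range(n_words):
        text, glyphs = rng.choice(vocab)
        show = f"<{glyphs}> Tj"
        actual = "FEFF" + text.encode("utf-16-be").hex().upper()
        plain.append(show)
        tagged.append(f"/Span <</ActualText <{actual}>>> BDC {show} EMC")
    return "\n".join(plain).encode("ascii"), "\n".join(tagged).encode("ascii")


def measure():
    """Return raw and zlib-compressed sizes for both streams."""
    plain, tagged = build_streams()
    return (len(plain), len(tagged),
            len(zlib.compress(plain)), len(zlib.compress(tagged)))


raw_plain, raw_tagged, z_plain, z_tagged = measure()
print("raw bytes added by tagging:       ", raw_tagged - raw_plain)
print("compressed bytes added by tagging:", z_tagged - z_plain)
```

Because the tagging syntax repeats for every word, the compressed stream grows by far fewer bytes than the uncompressed one, which is consistent with the remark quoted above.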



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #4 from Khaled Hosny  ---
(In reply to Heiko Tietze from comment #2)
> What input from UX do you expect, Khaled?

What V Stuart Foote listed, plus 1) Do we want this option or not 2) What exact
wording to use; /ActualText is jargon (it refers to a specific PDF construct)
that I’m not sure should be exposed in the UI.



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

V Stuart Foote  changed:

What     | Removed | Added
---------+---------+----------------------------------------------------------
CC       |         | er...@redhat.com, vmik...@collabora.co.uk,
         |         | vstuart.fo...@utsa.edu
See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=66597

--- Comment #3 from V Stuart Foote  ---
The change is internal to the PDF export filter; when enabled, it will produce
a much larger PDF. But that PDF will be more useful to users needing to copy out
text with reasonable word bounds--especially so for complex-script languages,
which drove bug 66597.

The UX issues now are whether the work on tagging PDF with /ActualText should be:

1. done only for CTL?  => NO (see 6.)

2. toggled active by default? => NO (all PDFs would balloon in size)

3. receive its own check box control on the PDF export dialog?  => NO

4. alternatively, be merged into the generate "Tagged PDF (add document
structure)" checkbox?  => YES (pending export dialog work needed for bug 45636) 

5. perform ICU lib only recognition of intended language/script? => NO
(insufficient granularity as to /Lang tagging for non-CTL scripts)

6. or, recognize /ActualText as a component of supporting a11y--and that
eventual support of ISO 14289-1 PDF/UA (bug 45636) will require accurate /Lang
tagging--so coordination of ICU lib Unicode block detection with the
locale/language (BCP 47/ISO 639 [1][2][3]) as set by locale or by Paragraph
from the GUI must be implemented for fidelity of non-CTL scripts.  => YES
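
Points 5 and 6 can be illustrated with a small sketch of why script detection alone gives too little granularity for /Lang. This is not how LibreOffice or ICU work internally: the function names are hypothetical, the script is approximated from the first word of the Unicode character name (a stand-in for ICU's script detection), and the script-to-language table is deliberately tiny and incomplete.

```python
import unicodedata

# Illustrative, incomplete mapping from a script to languages written in it.
SCRIPT_LANGUAGES = {
    "LATIN": ["en", "de", "fr", "tr"],
    "DEVANAGARI": ["hi", "mr", "ne"],
    "THAI": ["th"],
}


def scripts_in(text):
    """Approximate the scripts used by the letters of text."""
    return {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in text
        if ch.isalpha()
    }


def lang_tag(text, paragraph_language=None):
    """Pick a BCP 47 tag for /Lang, or None when it cannot be decided.

    A language set on the paragraph (or by the locale) wins; script detection
    is only used as a fallback and only when it is unambiguous.
    """
    if paragraph_language:
        return paragraph_language
    candidates = set()
    for script in scripts_in(text):
        candidates.update(SCRIPT_LANGUAGES.get(script, []))
    return candidates.pop() if len(candidates) == 1 else None


print(lang_tag("Gr\u00fc\u00dfe"))            # Latin script: many candidate languages
print(lang_tag("Gr\u00fc\u00dfe", "de-DE"))   # the paragraph language decides
print(lang_tag("\u0e44\u0e17\u0e22"))         # Thai script: single candidate in this table
```

Latin text could be any of many languages, so only the language chosen by the user or the locale can supply an accurate /Lang value, while detection can at best fill gaps for scripts tied to a single language.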


=-ref-=
[1]
https://opengrok.libreoffice.org/xref/core/i18nlangtag/source/isolang/isolang.cxx#170
[2] https://opengrok.libreoffice.org/xref/core/include/i18nlangtag/mslangid.hxx
[3] https://opengrok.libreoffice.org/xref/core/include/rtl/locale.h



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #2 from Heiko Tietze  ---
What input from UX do you expect, Khaled?



[Libreoffice-ux-advise] [Bug 117428] add an option to PDF export dialog to do ActualText per word

2018-05-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Heiko Tietze  changed:

What | Removed | Added
-----+---------+--------------------------------------------
CC   |         | libreoffice-ux-advise@lists.freedesktop.org

--- Comment #1 from Heiko Tietze  ---
needsUXEval needs CC @ ux-advice
