Re: Resources on writing import/export file filters for Writer

2018-09-29 Thread Jens Tröger
Thanks Noel!

I’ve poked through

  https://github.com/LibreOffice/core/tree/master/sw/source/filter/html

and couldn’t find anything hinting at expanding HTML entities.  Looking
at the parser you indicated, it looks like this code would cover HTML
entities:

  https://github.com/LibreOffice/core/blob/master/svtools/source/svhtml/parhtml.cxx#L394-L622

which are listed here:

  https://dev.w3.org/html5/html-author/charref
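
For illustration only, and not the LibreOffice implementation: a minimal
Python sketch of what "expanding HTML entities to Unicode" involves. It
scans the text for `&...;`, resolves numeric references directly, and
resolves named ones through the standard library table
html.entities.name2codepoint (which covers the HTML 4 names, not the full
HTML5 list). The names expand_entities and _lookup are made up for this
example.

```python
from html.entities import name2codepoint


def _lookup(ref):
    """Return the character for a reference body such as 'amp' or '#x2026',
    or None if it is not a known or valid reference."""
    try:
        if ref[:2] in ("#x", "#X"):
            return chr(int(ref[2:], 16))   # hexadecimal numeric reference
        if ref.startswith("#"):
            return chr(int(ref[1:], 10))   # decimal numeric reference
    except (ValueError, OverflowError):
        return None                        # malformed or out-of-range number
    codepoint = name2codepoint.get(ref)    # named reference, e.g. 'ouml'
    return chr(codepoint) if codepoint is not None else None


def expand_entities(text):
    """Replace recognised '&name;' and '&#number;' references with the
    characters they stand for; leave everything else untouched."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            end = text.find(";", i + 1)
            if end != -1:
                expanded = _lookup(text[i + 1:end])
                if expanded is not None:
                    out.append(expanded)
                    i = end + 1
                    continue
        out.append(text[i])                # ordinary character or stray '&'
        i += 1
    return "".join(out)
```

Unknown references and a bare `&` are passed through unchanged, which is
one possible policy; a real parser may handle those cases differently.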

I’ll take a closer look…  

Cheers,
Jens


On Thu, Sep 27, 2018 at 11:24:40AM +0200, Noel Grandin wrote:
> 
> On 2018/09/27 11:10 AM, Jens Tröger wrote:
> > I’ve been poking through the HTML reader (rather superficially, I admit) in
> > search for the code that expands HTML entities to Unicode. I did that to
> 
> 
> Probably
> HTMLParser::ScanText
> at
> svtools/source/svhtml/parhtml.cxx:394

-- 
Jens Tröger
http://savage.light-speed.de/
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Resources on writing import/export file filters for Writer

2018-09-27 Thread Noel Grandin



On 2018/09/27 11:10 AM, Jens Tröger wrote:

> I’ve been poking through the HTML reader (rather superficially, I admit) in
> search for the code that expands HTML entities to Unicode. I did that to



Probably
   HTMLParser::ScanText
at
   svtools/source/svhtml/parhtml.cxx:394


Re: Resources on writing import/export file filters for Writer

2018-09-27 Thread Jens Tröger
Thank you Miklos,

I’ve been poking through the HTML reader (rather superficially, I admit) in
search for the code that expands HTML entities to Unicode. I did that to
address this bug:

https://bugs.documentfoundation.org/show_bug.cgi?id=119944

I didn’t find much, so could you please point me in the right direction?

Cheers,
Jens



--
Sent from: 
http://document-foundation-mail-archive.969070.n3.nabble.com/Dev-f1639786.html


Re: Resources on writing import/export file filters for Writer

2018-05-24 Thread Miklos Vajna
Hi,

On Tue, May 22, 2018 at 10:16:04PM -0700, Jens Tröger wrote:
> Other than looking through existing code, there is no current documentation
> available on how to write an export filter?

The usual trick should work here as well: have a look at how one of the
existing filters works, and do the same (after understanding what it
does).

http://fridrich.blogspot.com/2013/08/extending-swiss-army-knife-overview.html
is a high-level overview of this topic; I'm not sure if there is
anything newer than that.

Regards,

Miklos




Re: Resources on writing import/export file filters for Writer

2018-05-22 Thread Jens Tröger
Thank you Miklos!

I've spent some time trying to figure out how to use XSLT to define export
to an XML format, e.g. here:

https://wiki.openoffice.org/wiki/Export_filter_framework

That too seems to require writing a native-code wrapper, but most of the
documentation here is way outdated (early 2000s) and links to examples and
further documentation are mostly dead.

Other than looking through existing code, there is no current documentation
available on how to write an export filter?

Cheers,
Jens





Re: Resources on writing import/export file filters for Writer

2018-05-22 Thread Miklos Vajna
Hi,

On Sat, May 19, 2018 at 09:26:18PM -0700, Jens Tröger wrote:
> I haven’t had the time to dive into the topic yet, unfortunately.  So
> perhaps my question is somewhat of a guess: compared to using the Python UNO
> interface to load and traverse a document, can I expect a performance
> improvement when I use a filter to save the information that I currently
> extract using UNO?
> 
> If the guesstimated answer is “maybe/likely no” then I’d shift my current
> priorities…

It depends on how the import/export filter is implemented. Importers get an
empty doc model and populate it; exporters traverse an existing doc
model and write to a file. If you use the same Python UNO calls inside
the filter, then you don't gain much.
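
The importer/exporter split described above can be sketched roughly as
follows. This is a toy model in plain Python, not the LibreOffice filter
API: Document, Paragraph, import_plain_text and export_plain_text are
hypothetical names invented for the illustration.

```python
import io
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    text: str


@dataclass
class Document:
    # Toy stand-in for a document model: just an ordered list of paragraphs.
    paragraphs: list = field(default_factory=list)


def import_plain_text(stream, doc):
    """Importer: receives an empty document model and populates it."""
    for line in stream:
        doc.paragraphs.append(Paragraph(line.rstrip("\n")))


def export_plain_text(doc, stream):
    """Exporter: traverses an existing document model and writes a file."""
    for paragraph in doc.paragraphs:
        stream.write(paragraph.text + "\n")


# Usage: read a file into an empty model, then write the model back out.
doc = Document()
import_plain_text(io.StringIO("first\nsecond\n"), doc)
out = io.StringIO()
export_plain_text(doc, out)
```

In this picture the traversal is the same work whether it runs inside a
filter or in an external script, which matches the point that reusing
the same calls inside a filter gains little.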

Regards,

Miklos




Re: Resources on writing import/export file filters for Writer

2018-05-19 Thread Jens Tröger
Hello,

I haven’t had the time to dive into the topic yet, unfortunately.  So
perhaps my question is somewhat of a guess: compared to using the Python UNO
interface to load and traverse a document, can I expect a performance
improvement when I use a filter to save the information that I currently
extract using UNO?

If the guesstimated answer is “maybe/likely no” then I’d shift my current
priorities…

Cheers,
Jens



--
Sent from: http://nabble.documentfoundation.org/Dev-f1639786.html


Re: Resources on writing import/export file filters for Writer

2018-03-26 Thread Jens Tröger
Thanks David, I'll noodle through the Wiki then.

(Maybe) last question for now: if this:

  https://github.com/LibreOffice/core/tree/master/sw/source/filter

are the import filters for (some) of the file formats that Writer is
able to read, where would the export filters be?

Cheers,
Jens

-- 
Jens Tröger
http://savage.light-speed.de/


Re: Resources on writing import/export file filters for Writer

2018-03-26 Thread David Tardon
Hi,

On Mon, 2018-03-26 at 09:26 +0200, Jens Tröger wrote:
> Thanks David, I'll noodle through the Wiki then.
> 
> (Maybe) last question for now: if this:
> 
>   https://github.com/LibreOffice/core/tree/master/sw/source/filter
> 
> are the import filters for (some) of the file formats that Writer is
> able to read, where would the export filters be?

In the same directory as the import filters, typically...

D.


Re: Resources on writing import/export file filters for Writer

2018-03-22 Thread David Tardon
Hi,

On Wed, 2018-03-21 at 22:44 -0700, Jens Tröger wrote:
> Thank you David. That's a start but still assumes a lot of implicit knowledge
> about the surrounding infrastructure. Is there a minimal export plugin to
> start from? I do have quite some experience with the object model from the
> Python/UNO view.

I don't know. There might be something in the UNO examples. I think
there is also a chapter about filters in the old OO.o Developers Guide
(http://www.openoffice.org/api/DevelopersGuide/DevelopersGuide.html).

> Also, it appears that LO6+ imports (x)html files to some extent. Where do I
> ask questions

On this list or on IRC.

> /file bugs around that?

At http://tdf.io/bugs .

Btw, the HTML import is nothing new. Quite the contrary. It was already
present in OO.o 1.0 (and the code has changed very little since that
time).

>  Or, in the context of this thread,
> where do I find the source on https://github.com/LibreOffice ?

In sw/source/filter/html in the core repo.

> I assume this one is a good way to settle in:
> https://wiki.documentfoundation.org/Development/BuildingOnLinux  ?

Yes, it is.

D.


Re: Resources on writing import/export file filters for Writer

2018-03-21 Thread Jens Tröger
Thank you David. That's a start but still assumes a lot of implicit knowledge
about the surrounding infrastructure. Is there a minimal export plugin to
start from? I do have quite some experience with the object model from the
Python/UNO view.

Also, it appears that LO6+ imports (x)html files to some extent. Where do I
ask questions/file bugs around that? Or, in the context of this thread,
where do I find the source on https://github.com/LibreOffice ?

I assume this one is a good way to settle in:
https://wiki.documentfoundation.org/Development/BuildingOnLinux  ?

Thanks!
Jens





Re: Resources on writing import/export file filters for Writer

2018-03-02 Thread David Tardon
Hi,

On Fri, 2018-03-02 at 00:16 -0700, Jens Tröger wrote:
> Thank you Miklos!
> 
> Fridrich's blog was interesting, and mentions export filters on occasion. It
> hasn't given much detail though on how to actually write a XFilter based
> implementation which is what seems most sensible.
> 
> Perhaps somebody can point me at the source code of existing (simple) export
> filters?

E.g., writerperfect/source/writer/EPUBExportFilter.cxx .

D.


Re: Resources on writing import/export file filters for Writer

2018-03-01 Thread Jens Tröger
Thank you Miklos!

Fridrich's blog was interesting, and mentions export filters on occasion. It
doesn't give much detail, though, on how to actually write an XFilter-based
implementation, which seems to be the most sensible approach.

Perhaps somebody can point me at the source code of existing (simple) export
filters?

The blog also suggests using ODF flat XML as an intermediary file format
but without giving a reference. Is this the one:
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office#technical
?

Thanks!
Jens





Re: Resources on writing import/export file filters for Writer

2018-02-19 Thread Miklos Vajna
Hi,

On Sun, Feb 18, 2018 at 09:03:36PM -0800, Jens Tröger wrote:
> I’m looking into writing an “export filter”, i.e. exporting the
> currently open Writer document to a custom file format. Can somebody
> please point me at the appropriate documentation and perhaps code
> samples? At a later point I’d like to import a custom file format as
> well, and will look for that same documentation.

Here is one overview:
.
It discusses import filters, but export filters are similar in many
aspects.

If you want to read or write some obscure old file format, doing that as
part of DLP () may make sense.

Regards,

Miklos




Resources on writing import/export file filters for Writer

2018-02-18 Thread Jens Tröger
Hello,

I’m looking into writing an “export filter”, i.e. exporting the currently open 
Writer document to a custom file format. Can somebody please point me at the 
appropriate documentation and perhaps code samples? At a later point I’d like 
to import a custom file format as well, and will look for that same 
documentation.

Thanks!
Jens

--
Jens Tröger
http://savage.light-speed.de/
