[ https://issues.apache.org/jira/browse/BATIK-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erich Schubert updated BATIK-1183:
----------------------------------
Description:
In ELKI, we use Batik for scatterplots.
Marker symbols are generated as a <symbol> element, and then placed with a <use>
element at each individual location. This is nice for post-editing (because the
symbols can be changed in a single place), but the performance of this approach
is pretty bad (to the point where I am considering dropping Batik and trying
something else).
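For readers unfamiliar with the pattern, here is a minimal sketch of the kind of markup described above: one <symbol> defined once, and one <use> per data point. The marker shape, its size, and the id "marker" are illustrative assumptions, not ELKI's actual output.

```java
import java.util.Locale;

public class SymbolUseExample {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    static final double MARKER_SIZE = 4.0; // hypothetical marker size in user units

    /** Builds an SVG document with one marker <symbol> and one <use> per data point. */
    static String scatterplot(double[][] points) {
        StringBuilder sb = new StringBuilder();
        sb.append("<svg xmlns=\"").append(SVG_NS)
          .append("\" xmlns:xlink=\"").append(XLINK_NS).append("\">\n");
        // The marker shape is defined exactly once...
        sb.append("  <defs><symbol id=\"marker\" viewBox=\"-1 -1 2 2\">")
          .append("<circle r=\"1\"/></symbol></defs>\n");
        // ...and referenced once per point, offset by half the marker size so it is centered.
        double half = MARKER_SIZE / 2;
        for (double[] p : points) {
            sb.append(String.format(Locale.ROOT,
                "  <use xlink:href=\"#marker\" x=\"%.2f\" y=\"%.2f\" width=\"%.2f\" height=\"%.2f\"/>\n",
                p[0] - half, p[1] - half, MARKER_SIZE, MARKER_SIZE));
        }
        sb.append("</svg>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(scatterplot(new double[][] {{10, 20}, {30, 40}}));
    }
}
```

Editing the single <symbol> restyles every marker, which is the post-editing benefit mentioned above; the cost is that the renderer has to resolve every <use> reference.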
When analyzing performance bottlenecks, I noticed the following things:
1. A substantial amount of time (way too much) goes into listener list
management (yes, I want support for dynamic changes; so I do need listeners).
It seems that for every <use>, several listeners are added?
2. String.intern is a major performance factor. I understand that strings need
to be interned, but we should avoid redoing it so often.
3. When a <symbol> is used, it gets cloned. With thousands of <use> tags, this
leads to a substantial cost, particularly because every string is interned
again for every usage.
(org.apache.batik.bridge.SVGUseElementBridge#buildCompositeGraphicsNode calls
'importNode')
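The per-<use> cloning can be illustrated with the standard W3C DOM API bundled with the JDK. This is a simplified sketch, not Batik's bridge code: the hypothetical `instantiate` method performs one deep `importNode` per <use>, so N uses produce N independent copies of the symbol subtree (and, per the report, N rounds of interning its names).

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ImportNodeCost {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    /** Simplified stand-in for per-<use> instantiation: one deep copy per use. */
    static Node[] instantiate(Document doc, Element symbol, int uses) {
        Node[] copies = new Node[uses];
        for (int i = 0; i < uses; i++) {
            // Deep import: the whole <symbol> subtree is copied every time.
            copies[i] = doc.importNode(symbol, true);
        }
        return copies;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element symbol = doc.createElementNS(SVG_NS, "symbol");
        symbol.appendChild(doc.createElementNS(SVG_NS, "circle"));
        Node[] copies = instantiate(doc, symbol, 1000);
        System.out.println(copies.length + " copies; shared? " + (copies[0] == copies[1]));
    }
}
```

Memory and time therefore grow with (number of <use> elements) times (size of the symbol), even though all copies are identical.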
Attached is a file that demonstrates the performance bottleneck, particularly
when interactions are enabled.
I have tried to improve some of these things in my speedup branch:
https://github.com/kno10/batik/tree/fixesAndSpeed
In this branch:
- the namespace SVGConstants.SVG_NAMESPACE_URI is recognized and the call to
String.intern() is avoided. This is the default namespace for SVG, and the
constant will point to the interned version.
- the custom "Hashtable" has been removed, and replaced with a type-safe
HashMap<> (which should actually be faster)
- The listener list management is now much simpler (and more efficient, as some
of the functionality wasn't ever used anywhere).
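A minimal sketch of the namespace fast path from the first bullet, using a hypothetical helper `internNamespace` (not the branch's actual code). The constant has the same value as the standard SVG namespace URI; Java string literals are always interned, so a reference comparison handles the common case without calling `String.intern()`.

```java
public final class NamespaceInterning {
    // Same value as SVGConstants.SVG_NAMESPACE_URI; a string literal is
    // already interned by the JVM.
    static final String SVG_NAMESPACE_URI = "http://www.w3.org/2000/svg";

    /** Hypothetical helper: returns the canonical (interned) namespace string. */
    static String internNamespace(String uri) {
        if (uri == null) {
            return null;
        }
        if (uri == SVG_NAMESPACE_URI) { // already the canonical instance: cheapest check
            return uri;
        }
        if (uri.equals(SVG_NAMESPACE_URI)) { // equal content, different instance
            return SVG_NAMESPACE_URI;
        }
        return uri.intern(); // fall back to the global pool for other namespaces
    }
}
```

The `equals` check still costs a string comparison, but it avoids a lookup in the JVM-wide intern table for the namespace that appears on almost every element.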
But I could not tackle reducing the number of listeners and the cloning, as I
am not deep enough into Batik internals. I understand they are meant to
propagate changes in the symbol to all the copies, but maybe we could instead
have one shared listener on the <symbol> tag for all the <use> tags, rather than
one listener per <use> tag?
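One way the shared-listener idea could look, as a hypothetical sketch in plain Java. The class, interface, and method names are invented and do not exist in Batik. The registry attaches one listener per <symbol> and fans change notifications out to all dependent <use> instances.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedSymbolListener {
    /** Hypothetical callback for a <use> instance that must refresh its copy. */
    interface UseInstance {
        void symbolChanged(String symbolId);
    }

    // Dependent <use> instances, grouped by the id of the <symbol> they reference.
    private final Map<String, List<UseInstance>> dependents = new HashMap<>();
    // Number of listeners that would be attached to the DOM: one per symbol.
    private int registeredListeners = 0;

    void addUse(String symbolId, UseInstance use) {
        dependents.computeIfAbsent(symbolId, id -> {
            // First <use> of this symbol: a real implementation would attach
            // the single DOM mutation listener to the <symbol> here.
            registeredListeners++;
            return new ArrayList<>();
        }).add(use);
    }

    /** Called once by the shared listener when the <symbol> subtree changes. */
    void fireSymbolChanged(String symbolId) {
        List<UseInstance> uses = dependents.getOrDefault(symbolId, Collections.emptyList());
        for (UseInstance use : uses) {
            use.symbolChanged(symbolId);
        }
    }

    int listenerCount() {
        return registeredListeners;
    }
}
```

With 1000 <use> elements referencing the same marker, this keeps one listener registration instead of 1000; each <use> still refreshes its own rendering when notified.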
Without '<symbol>' and '<use>', performance is much better, but the file
becomes harder to edit and twice as large. :-(
was:
In ELKI, we use Batik for scatterplots.
Marker symbols are generated as a <symbol> element, and then placed with a <use>
element at each individual location. This is nice for post-editing (because the
symbols can be changed in a single place), but the performance of this approach
is pretty bad (to the point where I am considering dropping Batik and trying
something else).
When analyzing performance bottlenecks, I noticed the following things:
1. A substantial amount of time (way too much) goes into listener list
management (yes, I want support for dynamic changes; so I do need listeners).
It seems that for every <use>, several listeners are added?
2. String.intern is a major performance factor. I understand that strings need
to be interned, but we should avoid redoing it so often.
3. When a <symbol> is used, it gets cloned. With thousands of <use> tags, this
leads to a substantial cost, particularly because every string is interned
again for every usage.
(org.apache.batik.bridge.SVGUseElementBridge#buildCompositeGraphicsNode calls
'importNode')
I have tried to improve some of these things in my speedup branch:
https://github.com/kno10/batik/tree/fixesAndSpeed
In particular, SVGConstants.SVG_NAMESPACE_URI is recognized and not interned,
as we expect to see this namespace very often, and I replaced the listener list
management with something much simpler (and more efficient, as some of the
functionality wasn't ever used).
I could not tackle the number of listeners and the cloning, as I am not deep
enough into Batik internals.
> Performance of <use> and <symbol>
> ---------------------------------
>
> Key: BATIK-1183
> URL: https://issues.apache.org/jira/browse/BATIK-1183
> Project: Batik
> Issue Type: Improvement
> Components: Bridge
> Affects Versions: trunk
> Reporter: Erich Schubert
> Labels: performance
> Attachments: scatter.svg.gz
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)