On Fri, 2022-09-23 at 10:54 -0700, Eric Norman wrote:
> Hi Robert,
> 
> Thanks for clarifying that requiring those 3 additional bundles was
> intentional.  I've confirmed that the starter project builds ok and
> all the
> tests are passing after adding those to the distribution.
> 
> I don't know all the history, but I just thought it was a bit odd
> that a
> private copy of some of the dependencies are embedded in the xss
> bundle and
> others are not.
> 
> In other words, is the XSS bundle trying to be a self-contained uber
> bundle, a slim bundle with the dependencies provided externally, or
> somewhere in between to workaround some problems?

The XSS bundle initially inlined many dependencies because we tried
hard to control the XML stack used for parsing. This has been
problematic historically. Some of the history may not be 100% clear
because this was intially developed internally at Adobe and then
contributed to Sling. Note that we were embedding an XML parser and the
XML APIs jar [1], overriding some of the XML discovery classes [2] and
playing tricks with the classloader for IBM JVMs[3].

IMO the XSS bundle should be slim in terms of dependencies since it's
used by many higher-level bundles. At the same time I prefer not
bundling its dependencies unless needed since that makes upgrading
versions harder.

> 
> The 2.2.20 version has 6.831.549 bytes, while
> > the new one has 4.007.522 bytes. And the good part is that these
> > dependencies are now bundles, so they can be updated without
> > needing to
> > cut a new Sling XSS update.
> > 
> 
> A smaller bundle is indeed an improvement.  Though, ~4MB still seems
> a bit
> large for what this is doing.
> 
> This probably wasn't a primary goal for this change, but was there
> consideration of reducing those duplicate private classes in the XSS
> bundle?  Looks like there are private copies of google-guava, batik,
> various commons-* stuff and a few others embedded there.

We embed Guava because the Java HTML Sanitizer needs it and we
can't/don't want to ships this as a bundle in Sling due to be
problematic versioning policy they have.

There are indeed jars inlined that we could take a look at, but that
was not the goal for the change. I think batik is the greatest
"offender", but since we don't deploy it in the Starter we are not
losing anything for now.

Thanks,
Robert

> 

[1]:
https://github.com/apache/sling-org-apache-sling-xss/blob/d2f92e9de35c66a27caf482f2e1567ac274391d3/pom.xml#L117-L124
[2]:
https://github.com/apache/sling-org-apache-sling-xss/tree/org.apache.sling.xss-2.2.20/src/main/java/javax/xml
[3]:
https://github.com/apache/sling-org-apache-sling-xss/blob/d2f92e9de35c66a27caf482f2e1567ac274391d3/src/main/java/org/apache/sling/xss/impl/PolicyHandler.java#L50-L67
> Regards,
> -Eric
> 
> 
> On Fri, Sep 23, 2022 at 9:22 AM Robert Munteanu <romb...@apache.org>
> wrote:
> 
> > Hi Eric,
> > 
> > On Thu, 2022-09-22 at 11:18 -0700, Eric Norman wrote:
> > > Can someone provide any details about what new dependencies this
> > > version of
> > > XSS API is expected to require?
> > 
> > The main change is that we have moved from the AntiSamy library to
> > the
> > OWASP Java HTML Sanitizer. More details and discussion at [1] and
> > [2].
> > 
> > We keep using the same format for the configuration file, which
> > means
> > that we have to parse the XML somehow. The library that was
> > selected
> > was Jackson - it's widely used, has a good community, and
> > permissively
> > licensed.
> > 
> > The woodstox dependency comes from Jackson. Some time ago I had
> > pushed
> > a commit that updated the Sling Starter to have the additional
> > needed
> > dependencies [3].
> > 
> > That may look like a downside, but keep in mind that the "old"
> > Sling
> > XSS bundle embedded many more dependencies, including XML parsers
> > and
> > was very hard to evolve. The 2.2.20 version has 6.831.549 bytes,
> > while
> > the new one has 4.007.522 bytes. And the good part is that these
> > dependencies are now bundles, so they can be updated without
> > needing to
> > cut a new Sling XSS update.
> > 
> > > 
> > > I couldn't plug the SNAPSHOT build into the starter as it would
> > > not
> > > build
> > > due to unresolved dependencies on packages from jackson-
> > > dataformat-
> > > xml
> > > (and woodstox?).
> > > 
> > > Is there any chance that the code could be refactored to not
> > > require
> > > those
> > > additional dependencies?
> > 
> > Jackson is pretty core to the refactoring. I'm not saying it's
> > impossible, but someone would have to do the legwork for that.
> > 
> > Looking at the Jackon pom it seems that we don't _have to_ use
> > Woodstox, but the code does not seem to work without it right now.
> > 
> > Thanks,
> > Robert
> > 
> > 
> > [1]: https://issues.apache.org/jira/browse/SLING-7231
> > [2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
> > [3]:
> > 
> > https://github.com/rombert/sling-org-apache-sling-starter/commit/0372ef37203086dc1b930bffe86f0ee1a81ceaf9
> > [4]:
> > 
> > https://github.com/FasterXML/jackson-dataformat-xml/blob/42f584f05e1e8566f79dc7251d6cb826f2505c13/pom.xml#L78-L81
> > 
> > > 
> > > Regards,
> > > -Eric
> > > 
> > > On Wed, Sep 21, 2022 at 5:44 AM Robert Munteanu
> > > <romb...@apache.org>
> > > wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I just merged the XSS API changes for removing the AntiSamy
> > > > library
> > > > and
> > > > using the OWASP Java HTML Sanitizer instead [1], [2]. (Thanks,
> > > > Tatyana!)
> > > > 
> > > > As it's a big change and, even though it's coming with lots of
> > > > tests, I
> > > > would like to wait a bit before releasing.
> > > > 
> > > > I would like to start the release vote in a week from now, on
> > > > 28.09.
> > > > 
> > > > Please let me know if you discover any release blockers before.
> > > > 
> > > > Thanks,
> > > > Robert
> > > > 
> > > > [1]: https://issues.apache.org/jira/browse/SLING-7231
> > > > [2]:
> > > > https://github.com/apache/sling-org-apache-sling-xss/pull/28
> > > > 
> > 
> > 

Reply via email to