Re: Intent to release Sling XSS API 2.3.0

2022-09-26 Thread Robert Munteanu
On Fri, 2022-09-23 at 10:54 -0700, Eric Norman wrote:
> Hi Robert,
> 
> Thanks for clarifying that requiring those 3 additional bundles was
> intentional.  I've confirmed that the starter project builds ok and
> all the
> tests are passing after adding those to the distribution.
> 
> I don't know all the history, but I just thought it was a bit odd
> that a
> private copy of some of the dependencies are embedded in the xss
> bundle and
> others are not.
> 
> In other words, is the XSS bundle trying to be a self-contained uber
> bundle, a slim bundle with the dependencies provided externally, or
> somewhere in between to workaround some problems?

The XSS bundle initially inlined many dependencies because we tried
hard to control the XML stack used for parsing. This has been
problematic historically. Some of the history may not be 100% clear
because this was intially developed internally at Adobe and then
contributed to Sling. Note that we were embedding an XML parser and the
XML APIs jar [1], overriding some of the XML discovery classes [2] and
playing tricks with the classloader for IBM JVMs[3].

IMO the XSS bundle should be slim in terms of dependencies since it's
used by many higher-level bundles. At the same time I prefer not
bundling its dependencies unless needed since that makes upgrading
versions harder.

> 
> The 2.2.20 version has 6.831.549 bytes, while
> > the new one has 4.007.522 bytes. And the good part is that these
> > dependencies are now bundles, so they can be updated without
> > needing to
> > cut a new Sling XSS update.
> > 
> 
> A smaller bundle is indeed an improvement.  Though, ~4MB still seems
> a bit
> large for what this is doing.
> 
> This probably wasn't a primary goal for this change, but was there
> consideration of reducing those duplicate private classes in the XSS
> bundle?  Looks like there are private copies of google-guava, batik,
> various commons-* stuff and a few others embedded there.

We embed Guava because the Java HTML Sanitizer needs it and we
can't/don't want to ships this as a bundle in Sling due to be
problematic versioning policy they have.

There are indeed jars inlined that we could take a look at, but that
was not the goal for the change. I think batik is the greatest
"offender", but since we don't deploy it in the Starter we are not
losing anything for now.

Thanks,
Robert

> 

[1]:
https://github.com/apache/sling-org-apache-sling-xss/blob/d2f92e9de35c66a27caf482f2e1567ac274391d3/pom.xml#L117-L124
[2]:
https://github.com/apache/sling-org-apache-sling-xss/tree/org.apache.sling.xss-2.2.20/src/main/java/javax/xml
[3]:
https://github.com/apache/sling-org-apache-sling-xss/blob/d2f92e9de35c66a27caf482f2e1567ac274391d3/src/main/java/org/apache/sling/xss/impl/PolicyHandler.java#L50-L67
> Regards,
> -Eric
> 
> 
> On Fri, Sep 23, 2022 at 9:22 AM Robert Munteanu 
> wrote:
> 
> > Hi Eric,
> > 
> > On Thu, 2022-09-22 at 11:18 -0700, Eric Norman wrote:
> > > Can someone provide any details about what new dependencies this
> > > version of
> > > XSS API is expected to require?
> > 
> > The main change is that we have moved from the AntiSamy library to
> > the
> > OWASP Java HTML Sanitizer. More details and discussion at [1] and
> > [2].
> > 
> > We keep using the same format for the configuration file, which
> > means
> > that we have to parse the XML somehow. The library that was
> > selected
> > was Jackson - it's widely used, has a good community, and
> > permissively
> > licensed.
> > 
> > The woodstox dependency comes from Jackson. Some time ago I had
> > pushed
> > a commit that updated the Sling Starter to have the additional
> > needed
> > dependencies [3].
> > 
> > That may look like a downside, but keep in mind that the "old"
> > Sling
> > XSS bundle embedded many more dependencies, including XML parsers
> > and
> > was very hard to evolve. The 2.2.20 version has 6.831.549 bytes,
> > while
> > the new one has 4.007.522 bytes. And the good part is that these
> > dependencies are now bundles, so they can be updated without
> > needing to
> > cut a new Sling XSS update.
> > 
> > > 
> > > I couldn't plug the SNAPSHOT build into the starter as it would
> > > not
> > > build
> > > due to unresolved dependencies on packages from jackson-
> > > dataformat-
> > > xml
> > > (and woodstox?).
> > > 
> > > Is there any chance that the code could be refactored to not
> > > require
> > > those
> > > additional dependencies?
> > 
> > Jackson is pretty core to the refactoring. I'm not saying it's
> > impossible, but someone would have to do the legwork for that.
> > 
> > Looking at the Jackon pom it seems that we don't _have to_ use
> > Woodstox, but the code does not seem to work without it right now.
> > 
> > Thanks,
> > Robert
> > 
> > 
> > [1]: https://issues.apache.org/jira/browse/SLING-7231
> > [2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
> > [3]:
> > 
> > https://github.com/rombert/sling-org-apache-sling-starter/commit/0372ef37203086dc1b930bffe86f0

Re: Intent to release Sling XSS API 2.3.0

2022-09-23 Thread Eric Norman
Hi Robert,

Thanks for clarifying that requiring those 3 additional bundles was
intentional.  I've confirmed that the starter project builds ok and all the
tests are passing after adding those to the distribution.

I don't know all the history, but I just thought it was a bit odd that a
private copy of some of the dependencies are embedded in the xss bundle and
others are not.

In other words, is the XSS bundle trying to be a self-contained uber
bundle, a slim bundle with the dependencies provided externally, or
somewhere in between to workaround some problems?

The 2.2.20 version has 6.831.549 bytes, while
> the new one has 4.007.522 bytes. And the good part is that these
> dependencies are now bundles, so they can be updated without needing to
> cut a new Sling XSS update.
>

A smaller bundle is indeed an improvement.  Though, ~4MB still seems a bit
large for what this is doing.

This probably wasn't a primary goal for this change, but was there
consideration of reducing those duplicate private classes in the XSS
bundle?  Looks like there are private copies of google-guava, batik,
various commons-* stuff and a few others embedded there.

Regards,
-Eric


On Fri, Sep 23, 2022 at 9:22 AM Robert Munteanu  wrote:

> Hi Eric,
>
> On Thu, 2022-09-22 at 11:18 -0700, Eric Norman wrote:
> > Can someone provide any details about what new dependencies this
> > version of
> > XSS API is expected to require?
>
> The main change is that we have moved from the AntiSamy library to the
> OWASP Java HTML Sanitizer. More details and discussion at [1] and [2].
>
> We keep using the same format for the configuration file, which means
> that we have to parse the XML somehow. The library that was selected
> was Jackson - it's widely used, has a good community, and permissively
> licensed.
>
> The woodstox dependency comes from Jackson. Some time ago I had pushed
> a commit that updated the Sling Starter to have the additional needed
> dependencies [3].
>
> That may look like a downside, but keep in mind that the "old" Sling
> XSS bundle embedded many more dependencies, including XML parsers and
> was very hard to evolve. The 2.2.20 version has 6.831.549 bytes, while
> the new one has 4.007.522 bytes. And the good part is that these
> dependencies are now bundles, so they can be updated without needing to
> cut a new Sling XSS update.
>
> >
> > I couldn't plug the SNAPSHOT build into the starter as it would not
> > build
> > due to unresolved dependencies on packages from jackson-dataformat-
> > xml
> > (and woodstox?).
> >
> > Is there any chance that the code could be refactored to not require
> > those
> > additional dependencies?
>
> Jackson is pretty core to the refactoring. I'm not saying it's
> impossible, but someone would have to do the legwork for that.
>
> Looking at the Jackon pom it seems that we don't _have to_ use
> Woodstox, but the code does not seem to work without it right now.
>
> Thanks,
> Robert
>
>
> [1]: https://issues.apache.org/jira/browse/SLING-7231
> [2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
> [3]:
>
> https://github.com/rombert/sling-org-apache-sling-starter/commit/0372ef37203086dc1b930bffe86f0ee1a81ceaf9
> [4]:
>
> https://github.com/FasterXML/jackson-dataformat-xml/blob/42f584f05e1e8566f79dc7251d6cb826f2505c13/pom.xml#L78-L81
>
> >
> > Regards,
> > -Eric
> >
> > On Wed, Sep 21, 2022 at 5:44 AM Robert Munteanu 
> > wrote:
> >
> > > Hi,
> > >
> > > I just merged the XSS API changes for removing the AntiSamy library
> > > and
> > > using the OWASP Java HTML Sanitizer instead [1], [2]. (Thanks,
> > > Tatyana!)
> > >
> > > As it's a big change and, even though it's coming with lots of
> > > tests, I
> > > would like to wait a bit before releasing.
> > >
> > > I would like to start the release vote in a week from now, on
> > > 28.09.
> > >
> > > Please let me know if you discover any release blockers before.
> > >
> > > Thanks,
> > > Robert
> > >
> > > [1]: https://issues.apache.org/jira/browse/SLING-7231
> > > [2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
> > >
>
>


Re: Intent to release Sling XSS API 2.3.0

2022-09-23 Thread Robert Munteanu
Hi Eric,

On Thu, 2022-09-22 at 11:18 -0700, Eric Norman wrote:
> Can someone provide any details about what new dependencies this
> version of
> XSS API is expected to require?

The main change is that we have moved from the AntiSamy library to the
OWASP Java HTML Sanitizer. More details and discussion at [1] and [2].

We keep using the same format for the configuration file, which means
that we have to parse the XML somehow. The library that was selected
was Jackson - it's widely used, has a good community, and permissively
licensed.

The woodstox dependency comes from Jackson. Some time ago I had pushed
a commit that updated the Sling Starter to have the additional needed
dependencies [3].

That may look like a downside, but keep in mind that the "old" Sling
XSS bundle embedded many more dependencies, including XML parsers and
was very hard to evolve. The 2.2.20 version has 6.831.549 bytes, while
the new one has 4.007.522 bytes. And the good part is that these
dependencies are now bundles, so they can be updated without needing to
cut a new Sling XSS update.

> 
> I couldn't plug the SNAPSHOT build into the starter as it would not
> build
> due to unresolved dependencies on packages from jackson-dataformat-
> xml
> (and woodstox?).
> 
> Is there any chance that the code could be refactored to not require
> those
> additional dependencies?

Jackson is pretty core to the refactoring. I'm not saying it's
impossible, but someone would have to do the legwork for that.

Looking at the Jackon pom it seems that we don't _have to_ use
Woodstox, but the code does not seem to work without it right now.

Thanks,
Robert


[1]: https://issues.apache.org/jira/browse/SLING-7231
[2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
[3]:
https://github.com/rombert/sling-org-apache-sling-starter/commit/0372ef37203086dc1b930bffe86f0ee1a81ceaf9
[4]:
https://github.com/FasterXML/jackson-dataformat-xml/blob/42f584f05e1e8566f79dc7251d6cb826f2505c13/pom.xml#L78-L81

> 
> Regards,
> -Eric
> 
> On Wed, Sep 21, 2022 at 5:44 AM Robert Munteanu 
> wrote:
> 
> > Hi,
> > 
> > I just merged the XSS API changes for removing the AntiSamy library
> > and
> > using the OWASP Java HTML Sanitizer instead [1], [2]. (Thanks,
> > Tatyana!)
> > 
> > As it's a big change and, even though it's coming with lots of
> > tests, I
> > would like to wait a bit before releasing.
> > 
> > I would like to start the release vote in a week from now, on
> > 28.09.
> > 
> > Please let me know if you discover any release blockers before.
> > 
> > Thanks,
> > Robert
> > 
> > [1]: https://issues.apache.org/jira/browse/SLING-7231
> > [2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
> > 



Re: Intent to release Sling XSS API 2.3.0

2022-09-22 Thread Eric Norman
Can someone provide any details about what new dependencies this version of
XSS API is expected to require?

I couldn't plug the SNAPSHOT build into the starter as it would not build
due to unresolved dependencies on packages from jackson-dataformat-xml
(and woodstox?).

Is there any chance that the code could be refactored to not require those
additional dependencies?

Regards,
-Eric

On Wed, Sep 21, 2022 at 5:44 AM Robert Munteanu  wrote:

> Hi,
>
> I just merged the XSS API changes for removing the AntiSamy library and
> using the OWASP Java HTML Sanitizer instead [1], [2]. (Thanks,
> Tatyana!)
>
> As it's a big change and, even though it's coming with lots of tests, I
> would like to wait a bit before releasing.
>
> I would like to start the release vote in a week from now, on 28.09.
>
> Please let me know if you discover any release blockers before.
>
> Thanks,
> Robert
>
> [1]: https://issues.apache.org/jira/browse/SLING-7231
> [2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28
>


Intent to release Sling XSS API 2.3.0

2022-09-21 Thread Robert Munteanu
Hi,

I just merged the XSS API changes for removing the AntiSamy library and
using the OWASP Java HTML Sanitizer instead [1], [2]. (Thanks,
Tatyana!)

As it's a big change and, even though it's coming with lots of tests, I
would like to wait a bit before releasing.

I would like to start the release vote in a week from now, on 28.09.

Please let me know if you discover any release blockers before.

Thanks,
Robert

[1]: https://issues.apache.org/jira/browse/SLING-7231
[2]: https://github.com/apache/sling-org-apache-sling-xss/pull/28