On 1/12/2022 8:31 AM, Jan Høydahl wrote:
> I think there are lots of pieces of code in solr-core that can easily be
> extracted the same way.
> Some perhaps even for 9.0.0, as it slims down the core and reduces attack
> surface for most users as well.

I think it would be really awesome if we had a core download that only included basic functionality, and all the other fancy things that Solr does now out of the box (as well as those that are contrib) could be added after download via package scripting or just additional downloads.

The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes. The .zip version is slightly larger. 8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0 was 131MiB, and 1.4.1 was 53.7MiB. I think it's insane that the download is so big ... and a lot of what makes it big are things that the vast majority of our users will never use.
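
To put the growth in perspective, here is a minimal Python sketch that converts the byte count to MiB and computes the release-to-release growth factors. The sizes are the ones quoted above; the names `sizes_mib` and `growth_factors` are only illustrative.

```python
MIB = 1024 * 1024  # bytes per mebibyte

# Download sizes in MiB, as quoted in this message (oldest first).
sizes_mib = {
    "1.4.1": 53.7,
    "6.0.0": 131,
    "7.0.0": 142,
    "8.0.0": 163,
    "8.11.1": 207,
}

def bytes_to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB

def growth_factors(sizes):
    """Return (older, newer, newer_size / older_size) for each adjacent pair."""
    versions = list(sizes)
    return [
        (old, new, sizes[new] / sizes[old])
        for old, new in zip(versions, versions[1:])
    ]

if __name__ == "__main__":
    print(f"solr-8.11.1.tgz: {bytes_to_mib(218076598):.2f} MiB")
    for old, new, factor in growth_factors(sizes_mib):
        print(f"{old} -> {new}: x{factor:.2f}")
    overall = sizes_mib["8.11.1"] / sizes_mib["1.4.1"]
    print(f"1.4.1 -> 8.11.1 overall: x{overall:.2f}")
```

By these figures the download has grown to nearly four times the size of 1.4.1.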

Large reductions in the overall size of the main download would be possible by putting hadoop, calcite, some of the really large lucene analysis components, and the contrib stuff into packages. The extraction contrib alone is 43.5MiB compressed in zip format.
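
One way to see where the bulk comes from is to walk an extracted download and rank the jars by size. The following is a rough Python sketch, not an existing Solr tool: the function names are hypothetical, and it reports on-disk jar sizes, which only approximate each jar's contribution to the compressed archive.

```python
import os
from collections import defaultdict

MIB = 1024 * 1024  # bytes per mebibyte

def jar_sizes(root):
    """Return (size_in_bytes, path_relative_to_root) for every .jar, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".jar"):
                full = os.path.join(dirpath, name)
                found.append((os.path.getsize(full), os.path.relpath(full, root)))
    return sorted(found, reverse=True)

def size_by_top_level_dir(root):
    """Sum jar sizes per first path component under root (e.g. contrib, dist)."""
    totals = defaultdict(int)
    for size, rel in jar_sizes(root):
        totals[rel.split(os.sep)[0]] += size
    return dict(totals)

def report(root, limit=20):
    """Print the largest jars, then the per-directory totals."""
    for size, rel in jar_sizes(root)[:limit]:
        print(f"{size / MIB:8.1f} MiB  {rel}")
    by_dir = size_by_top_level_dir(root)
    for top, total in sorted(by_dir.items(), key=lambda kv: -kv[1]):
        print(f"{total / MIB:8.1f} MiB total under {top}")
```

Calling `report()` with the path of an extracted download (for example a `solr-8.11.1` directory) would list the 20 largest jars followed by the totals per top-level directory.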

I would suggest moving zookeeper and its dependencies as well, but I think we probably want SolrCloud to be part of base functionality.

Some of the large jars are included for what are probably insignificant usages, and I wonder if that functionality could be replaced by newer native functions available in Java 8 and later. I am eyeballing things like guava and the commons-* jars here, but I am sure there are other things in this category. I'd like to eliminate as many dependencies as we can.

Extracting some things from the solr-core jar into other jars sounds like a really awesome idea.

I don't think the solr-core jar should be in the dist directory. It's useless by itself, because it will still have a LOT of dependencies even if we shrink it. And there are likely other things in the dist directory that fall into that category. The test framework and its dependencies are a good candidate for removal.

By removing some of the low-hanging fruit that I am SURE isn't needed for base binary functionality from the 8.11.1 download, I was able to end up with a .zip file of 60.4MiB, and I am sure at least a little further reduction is possible if we can fully map out dependencies. I think we can leverage gradle to provide some dependency info.

Exactly how to organize the code repo to create divided artifacts is something that we would need to think about. My initial idea is changing "contrib" to "package" and then making some new directories under package.

Thanks,
Shawn
