Hi,

These are impressive savings!

Out of curiosity: we don't build the index incrementally using Maven's IndexReader [1], do we? Is that why we download the whole index?

Thanks,
Antonio


[1] https://maven.apache.org/maven-indexer/indexer-reader/apidocs/org/apache/maven/index/reader/IndexReader.html

On 17/3/23 11:06, Michael Bien wrote:
Hello everyone,

I experimented a bit with the Maven index extraction process and got some pretty good results (I think).

There might be a way to filter the index during extraction without noteworthy overhead, which allows the following:

 - "sliding window" time filters, e.g. drop all documents older than 2 years (aka: who uses old libraries?)

 - we can drop fields we don't need from the index. This is especially interesting for fields which don't compress well (looking at you, sha1 hash)
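
To make the two filters above concrete, here is a minimal sketch of how they could be applied while records stream past during extraction. It is not the actual extraction code: records are modelled as plain string maps, and the class name IndexFilterSketch and the field names "lastModified" and "sha1" are hypothetical placeholders, not the keys the real index uses.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: filters index records one by one, as an extraction loop might. */
final class IndexFilterSketch {

    // Hypothetical field names; the real index uses its own keys.
    static final String LAST_MODIFIED = "lastModified"; // epoch millis as a string
    static final String SHA1 = "sha1";

    /**
     * Keeps records modified at or after the cutoff and strips unwanted fields.
     * Records without a parsable timestamp are kept rather than silently lost.
     */
    static List<Map<String, String>> filter(Iterable<Map<String, String>> records,
                                            Instant cutoff,
                                            Set<String> droppedFields) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (isOlderThan(record, cutoff)) {
                continue; // sliding window: skip documents older than the cutoff
            }
            Map<String, String> slim = new LinkedHashMap<>(record);
            slim.keySet().removeAll(droppedFields); // drop fields we don't need
            kept.add(slim);
        }
        return kept;
    }

    private static boolean isOlderThan(Map<String, String> record, Instant cutoff) {
        String raw = record.get(LAST_MODIFIED);
        if (raw == null) {
            return false; // no timestamp: keep the record
        }
        try {
            return Instant.ofEpochMilli(Long.parseLong(raw)).isBefore(cutoff);
        } catch (NumberFormatException e) {
            return false; // unparsable timestamp: keep the record
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant cutoff = now.minus(Duration.ofDays(2 * 365)); // roughly two years

        List<Map<String, String>> records = List.of(
            Map.of("id", "g:a:1.0", SHA1, "abc",
                   LAST_MODIFIED, String.valueOf(now.minus(Duration.ofDays(1000)).toEpochMilli())),
            Map.of("id", "g:a:2.0", SHA1, "def",
                   LAST_MODIFIED, String.valueOf(now.minus(Duration.ofDays(10)).toEpochMilli())));

        // Only the recent record survives, and it no longer carries the sha1 field.
        System.out.println(filter(records, cutoff, Set.of(SHA1)));
    }
}
```

Because each record is checked and trimmed on its own, the filter needs no second pass over the data, which fits the "no noteworthy overhead" observation. Whether records without a timestamp should be kept or dropped is a design choice; the sketch keeps them.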

Some results for the time cutoff filter:

full: 5.6 GB
2y: 2.6 GB
1y: 1.4 GB
