[GitHub] [maven-indexer] mbien commented on pull request #302: filter in index reader and use update request for configuration

via GitHub Sun, 07 May 2023 22:46:06 -0700


mbien commented on PR #302:
URL: https://github.com/apache/maven-indexer/pull/302#issuecomment-1537786100

> Just to be clear: "So the filter can be also used for removing fields in
> addition of whole documents." means that filter can transform passed in
> Document instances? As that would be nasty side effect or plan misuse of API
> IMHO. If we want to "transform" documents (and why not?) let's have a dedicated
> API for that as well IMHO.

I have removed that sentence so that nobody gets confused. Transforming
documents was indeed the goal at first, but it does not work anyway. I solved
it by swapping out the `MinimalArtifactInfoIndexCreator` and adjusting the
[`updateDocument`](https://github.com/apache/maven-indexer/blob/41e88f874132a6bcae3dd034547b735b6a8a4c12/indexer-core/src/main/java/org/apache/maven/index/creator/MinimalArtifactInfoIndexCreator.java#L272)
method.

> As for removing SHA1, unsure why would one do it. How to "identify"
> artifacts otherwise, or NB does not have such a use case?

Those hashes alone account for more than 30% of the index size, since noise
compresses badly. The idea is to use SMO for that instead, since the few
current usages of hashes are all followed by subsequent downloads anyway
(e.g. find src/doc for a dependency), so it's essentially an online use case.
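
To illustrate why noise compresses badly, here is a rough, self-contained sketch. It is not taken from the PR and does not reproduce the >30% figure: the coordinates are synthetic, the class and method names (`HashCompressionSketch`, `sample`, `ratio`) are made up for this example, and it uses `java.util.zip.Deflater` as a stand-in rather than whatever codec the index actually uses. It only compares how well repetitive coordinate strings and their SHA-1 hex digests deflate.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of maven-indexer.
final class HashCompressionSketch {

    // Returns the number of bytes produced by deflating the input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Compressed size divided by raw size; lower means better compression.
    static double ratio(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return (double) deflatedSize(raw) / raw.length;
    }

    // SHA-1 of the string as 40 lowercase hex characters.
    static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds n synthetic coordinates (index 0) and their hashes (index 1),
    // one entry per line.
    static String[] sample(int n) {
        StringBuilder coords = new StringBuilder();
        StringBuilder hashes = new StringBuilder();
        for (int i = 0; i < n; i++) {
            String gav = "org.example.group:artifact-" + (i % 50) + ":1." + (i / 50) + ".0";
            coords.append(gav).append('\n');
            hashes.append(sha1Hex(gav)).append('\n');
        }
        return new String[] { coords.toString(), hashes.toString() };
    }

    public static void main(String[] args) {
        String[] s = sample(5000);
        System.out.printf("coordinates ratio: %.2f%n", ratio(s[0]));
        System.out.printf("sha1 hex ratio:    %.2f%n", ratio(s[1]));
    }
}
```

The hashes should deflate noticeably worse than the coordinates, because a digest carries 20 bytes of effectively random data that no general-purpose compressor can shrink further.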

A smaller index would also slightly speed up the merge when multi-threading
(MT) is enabled.

