Hello all,

In case you have not been following it, Hyperspace is a new indexing subsystem for Spark from Microsoft. It seemed like a very interesting project, so I explored whether it could give us an indexing option inside Hudi.

TL;DR:

- I was exploring whether Hyperspace can be used as an alternative to our record/bloom indexes.
- For the needle-in-a-haystack search, i.e. looking up a single id out of all the records, Hyperspace also does not seem very effective at the moment (which might not be surprising given the recommendations so far).
- Index refresh still seems to be non-incremental, i.e. it rebuilds the entire index from scratch every time.
- Our old workhorse BLOOM_INDEX still significantly outperforms it. But we should really step on the gas for RFC-15-like efforts/RFC-08 to make this much faster, which would give us an incrementally updating version.

All that said, Hyperspace is a very cool project and it is only going to get better over time. There are good opportunities for us to collaborate in the future. Any Hyperspace folks (if lurking here), please chime in (it's worth a shot).

You can find my experiments here:
https://gist.github.com/vinothchandar/593b19c47bea2406b9a8a9aaed30775a

Please keep the conversation on the mailing list, so everyone can chime in.

Thanks
Vinoth
