Re: [JENKINS] Lucene-main-MacOSX (64bit/hotspot/jdk-21.0.1) - Build # 11500 - Still Failing!

2024-06-22 Thread Dawid Weiss
Thank you for digging, Uwe!

On Fri, Jun 21, 2024 at 10:24 PM Uwe Schindler  wrote:

> Hi,
>
> it looks like I was able to work around the problem by putting:
>
> org.gradle.vfs.watch=false
>
> into the config file ~/.gradle/gradle.properties on the macOS Jenkins
> node.
>
> This build seems to work again:
> https://jenkins.thetaphi.de/job/Lucene-main-MacOSX/11501/console
>
> Uwe
> On 21.06.2024 at 18:48, Uwe Schindler wrote:
>
> Hi,
>
> the issue is here: https://github.com/gradle/gradle/issues/29476
>
> Looks like they solved the issue. We may need to update to a later Gradle
> 8.8 bugfix release, but it does not seem to be available yet.
>
> https://github.com/gradle/gradle/pull/29514
>
> The Jenkins node runs a macOS version older than 11:
>
> serv1-vm2:~ jenkins$ sw_vers
> ProductName:    Mac OS X
> ProductVersion: 10.14.6
> BuildVersion:   18G9323
>
> Uwe
>
> On 21.06.2024 at 18:01, Uwe Schindler wrote:
>
> Hi,
>
> it looks like since we changed Gradle to the latest version (not sure which
> PR it was), all builds on macOS fail on Policeman Jenkins (it is x86-64, not
> ARM, and runs an older version of macOS because it is hard to update due to
> VM issues with VirtualBox). The whole JVM crashes when Gradle starts. The
> hs_err_pid log shows that Gradle loads a native library which causes the
> crash. The dylib file is provided by Gradle, so the issue is clearly in the
> dylib file shipped with Gradle.
>
> For now I disabled the builds. I have the feeling that Gradle has some
> null-pointer bug in their own dynamic library
> "libnative-platform-file-events.dylib":
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00012c06c000, pid=26586, tid=3587
> #
> # JRE version: OpenJDK Runtime Environment Temurin-11.0.21+9 (11.0.21+9) (build 11.0.21+9)
> # Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.21+9 (11.0.21+9, mixed mode, tiered, compressed oops, parallel gc, bsd-amd64)
> # Problematic frame:
> # C  [libnative-platform-file-events.dylib+0x0] __dso_handle+0x0
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>
> See the attached file for full details. At the end it also shows that the
> mentioned libnative-platform-file-events.dylib file is shipped with Gradle:
>
> 0x00012c06
> /Users/jenkins/.gradle/native/c067742578af261105cb4f569cf0c3c89f3d7b1fecec35dd04571415982c5e48/osx-amd64/libnative-platform.dylib
> 0x00012c06c000
> /Users/jenkins/.gradle/native/100fb08df4bc3b14c8652ba06237920a3bd2aa13389f12d3474272988ae205f9/osx-amd64/libnative-platform-file-events.dylib
>
> Solr builds have not updated Gradle yet, so they still build fine.
>
> Uwe
>
> On 21.06.2024 at 15:58, Policeman Jenkins Server wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-main-MacOSX/11500/
> Java: 64bit/hotspot/jdk-21.0.1 -XX:-UseCompressedOops -XX:+UseParallelGC
>
> No tests ran.
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org
>
>
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>


Re: Any recommended issues to work on for a newcomer?

2024-06-22 Thread Michael Wechner

Hi Hank

Sorry, I still have not found the time to try your code, but today I learned about


https://rockset.com/Rockset_for_Hybrid_Search.pdf
https://rockset.com/whitepapers/hybrid-search-architecture/

which might be interesting to compare with.

Thanks

Michael



On 20.05.24 at 08:16, Michael Wechner wrote:

Hi Hank

Very cool, thank you, will try to do this asap!

All the best

Michael


On 19.05.24 at 01:42, Chang Hank wrote:

Hey Michael,

I wrote a first version of my idea for implementing RRF in 
Lucene; here is the link to the code: 
https://gist.github.com/hack4chang/ee2b37eab80bd82e574ff4f94ed204e9
I have some questions right now: one is about the shardIndex to be 
returned, the other is about the TotalHits value. Please take a look at 
the code and kindly leave some comments below.


Thanks,
Hank


On May 18, 2024, at 2:01 PM, Chang Hank  wrote:

Or maybe we can first create an issue and then a PR based on the issue number?
WDYT?

Best,

Hank

On May 18, 2024, at 11:29 AM, Chang Hank wrote:


Hey Michael,

Sorry, I was a bit busy this week, but I’ve looked into the 
resources you provided and also at some useful advice from Alessandro 
and Adrien.


I have a brief understanding of how RRF works, but I’m not quite 
sure how we should implement it. Based on the advice from 
Alessandro and Adrien, it seems we need to consider that the search 
results are located on different shards. According to Alessandro, 
we should aggregate the ranked lists from all distributed nodes and 
then apply RRF.
Are we going to implement this aggregation logic inside our RRF 
method?


Also, could you please create a PR so we can discuss the details 
further?


All the best,

Hank

On May 13, 2024, at 10:09 AM, Michael Wechner wrote:


Great, sounds like we have a plan :-)

Hank and I can get started trying to understand the internals 
better ...


Thanks

Michael

On 13.05.24 at 18:21, Alessandro Benedetti wrote:
Sure, we can make it work, but in a distributed environment you 
first have to run each query distributed (aggregating across all nodes) 
and then apply RRF on top of the aggregated ranked lists.
Doing RRF per node first and then aggregating per shard won't 
return the same results, I suspect.
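
The ordering concern above can be illustrated with a small, self-contained sketch. Nothing here is Lucene or Solr API: `Scored`, `rank`, `rrf`, `aggregateThenRrf`, and `rrfThenAggregate` are hypothetical names, the scores are made-up toy data, and the per-node variant assumes that shard-local RRF results are merged by their RRF score. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfOrderSketch {
    /** Hypothetical scored hit: a document name and a score (query score or RRF score). */
    record Scored(String doc, double score) {}

    /** Sorts by descending score (ties by doc name) and returns doc names in rank order. */
    static List<String> rank(List<Scored> hits) {
        List<Scored> sorted = new ArrayList<>(hits);
        Comparator<Scored> byScoreDesc = (x, y) -> Double.compare(y.score(), x.score());
        sorted.sort(byScoreDesc.thenComparing(Scored::doc));
        List<String> docs = new ArrayList<>();
        for (Scored s : sorted) {
            docs.add(s.doc());
        }
        return docs;
    }

    /** RRF score per doc: sum over rankings of 1 / (k + rank), with rank starting at 1. */
    static List<Scored> rrf(int k, List<List<String>> rankings) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<Scored> out = new ArrayList<>();
        scores.forEach((doc, s) -> out.add(new Scored(doc, s)));
        return out;
    }

    /** Merge each query across all shards first, then fuse the global rankings. */
    static List<String> aggregateThenRrf(int k, List<List<List<Scored>>> perQueryPerShard) {
        List<List<String>> globalRankings = new ArrayList<>();
        for (List<List<Scored>> shards : perQueryPerShard) {
            List<Scored> all = new ArrayList<>();
            for (List<Scored> shardHits : shards) {
                all.addAll(shardHits);
            }
            globalRankings.add(rank(all));
        }
        return rank(rrf(k, globalRankings));
    }

    /** Fuse on each shard using shard-local ranks, then merge shards by RRF score. */
    static List<String> rrfThenAggregate(int k, List<List<List<Scored>>> perQueryPerShard) {
        int numShards = perQueryPerShard.get(0).size();
        List<Scored> merged = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            List<List<String>> localRankings = new ArrayList<>();
            for (List<List<Scored>> shards : perQueryPerShard) {
                localRankings.add(rank(shards.get(shard)));
            }
            merged.addAll(rrf(k, localRankings));
        }
        return rank(merged);
    }

    public static void main(String[] args) {
        // Toy data: shard 0 holds a and b, shard 1 holds c; both queries score a > b > c.
        List<List<Scored>> query = List.of(
            List.of(new Scored("a", 9), new Scored("b", 8)),
            List.of(new Scored("c", 1)));
        List<List<List<Scored>>> input = List.of(query, query);
        System.out.println(aggregateThenRrf(60, input)); // [a, b, c]
        System.out.println(rrfThenAggregate(60, input)); // [a, c, b]
    }
}
```

In this toy case, document c is the worst hit for both queries globally, but it is rank 1 on its own shard, so per-node fusion gives it the same RRF score as a and lifts it above b.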

When I go back to working on the task I'll be able to elaborate more!

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
/Apache Lucene/Solr Committer/
/Apache Solr PMC Member/

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io 



On Mon, 13 May 2024 at 14:12, Adrien Grand  wrote:

> Maybe Adrien Grand and others might also have some feedback :-)

I'd suggest that the signature look something like `TopDocs
TopDocs#rrf(int topN, int k, TopDocs[] hits)` to be
consistent with `TopDocs#merge`. Internally, it should look
at `ScoreDoc#shardIndex` and `ScoreDoc#doc` to figure out which
hits map to the same document.
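
A minimal sketch of what such a method might compute, written against plain Java types rather than Lucene classes so that it runs on its own. `Hit` and `FusedHit` are hypothetical stand-ins (not `ScoreDoc` or `TopDocs`), the parameter order only mirrors the signature suggested above, and the tie-breaking rule is an assumption. A document is identified by the pair (shardIndex, doc), and each list contributes 1 / (k + rank) with rank starting at 1. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {
    /** Hypothetical stand-in for a hit: the shard it came from plus its doc id there. */
    record Hit(int shardIndex, int doc) {}

    /** A hit together with its fused RRF score. */
    record FusedHit(Hit hit, double score) {}

    /**
     * Reciprocal Rank Fusion over several ranked lists (best hit first in each list).
     * Hits with the same (shardIndex, doc) pair are treated as the same document.
     */
    static List<FusedHit> rrf(int topN, int k, List<List<Hit>> rankedLists) {
        Map<Hit, Double> scores = new LinkedHashMap<>();
        for (List<Hit> list : rankedLists) {
            for (int i = 0; i < list.size(); i++) {
                // Rank is 1-based, so position i contributes 1 / (k + i + 1).
                scores.merge(list.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<FusedHit> fused = new ArrayList<>();
        scores.forEach((hit, score) -> fused.add(new FusedHit(hit, score)));
        // Highest score first; ties broken by shardIndex, then doc (an assumption).
        Comparator<FusedHit> byScoreDesc = (a, b) -> Double.compare(b.score(), a.score());
        fused.sort(byScoreDesc
            .thenComparingInt(f -> f.hit().shardIndex())
            .thenComparingInt(f -> f.hit().doc()));
        return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
    }

    public static void main(String[] args) {
        // Two result lists, e.g. from two different queries over the same index.
        List<Hit> first = List.of(new Hit(0, 1), new Hit(0, 2), new Hit(0, 3));
        List<Hit> second = List.of(new Hit(0, 2), new Hit(0, 3), new Hit(0, 4));
        for (FusedHit f : rrf(3, 60, List.of(first, second))) {
            System.out.println(f);
        }
    }
}
```

With k = 60, docs 2 and 3 appear in both lists and therefore outrank doc 1, which is first in only one list. The same doc id on two different shards is kept as two separate documents.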

> Back in the day, I was reasoning on this and I didn't think
Lucene was the right place for an interleaving algorithm,
given that Reciprocal Rank Fusion is affected by distribution
and it's not supposed to work per node.

To me this is like `TopDocs#merge`. There are changes needed
on the application side to hook this call into the logic that
combines hits that come from multiple shards (multiple
queries in the case of RRF), but Lucene can still provide the
merging logic.

On Mon, May 13, 2024 at 1:41 PM Michael Wechner wrote:

Thanks for your feedback Alessandro!

I am using Lucene independently of Solr, OpenSearch, or
Elasticsearch, but would like to combine different result
sets using RRF, and therefore think that Lucene itself could
actually be a good place for it.

Looking forward to your additional elaboration!

Thanks

Michael





On 13.05.2024 at 12:34, Alessandro Benedetti wrote:

This is not strictly related to Lucene, but I'll give a
talk at Berlin Buzzwords on how I am implementing
Reciprocal Rank Fusion in Apache Solr.
I'll resume my work on the contribution next week and
have more to share later.

Back in the day, I was reasoning on this and I didn't
think Lucene was the right place for an interleaving
algorithm, given that Reciprocal Rank Fusion is affected
by distribution and it's not supposed to work per node.
I think I evaluated the possibility of doing it as a
Lucene query or a Lucene component but then ended up
with a different approach.
I'll elaborate more when I go back to the task!

Cheers
--
*Alessandro Be