Re: Maximum score estimation

2024-05-22 Thread Mikhail Khludnev
I'm trying to understand Impacts. Need help.
https://github.com/apache/lucene/issues/5270#issuecomment-1223383919
Does it mean
advanceShallow(0)
getMaxScore(maxDoc-1)
gives a good max score estimate, at least for a term query?
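For intuition, the semantics can be modeled in plain Java: each postings block records its most competitive (freq, norm) pair, and getMaxScore(upTo) takes the maximum of the scores of those pairs over the blocks in range. Everything below — block boundaries, freqs, norms, and the BM25-like constants — is invented for illustration; this is not Lucene's real ImpactsEnum API:

```java
import java.util.List;

// Toy model of block-max ("impacts") metadata. Each postings block stores
// its most competitive (freq, norm) pair; an upper bound for a range of
// docs is the max score of those pairs over the blocks in range.
// All numbers and the scoring function are invented, not Lucene's.
public class ImpactsSketch {
    // lastDoc = last docID covered by this postings block.
    record BlockImpact(int lastDoc, int freq, long norm) {}

    // BM25-like scorer; treats norm as the doc length (a simplification).
    static double score(int freq, long norm) {
        double k1 = 1.2, b = 0.75, avgdl = 10.0, idf = 2.0; // assumed stats
        double tfNorm = freq / (freq + k1 * (1 - b + b * norm / avgdl));
        return idf * tfNorm;
    }

    // Analogue of advanceShallow(from) + getMaxScore(upTo): max over the
    // competitive pairs of all blocks intersecting [from, upTo].
    static double getMaxScore(List<BlockImpact> impacts, int from, int upTo) {
        double max = 0;
        int blockStart = 0;
        for (BlockImpact bi : impacts) {
            if (bi.lastDoc() >= from && blockStart <= upTo) {
                max = Math.max(max, score(bi.freq(), bi.norm()));
            }
            blockStart = bi.lastDoc() + 1;
        }
        return max;
    }

    public static void main(String[] args) {
        List<BlockImpact> impacts = List.of(
            new BlockImpact(127, 3, 8),   // block 0: docs 0..127
            new BlockImpact(255, 7, 12),  // block 1: docs 128..255
            new BlockImpact(300, 2, 5));  // block 2: docs 256..300
        int maxDoc = 301;
        System.out.println("bound = " + getMaxScore(impacts, 0, maxDoc - 1));
    }
}
```

Under this toy model, advanceShallow(0) followed by getMaxScore(maxDoc - 1) considers every block, so no document's actual score for the term can exceed the returned bound.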



-- 
Sincerely yours
Mikhail Khludnev


Re: Maximum score estimation

2024-05-10 Thread Mikhail Khludnev
Hello Alessandro.
Glad to hear!
There's not much of an update since the previously published link: just a tiny
test. Guessing the max tf doesn't seem really reliable.
However, I've got another idea:
Can't Impacts give us an exact max score like
https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/search/Scorer.html#getMaxScore(int)?

I don't know if it's possible and how to do it.

On Thu, May 9, 2024 at 6:11 PM Alessandro Benedetti 
wrote:

> Hi Mikhail,
> I was thinking again about this regarding Hybrid Search in Solr and the
> current
> https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
> .
> Was there any progress on this? Any traction?
> Sooner or later I hope to get some funds to work on this; I'll keep you
> updated!
> I agree this would be useful in Learning To Rank and Hybrid Search in
> general.
> The current original score feature is unlikely to be useful unless it is
> normalised by an estimated maximum score.
>
> Cheers
> --
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev  wrote:
>
>> Hello.
>> Just FYI. I scratched a little prototype
>> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
>> to estimate the maximum possible score for a query against an index:
>>  - it creates a virtual index (LikelyReader), which
>>  - contains all terms from the original index with the same docCount, and
>>  - matches all of these terms in the first doc (docnum=0) with the
>> maximum termFreq (how to estimate that is a separate question).
>> So, if we search over this LikelyReader we get a score estimate which can
>> hardly be exceeded by the same query over the original index.
>> I suppose this might be useful for LTR as a better alternative to the
>> query score feature.
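The kind of estimate LikelyReader produces can be approximated by hand for a disjunction: score each term as if a single doc matched it at the guessed maximum termFreq, then sum. A plain-Java sketch using a classic BM25 shape (the k1/b constants and all statistics below are invented, and Lucene's exact formula differs in details):

```java
// Back-of-envelope version of the LikelyReader estimate: bound the score of
// [t1 OR t2 OR ...] by scoring each term as if one doc matched it at the
// guessed maximum termFreq, then summing. Classic BM25 shape; the k1/b
// constants and all statistics are invented, not taken from the prototype.
public class LikelyScoreBound {
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of a single term in the hypothetical best-matching doc.
    static double termUpperBound(long docCount, long docFreq, int maxTf,
                                 double docLen, double avgdl) {
        double k1 = 1.2, b = 0.75;
        double tfNorm = maxTf / (maxTf + k1 * (1 - b + b * docLen / avgdl));
        return idf(docCount, docFreq) * tfNorm;
    }

    // For a sum-of-scores disjunction, the bound is the sum of per-term bounds.
    static double disjunctionUpperBound(long docCount, long[] docFreqs,
                                        int[] maxTfs, double docLen, double avgdl) {
        double sum = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            sum += termUpperBound(docCount, docFreqs[i], maxTfs[i], docLen, avgdl);
        }
        return sum;
    }

    public static void main(String[] args) {
        long docCount = 1_000_000;
        long[] docFreqs = {50_000, 10_000, 200}; // e.g. jet, propulsion, spider
        int[] maxTfs = {5, 3, 2};                // guessed max termFreqs
        System.out.println("bound = "
            + disjunctionUpperBound(docCount, docFreqs, maxTfs, 10.0, 10.0));
    }
}
```

As in the prototype, the weak point is guessing maxTf; everything else comes from index statistics that are cheap to read.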
>>
>> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev  wrote:
>>
>>> Hello dev!
>>> Users are interested in the meaning of the absolute value of the score, but
>>> we always reply that it's just a relative value. The maximum score of the
>>> matched docs is not an answer.
>>> Ultimately we need to measure how much sense a query makes in the index.
>>> E.g. a [jet OR propulsion OR spider] query should be measured as
>>> nonsense, because the best matching docs have much lower scores than a
>>> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
>>> spider].
>>> Could there be a method that returns the maximum possible score if all
>>> query terms matched? Something like stubbing postings on a virtual
>>> all_matching doc with average stats like tf and field length, and kicking
>>> the scorers in? It reminds me of something from probabilistic retrieval,
>>> but not much. Is there anything like this already?
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Stefan Vodita as Lucene committer

2024-01-19 Thread Mikhail Khludnev
Welcome, Stefan!
Good choice, btw!

On Fri, Jan 19, 2024 at 10:03 PM Stefan Vodita 
wrote:

> Thank you all! It's an honor to join the project as a committer.
>
> I'm originally from a small town in southern Romania
> <https://maps.app.goo.gl/fz3Ju683kF91MmbT7>, so I'm really looking
> forward to seeing #12172 <https://github.com/apache/lucene/pull/12172>
> resolved, since both the characters in question (ș, ț)
> are supposed to show up in my name.
>
> In university <https://maps.app.goo.gl/D93VnxFEeRrZFHJs9>, I had
> professors who contributed to open-source software <https://github.com/unikraft>
> and I was
> lucky enough to be given a taste of the open source world. I had become a
> teaching assistant for a few of the courses (Data Structures, Control
> Theory),
> and it had crossed my mind to stay at the university. Then I got an offer
> to
> come work at Amazon, in Ireland
> <https://maps.app.goo.gl/25nSmtv87hZ2mcKv5>. They gave me a list of teams
> I could join that
> only had the names of the teams - I thought Search Engine Tech sounded the
> coolest. I was right! That's how I first learned about Lucene and started
> working with/on it. It's a privilege, Lucene is an amazing piece of
> software and
> I'm proud to be contributing.
>
> Outside programming, I like history and philosophy. I've been a voracious
> reader basically since I learned how to read. Recently, I've been going
> down
> a spiral of increasingly obscure books, but nothing has topped
> Dostoevsky's
> classic, The Brothers Karamazov
> <https://www.goodreads.com/en/book/show/4934>. Knowing books also happens
> to be useful
> for thinking up faceting examples
> <https://communityovercode.org/past-sessions/community-over-code-na-2023/#SH004>,
> so that's a plus.
> When I was in middle-school, I half-willingly went through 4 years of
> classical
> guitar training and was left with a life-long desire to be a good musician
> despite my inconsistent practice habits. Practice will have to wait until I
> finish up the next PR - looking forward to many more in the future!
>
> Cheers,
> Stefan
>
> On Thu, 18 Jan 2024 at 15:56, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hi Team,
>>
>> I'm pleased to announce that Stefan Vodita has accepted the Lucene PMC's
>> invitation to become a committer!
>>
>> Stefan, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations, welcome, and thank you for all your improvements to
>> Lucene and our community,
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: CVEs reported in Solr 8.11.2

2024-01-09 Thread Mikhail Khludnev
Hello Praveen,
IIRC this jar is used only by the Tika (Solr Cell) module, which is disabled
by default. So the vulnerability is only exposed if a user enables that module.

On Mon, Jan 8, 2024 at 9:55 AM Praveen Kamath 
wrote:

> Hey Team,
>
> Greetings for the day. This is Praveen from Acquia - one of your Solr
> customers.
> We recently ran an ORCA scan on our solr instances and got to know of
> several vulnerabilities in Lucene 8.11.2. I couldn't find any tickets
> regarding vulnerability reported in bcprov-jdk15on-1.69.jar (1.69):
> org.bouncycastle:bcprov-jdk15on library in your issue tracker
> <https://issues.apache.org/jira/>.
> I want to raise a ticket for this. Kindly help me with the process to do
> so.
>
> Thanks and regards,
> Praveen Kamath
> Staff Engineer, Acquia
>


-- 
Sincerely yours
Mikhail Khludnev


Re: ./crave pull .. 'heapdumps/* Fwd: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!

2023-12-02 Thread Mikhail Khludnev
Thanks Yuvraaj.
dev@, how do we tweak the Jenkins script?

On Sat, Dec 2, 2023 at 9:25 PM Yuvraaj Kelkar  wrote:

> The new version of crave is in place and will be used automatically on the
> next invocation from Jenkins.
> Can you update the Jenkins script to call crave like this:
>
> ./crave pull --extra-rsync-flags ' --ignore-missing-args'
> '**/build/**/test/TEST-*.xml' '**/*.events' 'heapdumps/**' '**/hs_err_pid*'
>
>
> Release has been marked here:
> https://github.com/accupara/crave/releases/tag/0.2-6879
>
> Thanks,
> -Uv

Re: ./crave pull .. 'heapdumps/* Fwd: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!

2023-12-01 Thread Mikhail Khludnev
Makes sense.

On Fri, Dec 1, 2023 at 7:56 PM Yuvraaj Kelkar  wrote:

> I think the second option is what we'll go for.
> I'm going to add a flag to pull that will allow the user to specify extra
> flags to be given to rsync.
> Then we can call crave pull like this:
> ./crave pull --extra-rsync-flags ' --ignore-missing-args'
> '**/build/**/test/TEST-*.xml' '**/*.events' 'heapdumps/**' '**/hs_err_pid*'
>
>
> *** Note the additional space before the hyphen in ' --ignore-missing-args'
>  .
>
> This should handle the missing source files/directories.
>
> What do you think?
>
> Thanks,
> -Uv

Re: ./crave pull .. 'heapdumps/* Fwd: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!

2023-12-01 Thread Mikhail Khludnev
Hello Yuvraaj,
Thanks for taking care of this. Honestly, it's not my wheelhouse.
It seems the assumption is that a test running out of heap will create the
heapdumps folder and put a dump file into it. I don't know whether
test/gradle can ever dump heap there; at least we don't have tests dumping
heap there now. So, whether this folder exists or is absent is not certain.
We have a few options:
 - drop heapdumps/** from crave pull until someone needs to investigate a
test failing with an out-of-memory error
 - hack crave pull to ignore path wildcards for an absent dir
 - execute $mkdir heapdumps or $mkdir -p heapdumps (depending on the
script's error-handling mode) before $crave pull


On Thu, Nov 30, 2023 at 11:24 PM Yuvraaj Kelkar  wrote:

> I just started a build with crave:
> crave run ./gradlew --console=plain check integrationTests
>
> And at the end of it, looked for the patterns in the crave pull  command:
>
> admin@171074329f9e:/tmp/src/solr$ find . -name '*.events'
> admin@171074329f9e:/tmp/src/solr$ find . -name 'hs_err_pid*'
> admin@171074329f9e:/tmp/src/solr$
> admin@171074329f9e:/tmp/src/solr$ ls -l heapdumps
> ls: cannot access 'heapdumps': No such file or directory
>
>
> The only thing I could get a lot of output on was
>
> admin@171074329f9e:/tmp/src/solr$ find . | grep 'build.*test.TEST' | head
> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.JsonRequestApiTest.xml
> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.UsingSolrJRefGuideExamplesTest.xml
> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.IndexingNestedDocuments.xml
> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.ZkConfigFilesTest.xml
> ./solr/solr-ref-guide/build/test-results/test/TEST-org.apache.solr.client.ref_guide_examples.JsonRequestApiHeatmapFacetingTest.xml
> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.exporter.SolrExporterIntegrationTest.xml
> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.scraper.SolrStandaloneScraperBasicAuthTest.xml
> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.exporter.MetricsQueryTemplateTest.xml
> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.scraper.SolrStandaloneScraperTest.xml
> ./solr/prometheus-exporter/build/test-results/test/TEST-org.apache.solr.prometheus.scraper.SolrCloudScraperTest.xml
>
>
> Is there some other build command required to generate the other file
> patterns?
>
> Thanks,
> -Uv
>
> On Nov 30 2023, at 11:33 am, Yuvraaj Kelkar  wrote:
>
> Investigating.

-- 
Sincerely yours
Mikhail Khludnev


Re: ./crave pull .. 'heapdumps/* Fwd: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!

2023-11-26 Thread Mikhail Khludnev
Pardon

On Sun, Nov 26, 2023 at 11:28 AM Gautam Worah 
wrote:

> I think you meant to send it to d...@solr.apache.org?

-- 
Sincerely yours
Mikhail Khludnev


./crave pull .. 'heapdumps/* Fwd: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!

2023-11-26 Thread Mikhail Khludnev
Hello
It looks like a logical error in crave pull. How can we work around it?

+ status=0
+ ./crave pull '**/build/**/test/TEST-*.xml' '**/*.events'
'heapdumps/**' '**/hs_err_pid*'
Error: rsync: [sender] change_dir "/tmp/src/solr/heapdumps" failed: No
such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1682) [Receiver=3.1.3]
rsync: [Receiver] write error: Broken pipe (32)

+ exit 0


-- Forwarded message -
From: Apache Jenkins Server 
Date: Sun, Nov 26, 2023 at 11:17 AM
Subject: [JENKINS] Solr » Solr-Check-9.x - Build # 5949 - Still Failing!
To: 


Build: https://ci-builds.apache.org/job/Solr/job/Solr-Check-9.x/5949/

No tests ran.

Build Log:
[...truncated 1490 lines...]
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

-------------
To unsubscribe, e-mail: builds-unsubscr...@solr.apache.org
For additional commands, e-mail: builds-h...@solr.apache.org


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-11 Thread Mikhail Khludnev
Welcome, Patrick.

On Fri, Nov 10, 2023 at 11:05 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> I'm happy to announce that Patrick Zhai has accepted an invitation to join
> the Lucene Project Management Committee (PMC)!
>
> Congratulations Patrick, thank you for all your hard work improving
> Lucene's community and source code, and welcome aboard!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Boolean field type

2023-11-09 Thread Mikhail Khludnev
Hello Michael.
This "NOT the less common value" optimization assumes that the boolean field
is required, but how can we enforce such a mandatory-field constraint in
Lucene? I'm not aware of anything like a Solr schema or mapping here.
If, say, foo:true is common, the posting list is a dense sequence of
increasing numbers 1,2,3,4,5... Might it already be compressed by codecs like
https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/util/packed/MonotonicBlockPackedWriter.html
?

On Thu, Nov 9, 2023 at 3:31 AM Michael Froh  wrote:

> Hey,
>
> I've been musing about ideas for a "clever" Boolean field type on Lucene
> for a while, and I think I might have an idea that could work. That said,
> this popped into my head this afternoon and has not been fully-baked. It
> may not be very clever at all.
>
> My experience is that Boolean fields tend to be overwhelmingly true or
> overwhelmingly false. I've had pretty good luck with using a keyword-style
> field, where the only term represents the more sparse value. (For example,
> I did a thing years ago with explicit tombstones, where versioned deletes
> would have the field "deleted" with a value of "true", and live
> documents didn't have the deleted field at all. Every query would add a
> filter on "NOT deleted:true".)
>
> That's great when you know up-front what the sparse value is going to be.
> Working on OpenSearch, I just created an issue suggesting that we take a
> hint from users for which value they think is going to be more common so we
> only index the less common one:
> https://github.com/opensearch-project/OpenSearch/issues/11143
>
> At the Lucene level, though, we could index a Boolean field type as the
> less common term when we flush (by counting the values and figuring out
> which is less common). Then, per segment, we can rewrite any query for the
> more common value as NOT the less common value.
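The flush-time counting could be simulated in plain Java along these lines (an invented in-memory model, nothing like Lucene's actual postings writer):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the idea: at "flush", count true/false, store postings only
// for the less common (sparse) value; a query for the common value is then
// answered as the complement of the sparse postings.
public class SparseBooleanField {
    final boolean sparseValue;      // the value we actually index
    final List<Integer> sparseDocs; // docIDs holding sparseValue, in order
    final int maxDoc;

    SparseBooleanField(boolean[] values) {
        maxDoc = values.length;
        int trues = 0;
        for (boolean v : values) if (v) trues++;
        sparseValue = trues <= maxDoc - trues; // index "true" iff true is rarer
        sparseDocs = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (values[doc] == sparseValue) sparseDocs.add(doc);
        }
    }

    // Matching docs for field:value. The common value is rewritten as
    // NOT the sparse value, i.e. the complement of the stored postings.
    List<Integer> search(boolean value) {
        if (value == sparseValue) return sparseDocs;
        List<Integer> result = new ArrayList<>();
        int i = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (i < sparseDocs.size() && sparseDocs.get(i) == doc) { i++; continue; }
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] values = {true, true, false, true, true};
        SparseBooleanField field = new SparseBooleanField(values);
        System.out.println("indexed value: " + field.sparseValue
            + ", postings: " + field.sparseDocs);
    }
}
```

Note the model bakes in the assumption the message flags as icky: every doc carries a value, so the complement of the sparse postings really is the common value's doc set.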
>
> You can compute upper/lower bounds on the value frequencies cheaply during
> a merge, so I think you could usually write the doc IDs for the less common
> value directly (without needing to count them first), even when input
> segments disagree on which is the more common value.
>
> If your Boolean field is not overwhelmingly lopsided, you might even want
> to split segments to be 100% true or 100% false, such that queries against
> the Boolean field become match-all or match-none. On a retail website,
> maybe you have some toggle for "only show me results with property X" -- if
> all your property X products are in one segment or a handful of segments,
> you can drop the property X clause from the matching segments and skip the
> other segments.
>
> I guess one icky part of this compared to the usual Lucene field model is
> that I'm assuming a Boolean field is never missing (or I guess missing
> implies "false" by default?). Would that be a deal-breaker?
>
> Thanks,
> Froh
>


-- 
Sincerely yours
Mikhail Khludnev


Re: PackedInts functionalities

2023-10-17 Thread Mikhail Khludnev
Hello Tony,
Is it possible to write a block of docFreqs and then a block of
postings offsets?
Or why not write each pair as one 10-bit integer and then split it into the
4-bit and 6-bit parts in the postings format code?
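The second suggestion can be sketched in plain Java: write each (docFreq, postingsStartOffset) pair as a single 10-bit value and split it back on read. The 4/6 split matches the example in the thread; actually storing the uniform 10-bit values as a bit stream would be left to PackedInts:

```java
// Hedged sketch: pack a 4-bit docFreq and a 6-bit postingsStartOffset into
// one 10-bit value, then split them back on the read path. Widths are
// hardcoded to the thread's example; real code would parameterize them.
public class TenBitPack {
    static final int FREQ_BITS = 4, OFFSET_BITS = 6;

    static int pack(int docFreq, int postingsStartOffset) {
        if (docFreq < 0 || docFreq >= (1 << FREQ_BITS)
                || postingsStartOffset < 0
                || postingsStartOffset >= (1 << OFFSET_BITS)) {
            throw new IllegalArgumentException("value out of range");
        }
        return (docFreq << OFFSET_BITS) | postingsStartOffset;
    }

    static int unpackDocFreq(int packed) {
        return packed >>> OFFSET_BITS;
    }

    static int unpackOffset(int packed) {
        return packed & ((1 << OFFSET_BITS) - 1);
    }

    public static void main(String[] args) {
        int packed = pack(9, 37);
        System.out.println(unpackDocFreq(packed) + " " + unpackOffset(packed)); // prints: 9 37
    }
}
```

Since every packed value has the same 10-bit width, the existing uniform-width PackedInts readers/writers would apply unchanged.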

On Mon, Oct 16, 2023 at 11:50 PM Dongyu Xu  wrote:

> Hi devs,
>
> As I was working on https://github.com/apache/lucene/issues/12513 I
> needed to compress positive integers which are used to locate postings etc.
>
> To put it concretely, I will need to pack a few values per term
> contiguously, and those values can have different bit-widths. For example,
> consider that we need to encode docFreq and postingsStartOffset per term,
> where docFreq takes 4 bits and postingsStartOffset takes 6 bits. We
> expect to write the following for two terms.
>
> ```
> Term1 |  Term2
>
> docFreq(4bit) | postingsStartOffset(6bit) | docFreq(4bit) |
> postingsStartOffset(6bit)
>
> ```
>
> On the read path, I expect to locate the offset for a term first, followed
> by reading two values that have different bit-widths.
>
> In the spirit of not reinventing things unnecessarily, I tried to explore the
> existing PackedInts util classes and I believe there is no support for this
> at the moment. The biggest gap I found is that the existing classes expect
> to write/read values of the same bit-width.
>
> I'm writing to get feedback from y'all to see if I missed anything.
>
> Cheers,
> Tony X
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-09-01 Thread Mikhail Khludnev
Thanks for sharing, Michael.
But can't we say that vector DBs may utilize GPUs, which is hardly possible
with Lucene now?

On Fri, Sep 1, 2023 at 8:24 AM Kent Fitch  wrote:

> My testing shows Lucene's HNSW in a very positive light.  The ability to
> perform blended searches (vector/semantic and text) is valuable, even with
> high quality embeddings, and helps when the searcher's intent is to search
> for specific words or phrases (such as a name, or exact concepts) which get
> blurred-out by semantics.   I discussed blended searching using Lucene in
> this Code4Lib article: https://journal.code4lib.org/articles/17443
>
> And regarding performance, I have benchmarked Lucene's HNSW (circa Jan2023
> snapshot) on a test index of 192 million vectors of 1536 dimensions,
> reduced by PQ coding to 512 bytes and stored in HNSW.  Building this index
> was slow (lots of time merging...) but once it was built, it did fit
> entirely in memory (core i7-9800x (8 cores) with 128gb DDR4 memory running
> at 2400 MT/s) so no IO was required at search time.  (I modified the Lucene
> similarity code to support expansion of each of the 512 PQ byte codes back
> to 3 floats for the distance calculation.)  I haven't updated this to take
> advantage of the latest SIMD capability, but even so, once the HNSW
> structure is in memory, a single-threaded topK=10 search thread achieves
> 2.4 queries/second.  Two threads: 4.9 q/s, 4 threads: 7.2q/s, maxing out at
> 8 threads: 9.4 q/s.  I guess the non-linear scaling with threads is due to
> competition for memory bandwidth and cache.  Curiously, I'm not getting
> nearly as good performance out of the box using Milvus 2.3's diskANN, but I
> need to find out why before condemning it.
>
> Kent Fitch
>
> On Thu, Aug 31, 2023 at 7:53 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Thanks Michael, very interesting!  I of course agree that Lucene is all
>> you need, heh ;)
>>
>> Jimmy Lin also tweeted about the strength of Lucene's HNSW:
>> https://twitter.com/lintool/status/1681333664431460353?s=20
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 31, 2023 at 3:31 AM Michael Wechner <
>> michael.wech...@wyona.com> wrote:
>>
>>> Hi Together
>>>
>>> You might be interesed in this paper / article
>>>
>>> https://arxiv.org/abs/2308.14963
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>

-- 
Sincerely yours
Mikhail Khludnev


Re: How to retain % sign against numbers in lucene indexing/ search

2023-07-12 Thread Mikhail Khludnev
Hello Amitesh.
If StandardTokenizer does so (but it's worth double-checking on the Solr Admin
Analysis screen), you can experiment with WhitespaceTokenizer.

On Wed, Jul 12, 2023 at 3:33 PM Amitesh Kumar  wrote:

> Hi Group,
>
> I am facing a requirement change to get % sign retained in searches. e.g
>
> Sample search docs:
> 1. Number of boys 50
> 2. My score was 50%
> 3. 40-50% for pass score
>
> Search query: 50%
> Expected results: Doc-2, Doc-3 i.e.
> My score was 50%
> 40-50% for pass score
>
> Actual result: All 4 documents
>
> On the implementation front, I am using a set of filters like
> lowerCaseFilter, EnglishPossessiveFilter etc in addition to base tokenizer
> StandardTokenizer.
>
> My analysis suggests that StandardTokenizer strips off the % sign, hence
> the behavior. Has someone faced similar requirements? Any help/guidance is
> highly appreciated.
>
> *Warm Regards,*
> *Amitesh  K*
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Committer Freebie

2023-07-05 Thread Mikhail Khludnev
Right. I've got Copilot working at no charge for my account, which is included
in the Apache org.
That "Request for Business" seems like a different thing.
Thanks for the clue, Mark!

On Fri, Jun 16, 2023 at 2:47 AM Mark Miller  wrote:

> Hmm, sorry about that, I assumed there would be no request. I must have
> requested way way back or something. I just went to the page where it used
> to ask me to pick a yearly or monthly payment and it said I don't have to
> pay, when it used to make me pick. But I just read it went GA, so it seems
> weird you'd still have to make a request. CopilotX (the ChatGPT-4 version)
> is definitely early-access waiting list, but I swear I just read regular
> Copilot went GA - that's the article I was reading when I found out it was
> now free. That says Copilot for Business.  Is there a not-for-business
> version you're missing?
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Chris Hegarty to the Lucene PMC

2023-06-19 Thread Mikhail Khludnev
Welcome, Chris.

On Mon, Jun 19, 2023 at 12:53 PM Adrien Grand  wrote:

> I'm pleased to announce that Chris Hegarty has accepted an invitation to
> join the Lucene PMC!
>
> Congratulations Chris, and welcome aboard!
>
> --
> Adrien
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Committer Freebie

2023-06-14 Thread Mikhail Khludnev
Thanks Xugang!

On Wed, Jun 14, 2023 at 3:58 PM Lu Xugang  wrote:

> Hi, Mikhail.  I submitted the request about one month ago and got free
> access recently. The request does not take effect immediately.
>
> Xugang
> https://www.amazingkoala.com.cn/
>
>
> Mikhail Khludnev  于2023年6月14日周三 15:37写道:
>
>> Hello Mark, thanks for the clue.
>> Do you know how to expedite the request?
>> https://github.com/settings/copilot Mine is stuck in the request-submitted
>> state:
>>
>>- [image: @apache]*apache*member
>>Request for Copilot for Business submitted.
>>
>>
>> On Tue, Jun 13, 2023 at 1:05 PM Mark Miller 
>> wrote:
>>
>>> Purely FYI
>>>
>>> Figured it’s worth sharing that committers now appear to have free
>>> access to GitHub Copilot.
>>>
>>> Didn’t seem to in the past - I used the free trial, didn’t find it worth
>>> paying the 100 bucks for it to be part of my current ecosystem of dev
>>> tools, but as I was on my way out, I saw this note that said if you were a
>>> committer on a popular GitHub OpenSource project, you got it for free.
>>>
>>> But it wanted my money. So a couple weeks ago I found some sales
>>> contact form and I wrote a self-serving rant about how outrageous the
>>> situation was. C'mon. Then I forgot about it and went on.
>>>
>>> But then I saw in my newsfeed the other day that it went GA or
>>> something. I thought it was GA; the waiting list is for CopilotX. They were
>>> allowing signups and taking money. So I clicked the news link, and lo and
>>> behold, it said I didn't have to pay. So I hope my indignation was the
>>> instigator, but probably they expanded the covered projects for this
>>> so-called GA or something.
>>>
>>> If it extends to CopilotX, that will be a nice little freebie.
>>>
>>> Just don’t let Robert catch you with it. Or probably your employer. And
>>> it will hilariously pale in comparison to my custom Policeman IntelliJ
>>> Plugin that only outputs voice in a stunningly accurate Uwe voice clone,
>>> taking no input, just calling out violations in what you are currently
>>> working on.
>>>
>>> But it’s free, 10$ a month value. If it expands to CopilotX, much more
>>> value.
>>> --
>>> - MRM
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Committer Freebie

2023-06-14 Thread Mikhail Khludnev
Hello Mark, thanks for the clue.
Do you know how to expedite the request? https://github.com/settings/copilot
Mine is stuck in the request-submitted state:

   - [image: @apache]*apache*member
   Request for Copilot for Business submitted.


On Tue, Jun 13, 2023 at 1:05 PM Mark Miller  wrote:

> Purely FYI
>
> Figured it’s worth sharing that committers now appear to have free access
> to GitHub Copilot.
>
> Didn’t seem to in the past - I used the free trial, didn’t find it worth
> paying the 100 bucks for it to be part of my current ecosystem of dev
> tools, but as I was on my way out, I saw this note that said if you were a
> committer on a popular GitHub OpenSource project, you got it for free.
>
> But it wanted my money. So a couple weeks ago I found some sales
> contact form and I wrote a self-serving rant about how outrageous the
> situation was. C'mon. Then I forgot about it and went on.
>
> But then I saw in my newsfeed the other day that it went GA or something.
> I thought it was GA; the waiting list is for CopilotX. They were allowing
> signups and taking money. So I clicked the news link, and lo and behold,
> it said I didn't have to pay. So I hope my indignation was the instigator,
> but probably they expanded the covered projects for this so-called GA or
> something.
>
> If it extends to CopilotX, that will be a nice little freebie.
>
> Just don’t let Robert catch you with it. Or probably your employer. And it
> will hilariously pale in comparison to my custom Policeman IntelliJ Plugin
> that only outputs voice in a stunningly accurate Uwe voice clone, taking no
> input, just calling out violations in what you are currently working on.
>
> But it’s free, 10$ a month value. If it expands to CopilotX, much more
> value.
> --
> - MRM
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Question for customize index segment search order

2023-05-12 Thread Mikhail Khludnev
Hello, Wei.
Pardon for pinging you again, over here on the Lucene side.
Here's the loop over segments
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L674
So, presumably:
 - a custom searcher may loop over the segments out of order;
 - a custom wrapper over the index reader may yield the list of child contexts
in reverse order;
 - some code around the NRT commit may put recent segments at the beginning.
I'm not aware of any such implementations, but this seems like something that
would be needed often.
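The second option can be modeled with a toy sketch (made-up types, not the
Lucene reader API), under the assumption that more recently flushed segments
sit at higher docBase: hand the leaves to the collector sorted newest-first,
so early termination visits recent segments before the dominant one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, not the Lucene API: reorder "leaves" so that segments with the
// highest docBase (the freshest flushes, assuming newer segments are appended
// last in the reader) are handed to the collector first.
public class NewestFirstOrder {
    record Leaf(String name, int docBase) {}

    static List<Leaf> newestFirst(List<Leaf> inDocBaseOrder) {
        List<Leaf> reordered = new ArrayList<>(inDocBaseOrder);
        reordered.sort(Comparator.comparingInt(Leaf::docBase).reversed());
        return reordered;
    }
}
```

In real code the reordering would live in a custom IndexSearcher or a
FilterDirectoryReader-style wrapper; the comparator is the only interesting
part.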

On Fri, May 12, 2023 at 12:03 AM Wei  wrote:

> Hi ,
>
> We have an index that has multiple segments generated by continuous
> updates.  There is always a large dominant segment after an index rebuild,
> then many small segments are generated by continuous updates.  At query
> time we apply early termination with EarlyTerminatingCollector
>
> https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java
> ,
> which triggers EarlyTerminatingCollectorException in SolrIndexSearcher
>
> https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281
> .
> We see a problem that the limit can be reached within the dominant segment
> alone (it seems it is always traversed first) while documents with recent
> updates in the newer segments don't get a chance to be scored.  Is it
> possible to customize the segment visiting order in Solr so that the latest
> generated segments are searched first?  Any suggestion is appreciated.
>
> Thanks,
> Wei
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!


Re: Lucene PMC Chair Greg Miller

2023-03-07 Thread Mikhail Khludnev
Thank you, Bruno. Congratulations, Greg.

On Mon, Mar 6, 2023 at 8:16 PM Bruno Roustant  wrote:

> Hello Lucene developers,
>
> Lucene Program Management Committee has elected a new chair, Greg Miller,
> and the Board has approved.
>
> Greg, thank you for stepping up, and congratulations!
>
>
> - Bruno
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!


Re: multi-term synonym prevents single-term match -- known issue?

2023-02-19 Thread Mikhail Khludnev
Opened reproducer https://github.com/apache/lucene/pull/12157

On Mon, Feb 13, 2023 at 6:46 PM Mikhail Khludnev  wrote:

> It's time to summon Lucene devs
> https://issues.apache.org/jira/browse/SOLR-16652?focusedCommentId=17687998=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17687998
>
> it seems by design
> https://github.com/apache/lucene/blob/main/lucene/queryparser/src/test/org/apache/lucene/queryparser/classic/TestQueryParser.java#L591
> It sets a multi-word synonym: "guinea pig => cavy"
> dumb.parse("guinea pig") => ((+field:guinea +field:pig) field:cavy)
> Doesn't match just 'guinea' as expected in this ticket.
>
> On Mon, Feb 13, 2023 at 5:33 PM Rudi Seitz  wrote:
>
>> Thanks Mikhail.
>> I think your directional approach ("foo bar=>baz,foo,bar") would work, but
>> we'd also need "baz=>baz,foo bar" for a complete workaround.
>> I've added your message as a comment on the ticket.
>> Rudi
>>
>> On Sat, Feb 11, 2023 at 12:34 PM Mikhail Khludnev 
>> wrote:
>>
>> > Thanks for raising a ticket. Here are just two considerations:
>> > > we could change the synonym rule to "foo bar,baz,foo,bar" but this
>> would
>> > mean that a query for "foo" could now match a document containing only
>> > "bar", which is not the intent of the original rule.
>> > Ok. The latter issue can probably be fixed by directing the synonyms:
>> > foo bar=>baz,foo,bar
>> > Right, it seems like a weird band-aid.
>> >
>> > I stepped through the Lucene code; the MUST occur for synonyms is defined here:
>> >
>> >
>> https://github.com/apache/lucene/blob/7baa01b3c2f93e6b172e986aac8ef577a87ebceb/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L534
>> > Presumably, the original terms could go with the defaultOperator, while the
>> > synonym replacement keeps MUST.
>> >
>> >
>> >
>> >
>> >
>> > On Sat, Feb 11, 2023 at 12:17 AM Rudi Seitz  wrote:
>> >
>> > > Thanks Mikhail and Michael.
>> > > Based on your feedback, I created a ticket:
>> > > https://issues.apache.org/jira/browse/SOLR-16652
>> > > In the ticket, I mentioned why updating the synonym rule or setting
>> > > sow=true causes other problems in this case, unfortunately. I haven't
>> yet
>> > > looked through code to see where the behavior could be changed.
>> > > Rudi
>> > >
>> > >
>> > > On Fri, Feb 10, 2023 at 11:26 AM Michael Gibney <
>> > mich...@michaelgibney.net
>> > > >
>> > > wrote:
>> > >
>> > > > Rudi,
>> > > >
>> > > > I agree, this does not seem like how it should behave. Probably
>> > > > something that could be fixed in edismax, not something lower-level
>> > > > (Lucene)?
>> > > >
>> > > > Michael
>> > > >
>> > > > On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev 
>> > > wrote:
>> > > > >
>> > > > > Hello, Rudi.
>> > > > > Well, it doesn't seem perfect. Probably it can be fixed
>> > > > > via
>> > > > > foo bar,zzz,foo,bar
>> > > > > And in some sense this behavior is reasonable.
>> > > > > Also you can experiment with the sow and pf params (the latter param
>> > > > > is described only on the dismax page).
>> > > > >
>> > > > > On Thu, Feb 9, 2023 at 8:19 PM Rudi Seitz 
>> > wrote:
>> > > > >
>> > > > > > Is this known behavior or is it worth a JIRA ticket?
>> > > > > >
>> > > > > > Searching against a text_general field in Solr 9.1, if my
>> edismax
>> > > > query is
>> > > > > > "foo bar" I should be able to get matches for "foo" without
>> "bar"
>> > and
>> > > > vice
>> > > > > > versa. However, if there happens to be a synonym rule applied at
>> > > query
>> > > > > > time, like "foo bar,zzz" I can no longer get single-term matches
>> > > > against
>> > > > > > "foo" or "bar." Both terms are now required, but can occur in
>> > either
>> > > > order.
>> > > > > > If we change the text_general analysis chain to apply synonyms
>> at
>> > > 

Re: multi-term synonym prevents single-term match -- known issue?

2023-02-13 Thread Mikhail Khludnev
It's time to summon Lucene devs
https://issues.apache.org/jira/browse/SOLR-16652?focusedCommentId=17687998=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17687998

it seems by design
https://github.com/apache/lucene/blob/main/lucene/queryparser/src/test/org/apache/lucene/queryparser/classic/TestQueryParser.java#L591
It sets a multi-word synonym: "guinea pig => cavy"
dumb.parse("guinea pig") => ((+field:guinea +field:pig) field:cavy)
Doesn't match just 'guinea' as expected in this ticket.

On Mon, Feb 13, 2023 at 5:33 PM Rudi Seitz  wrote:

> Thanks Mikhail.
> I think your directional approach ("foo bar=>baz,foo,bar") would work, but
> we'd also need "baz=>baz,foo bar" for a complete workaround.
> I've added your message as a comment on the ticket.
> Rudi
>
> On Sat, Feb 11, 2023 at 12:34 PM Mikhail Khludnev  wrote:
>
> > Thanks for raising a ticket. Here are just two considerations:
> > > we could change the synonym rule to "foo bar,baz,foo,bar" but this
> would
> > mean that a query for "foo" could now match a document containing only
> > "bar", which is not the intent of the original rule.
> > Ok. The latter issue can probably be fixed by directing the synonyms:
> > foo bar=>baz,foo,bar
> > Right, it seems like a weird band-aid.
> >
> > I stepped through the Lucene code; the MUST occur for synonyms is defined here:
> >
> >
> https://github.com/apache/lucene/blob/7baa01b3c2f93e6b172e986aac8ef577a87ebceb/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L534
> > Presumably, the original terms could go with the defaultOperator, while the
> > synonym replacement keeps MUST.
> >
> >
> >
> >
> >
> > On Sat, Feb 11, 2023 at 12:17 AM Rudi Seitz  wrote:
> >
> > > Thanks Mikhail and Michael.
> > > Based on your feedback, I created a ticket:
> > > https://issues.apache.org/jira/browse/SOLR-16652
> > > In the ticket, I mentioned why updating the synonym rule or setting
> > > sow=true causes other problems in this case, unfortunately. I haven't
> yet
> > > looked through code to see where the behavior could be changed.
> > > Rudi
> > >
> > >
> > > On Fri, Feb 10, 2023 at 11:26 AM Michael Gibney <
> > mich...@michaelgibney.net
> > > >
> > > wrote:
> > >
> > > > Rudi,
> > > >
> > > > I agree, this does not seem like how it should behave. Probably
> > > > something that could be fixed in edismax, not something lower-level
> > > > (Lucene)?
> > > >
> > > > Michael
> > > >
> > > > On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev 
> > > wrote:
> > > > >
> > > > > Hello, Rudi.
> > > > > Well, it doesn't seem perfect. Probably it can be fixed
> > > > > via
> > > > > foo bar,zzz,foo,bar
> > > > > And in some sense this behavior is reasonable.
> > > > > Also you can experiment with the sow and pf params (the latter param
> > > > > is described only on the dismax page).
> > > > >
> > > > > On Thu, Feb 9, 2023 at 8:19 PM Rudi Seitz 
> > wrote:
> > > > >
> > > > > > Is this known behavior or is it worth a JIRA ticket?
> > > > > >
> > > > > > Searching against a text_general field in Solr 9.1, if my edismax
> > > > query is
> > > > > > "foo bar" I should be able to get matches for "foo" without "bar"
> > and
> > > > vice
> > > > > > versa. However, if there happens to be a synonym rule applied at
> > > query
> > > > > > time, like "foo bar,zzz" I can no longer get single-term matches
> > > > against
> > > > > > "foo" or "bar." Both terms are now required, but can occur in
> > either
> > > > order.
> > > > > > If we change the text_general analysis chain to apply synonyms at
> > > index
> > > > > > time instead of query time, this behavior goes away and
> single-term
> > > > matches
> > > > > > are again possible.
> > > > > >
> > > > > > To reproduce, use the _default configset with "foo bar,zzz" added
> > to
> > > > > > synonyms.txt. Index these four docs:
> > > > > >
> > > > > > {"id":"1", "title_txt":"foo"

Re: Maximum score estimation

2023-02-13 Thread Mikhail Khludnev
Hello.
Just FYI. I scratched a little prototype
https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
To estimate the maximum possible score for a query against an index:
 - it creates a virtual index (LikelyReader), which
 - contains all terms from the original index with the same docCount, and
 - matches all of these terms in the first doc (docnum=0) with the maximum
termFreq (how to estimate that is a separate question).
So, if we search over this LikelyReader, we get a score estimate which can
hardly be exceeded by the same query over the original index.
I suppose this might be useful for LTR as a better alternative to the query
score feature.
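For intuition, the same ceiling can be written down directly with the classic
BM25 formula (a back-of-envelope sketch with assumed k1/b defaults, not
Lucene's exact similarity, which among other things quantizes norms): plug the
maximum plausible tf into each query term's contribution and sum.

```java
// Back-of-envelope ceiling using the classic BM25 formula (assumed defaults
// k1=1.2, b=0.75; NOT Lucene's exact implementation): score every query term
// as if one doc contained it with the maximum plausible tf, then sum the
// per-term contributions.
public class Bm25Ceiling {
    static final double K1 = 1.2, B = 0.75;

    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    static double ceiling(long docCount, long[] docFreqs,
                          double maxTf, double docLen, double avgDocLen) {
        double lengthNorm = K1 * (1 - B + B * docLen / avgDocLen);
        double sum = 0;
        for (long df : docFreqs) {
            sum += idf(docCount, df) * (maxTf * (K1 + 1)) / (maxTf + lengthNorm);
        }
        return sum;
    }
}
```

Because tf saturates, the ceiling grows only slowly with maxTf, so even a
crude maxTf guess should give a usable normalizer.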

On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev  wrote:

> Hello dev!
> Users are interested in the meaning of the absolute value of the score, but
> we always reply that it's just a relative value. The maximum score of the
> matched docs is not an answer.
> Ultimately we need to measure how much sense a query makes in the index.
> e.g. a [jet OR propulsion OR spider] query should be measured as
> nonsense, because the best matching docs have much lower scores than a
> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
> spider].
> Could there be a method that returns the maximum possible score if all query
> terms matched? Something like stubbing postings on a virtual all-matching
> doc with average stats like tf and field length, and kicking the scorers in?
> It reminds me of something about probabilistic retrieval, but not much. Is
> there anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Ben Trent as Lucene committer

2023-01-27 Thread Mikhail Khludnev
Congratulations, Ben.

On Fri, Jan 27, 2023 at 8:40 PM Benjamin Trent 
wrote:

> Hey y'all!
>
> This is truly an honor!
>
> Well, I am Ben Trent and have been writing code for over a decade now.
> Which I know is not a very long time compared to most folks. I originally
> wanted to do research and work in pure mathematics (my baccalaureate), but
> quickly realized I am nowhere near smart enough to make money at that. So,
> like many folks, I switched to computing and haven't looked back.
>
> In my spare time (when not wrangling one of my children), I enjoy movies
> (especially old kung fu, anything Golden Harvest or Shaw Brothers), good
> beer, reading, playing guitar, and hiking.
>
> Thank you all for the warm welcome! See you online!
>
> Ben
>
> On Fri, Jan 27, 2023 at 10:26 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Welcome and congratulations, Ben!
>>
>> On Fri, Jan 27, 2023 at 8:48 PM Adrien Grand  wrote:
>> >
>> > I'm pleased to announce that Ben Trent has accepted the PMC's
>> > invitation to become a committer.
>> >
>> > Ben, the tradition is that new committers introduce themselves with a
>> > brief bio.
>> >
>> > Congratulations and welcome!
>> >
>> > --
>> > Adrien
>> >
>> > -----
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
>

-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!


Re: Maximum score estimation

2022-12-18 Thread Mikhail Khludnev
Thanks for the reply, Walter.
Recently Robert commented on the PR with the link
https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages; it
gives arguments against my proposal. Honestly, I'm still in doubt.

On Tue, Dec 6, 2022 at 8:15 PM Walter Underwood 
wrote:

> As you point out, this is a probabilistic relevance model. Lucene uses a
> vector space model.
>
> A probabilistic model gives an estimate of how relevant each document is
> to the query. Unfortunately, their overall relevance isn’t as good as a
> vector space model.
>
> You could calculate an ideal score, but that can change every time a
> document is added to or deleted from the index, because of idf. So the
> ideal score isn’t a useful mental model.
>
> Essentially, you need to tell your users to worry about something that
> matters. The absolute value of the score does not matter.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Dec 5, 2022, at 11:02 PM, Mikhail Khludnev  wrote:
>
> Hello dev!
> Users are interested in the meaning of the absolute value of the score, but
> we always reply that it's just a relative value. The maximum score of the
> matched docs is not an answer.
> Ultimately we need to measure how much sense a query makes in the index.
> e.g. a [jet OR propulsion OR spider] query should be measured as
> nonsense, because the best matching docs have much lower scores than a
> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
> spider].
> Could there be a method that returns the maximum possible score if all query
> terms matched? Something like stubbing postings on a virtual all-matching
> doc with average stats like tf and field length, and kicking the scorers in?
> It reminds me of something about probabilistic retrieval, but not much. Is
> there anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Heap Size Space and Span Queries

2022-12-15 Thread Mikhail Khludnev
Hi
I scratched a simple qparser plugin to experiment with intervals in Solr.
https://github.com/mkhludnev/solr-flexible-qparser
I pushed the jar under releases, and described how to use it in README.md.
Sjoerd,
if spans really blow the heap, you can give intervals a try with this
plugin. Note the minimum Solr version required.
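For intuition, each spanNear clause in the query quoted below asks whether two
terms co-occur within a slop of 4 in any order; over term positions that boils
down to roughly this check (a simplification — real span/interval matching is
subtler about overlaps and repeats):

```java
// Simplified position check for one "spanNear([a, b], 4, false)" clause: do
// the two terms co-occur with at most `slop` positions between them, in any
// order? Real span/interval matching is subtler (overlaps, repeated terms),
// but this is the core test each clause performs per document.
public class NearUnordered {
    static boolean matches(int[] positionsA, int[] positionsB, int slop) {
        for (int a : positionsA) {
            for (int b : positionsB) {
                if (a != b && Math.abs(a - b) - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```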

On Wed, Dec 14, 2022 at 9:26 PM Mikhail Khludnev  wrote:

> Developers,
> Is it expected for Spans? Can IntervalsQuery help here?
>
> On Wed, Dec 14, 2022 at 5:41 PM Sjoerd Smeets  wrote:
>
>> Hi,
>>
>> I've implemented a Span Query parser and when running the below query, I'm
>> seeing Heap Size Space messages on certain shards:
>>
>> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException:
>> java.lang.OutOfMemoryError: Java heap space
>>
>> The span query that I'm running is the following:
>>
>> ((spanNear([unstemmed_text:charge, unstemmed_text:account], 4, false)
>> spanNear([unstemmed_text:pledge, unstemmed_text:account], 4, false))
>> spanNear([unstemmed_text:pledge, unstemmed_text:deposit], 4, false))
>> spanNear([unstemmed_text:charge, unstemmed_text:deposit], 4, false)
>>
>> The heap size at the moment is set to 48Gb. We are running 4 shards in 1
>> JVM and the 4 shards combined have 24M docs evenly distributed across the
>> shards. We do use the collapse feature as well.
>>
>> This is on Solr 8.6.0
>>
>> What are the considerations for running Span Queries and heap sizes?
>>
>> Any suggestions are welcome
>>
>> Sjoerd
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Heap Size Space and Span Queries

2022-12-15 Thread Mikhail Khludnev
Michael, thanks for stepping in!

>   it seems that simple phrase
queries would suffice here in place of spanNear?

I think it wouldn't. It seems to me that 4 is the slop, and false is inOrder.
Sjoerd, can you comment on the particular span queries you use?
Also, do you have a heap dump summary to confirm the high memory consumption
by spans?

On Thu, Dec 15, 2022 at 5:33 PM Michael Gibney 
wrote:

> I don't think that nested boolean disjunctions consisting of isolated
> spanNear queries at the leaves should have memory issues (as opposed
> to nested spanNear queries around disjunctions, which might well do).
> Am I misreading the string representation of that query? A little bit
> more explicit information about how the query is built, so that we can
> be certain of what we're dealing with, would be helpful.
>
> It'd certainly be worth trying IntervalsQuery -- but part of what
> makes me think I must be missing something in interpreting the string
> representation of the query provided: it seems that simple phrase
> queries would suffice here in place of spanNear?
>
> Regarding SpanQuery vs. IntervalsQuery performance and
> characteristics, there's some possibly-relevant discussion on
> LUCENE-9204:
>
>
> https://issues.apache.org/jira/browse/LUCENE-9204?focusedCommentId=17352589#comment-17352589
>
> Michael
>
>
> On Wed, Dec 14, 2022 at 1:27 PM Mikhail Khludnev  wrote:
> >
> > Developers,
> > Is it expected for Spans? Can IntervalsQuery help here?
> >
> > On Wed, Dec 14, 2022 at 5:41 PM Sjoerd Smeets  wrote:
> >>
> >> Hi,
> >>
> >> I've implemented a Span Query parser and when running the below query,
> I'm
> >> seeing Heap Size Space messages on certain shards:
> >>
> >> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException:
> >> java.lang.OutOfMemoryError: Java heap space
> >>
> >> The span query that I'm running is the following:
> >>
> >> ((spanNear([unstemmed_text:charge, unstemmed_text:account], 4, false)
> >> spanNear([unstemmed_text:pledge, unstemmed_text:account], 4, false))
> >> spanNear([unstemmed_text:pledge, unstemmed_text:deposit], 4, false))
> >> spanNear([unstemmed_text:charge, unstemmed_text:deposit], 4, false)
> >>
> >> The heap size at the moment is set to 48Gb. We are running 4 shards in 1
> >> JVM and the 4 shards combined have 24M docs evenly distributed across
> the
> >> shards. We do use the collapse feature as well.
> >>
> >> This is on Solr 8.6.0
> >>
> >> What are the considerations for running Span Queries and heap sizes?
> >>
> >> Any suggestions are welcome
> >>
> >> Sjoerd
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Heap Size Space and Span Queries

2022-12-14 Thread Mikhail Khludnev
Developers,
Is it expected for Spans? Can IntervalsQuery help here?

On Wed, Dec 14, 2022 at 5:41 PM Sjoerd Smeets  wrote:

> Hi,
>
> I've implemented a Span Query parser and when running the below query, I'm
> seeing Heap Size Space messages on certain shards:
>
> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: Java heap space
>
> The span query that I'm running is the following:
>
> ((spanNear([unstemmed_text:charge, unstemmed_text:account], 4, false)
> spanNear([unstemmed_text:pledge, unstemmed_text:account], 4, false))
> spanNear([unstemmed_text:pledge, unstemmed_text:deposit], 4, false))
> spanNear([unstemmed_text:charge, unstemmed_text:deposit], 4, false)
>
> The heap size at the moment is set to 48Gb. We are running 4 shards in 1
> JVM and the 4 shards combined have 24M docs evenly distributed across the
> shards. We do use the collapse feature as well.
>
> This is on Solr 8.6.0
>
> What are the considerations for running Span Queries and heap sizes?
>
> Any suggestions are welcome
>
> Sjoerd
>


-- 
Sincerely yours
Mikhail Khludnev


Maximum score estimation

2022-12-05 Thread Mikhail Khludnev
Hello dev!
Users are interested in the meaning of the absolute value of the score, but we
always reply that it's just a relative value. The maximum score of the matched
docs is not an answer.
Ultimately we need to measure how much sense a query makes in the index. e.g.
a [jet OR propulsion OR spider] query should be measured as nonsense,
because the best matching docs have much lower scores than a
hypothetical (and presumably absent) doc matching [jet AND propulsion AND
spider].
Could there be a method that returns the maximum possible score if all query
terms matched? Something like stubbing postings on a virtual all-matching doc
with average stats like tf and field length, and kicking the scorers in? It
reminds me of something about probabilistic retrieval, but not much. Is there
anything like this already?

-- 
Sincerely yours
Mikhail Khludnev


Re: IntelliJ Project Generation?

2022-11-21 Thread Mikhail Khludnev
Hello, Greg.

https://github.com/apache/lucene/blob/main/CONTRIBUTING.md#ide-support


On Mon, Nov 21, 2022 at 9:15 PM Greg Miller  wrote:

> Hi folks-
>
> Apologies if I missed a discussion somewhere (I tried searching the list
> and issues, but came up short). Was support for generating IntelliJ project
> files removed as a gradle task at some point? We used to support generation
> of both Eclipse and IntelliJ project files, but I only see Eclipse support
> under `./gradlew tasks` now. I need to re-setup my IntelliJ project and
> just noticed this convenient functionality missing.
>
> Again, apologies in advance if I'm overlooking something obvious here.
>
> Cheers,
> -Greg
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Multi-segments and HNSW

2022-11-02 Thread Mikhail Khludnev
Hi, MyCoy.
I suppose these questions should go into dev@ list. Please join.

On Wed, Nov 2, 2022 at 12:57 AM MyCoy Z  wrote:

> Hi:
>
> I'm studying the HNSW source code and have some questions regarding
> Lucene's multi-segments and HNSW.
>
> First, some of my understanding:
> 1. While creating the index, when two segments are being merged, it could
> rebuild the HNSW graph based on the docs and vectors in the two segments.
> 2. But while reading the index, each segment's graph is loaded separately.
> There is no way to merge multiple graphs;
> the search will iterate over each segment separately.
> Please let me know if there is any misunderstanding.
>
>
> Since HNSW is a graph, the connections between the nodes could matter a
> lot.
> I can imagine some pros and cons here.
> 1. By splitting the docs into multiple separate graphs, it could help the
> diversity by retrieving more docs.
> For example, if just a single graph, some docs could be too far in the
> Neighbor list to be retrieved. And one way to mitigate this is, dividing
> the docs into multiple graphs.
> It could also help to boost the performance.
>
> 2. However, too many segments could cause other issues.
> For example, retrieving too many irrelevant docs, especially if there
> are not so many docs in a segment.
>
>
> So, I think the number of segments and the size of the graphs could have a
> real impact on the retrieving quality and performance.
>
> I'm wondering if there is any best practice, e.g. how many docs should be
> in a single graph?
> Or does anyone have some production experience to share?
>
> Thanks & Regards
> MyCoy
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Mikhail Khludnev
Hello, Michael.
I suppose you can bind f2 to a custom lazy implementation of DoubleValuesSource,
which defers advanceExact() by storing the doc number and always returning true,
and actually advances only on doubleValue().
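The deferral trick can be illustrated with a tiny stand-in for DoubleValues: advanceExact() merely records the target doc and always returns true, and the real advance happens on the first doubleValue() call. (Pure-Python analogue; the names only mirror the Lucene API, nothing here is the actual Lucene class.)

```python
class EagerValues:
    """Stand-in for an expensive per-doc value source."""
    def __init__(self, values):
        self.values = values
        self.advanced = 0  # counts real advances, to demonstrate laziness
        self.doc = -1

    def advance_exact(self, doc):
        self.advanced += 1
        self.doc = doc
        return doc in self.values

    def double_value(self):
        return self.values[self.doc]

class LazyValues:
    """Defers advance_exact until double_value is actually needed."""
    def __init__(self, inner):
        self.inner = inner
        self.pending = None

    def advance_exact(self, doc):
        self.pending = doc  # just remember the doc; no work yet
        return True         # always claim a match, like the expression impl

    def double_value(self):
        if self.pending is not None:
            self.inner.advance_exact(self.pending)  # advance only now
            self.pending = None
        return self.inner.double_value()

f2 = LazyValues(EagerValues({0: 1.5, 3: 2.5}))
# An expression like `condition ? f1 : f2` advances both branches...
f2.advance_exact(3)
# ...but until double_value() is called, no real work has happened.
assert f2.inner.advanced == 0
print(f2.double_value())  # → 2.5, advancing only now
```
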

On Tue, Oct 25, 2022 at 8:13 PM Michael Sokolov  wrote:

> ExpressionFunctionValueSource lazily evaluates in doubleValues: an
> expression like
>
>condition ? f1 : f2
>
> will only evaluate one of f1 or f2.
>
> At the same time, the advanceExact() call is greedy -- when you
> advance that expression it will also advance both f1 and f2. But
> here's the thing: it always returns true, regardless of whether f1 and
> f2 advance. Which makes sense from the point of view of the lazy
> evaluation -- if condition is true we don't care whether f2 advances
> or not.
>
> My question is whether we could defer these child advanceExact calls
> until ExpressionFunctionValues.doubleValue()?
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Luca Cavanna as Lucene committer

2022-10-05 Thread Mikhail Khludnev
Welcome, Luca.

On Wed, Oct 5, 2022 at 8:04 PM Adrien Grand  wrote:

> I'm pleased to announce that Luca Cavanna has accepted the PMC's
> invitation to become a committer.
>
> Luca, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> --
> Adrien
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Vigya Sharma as Lucene committer

2022-07-29 Thread Mikhail Khludnev
Welcome, Vigya!

On Fri, Jul 29, 2022 at 5:45 AM Vigya Sharma  wrote:

> Thanks everyone for the warm welcome. It is an honor to be invited as a
> Lucene committer, and I look forward to contributing more to the community.
>
> A little bit about me - I currently work for the Product Search team at
> Amazon, and am based out of the San Francisco Bay Area in California, US.
> I am interested in a wide variety of computer science areas, and, in the
> last few years, have focused more on distributed systems, concurrency,
> system software and performance. Outside of tech., I like spending my time
> outdoors - running, skiing, and long road trips. I completed my first
> marathon (the SFMarathon) last week, and now, getting this invitation has
> made this month a highlight of the year.
>
> I had known that Lucene powers some of the most popular search and
> analytics use cases across the globe, but as I've gotten more involved, the
> depth and breadth of this software has blown my mind. I am deeply impressed
> by what this community has built, and how it continues to work together and
> grow. It is a great honor to be trusted with committer privileges, and I
> look forward to learning and contributing to multiple different parts of
> the library.
>
> Thank you,
> Vigya
>
>
> On Thu, Jul 28, 2022 at 12:20 PM Anshum Gupta 
> wrote:
>
>> Congratulations and welcome, Vigya!
>>
>> On Thu, Jul 28, 2022 at 12:34 AM Adrien Grand  wrote:
>>
>>> I'm pleased to announce that Vigya Sharma has accepted the PMC's
>>> invitation to become a committer.
>>>
>>> Vigya, the tradition is that new committers introduce themselves with a
>>> brief bio.
>>>
>>> Congratulations and welcome!
>>>
>>> --
>>> Adrien
>>>
>>
>>
>> --
>> Anshum Gupta
>>
>
>
> --
> - Vigya
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Lu Xugang as Lucene committer

2022-06-02 Thread Mikhail Khludnev
Welcome, Lu.

On Wed, Jun 1, 2022 at 12:59 PM 陆徐刚  wrote:

> Thanks Adrien for the announcement and all for the welcome! It’s a great
> honor for me to be a Lucene committer.
>
> I live in Shanghai, China and work at EOI, a company which focuses on AIOps.
> Thanks to Lucene, a great project which brought me into the Java world
> back in 2017.
>
> Since Lucene 7.5.0, I have been reading the source and writing blog posts on
> my personal page (http://www.amazingkoala.com.cn) to help other Lucene
> enthusiasts understand Lucene internals.
>
> some of my favorite things:
>   - video game: World of Warcraft
>   - Java keyword: finally
>
> Lu Xugang
>
> On Jun 1, 2022, at 16:34, Anshum Gupta  wrote:
>
> 
> Congratulations and welcome, Xugang!
>
> On Wed, Jun 1, 2022 at 12:07 AM Adrien Grand  wrote:
>
>> I'm pleased to announce that Lu Xugang has accepted the PMC's
>> invitation to become a committer.
>>
>> Xugang, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations and welcome!
>>
>> --
>> Adrien
>>
>
>
> --
> Anshum Gupta
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Chris Hegarty as Lucene committer

2022-06-02 Thread Mikhail Khludnev
Welcome, Chris!

On Wed, Jun 1, 2022 at 10:29 AM Chris Hegarty
 wrote:

> Hi,
>
> I am both honoured and humbled to have been invited to become a committer.
> Thank you.
>
> I've been working on the development of the Java Platform and the JDK for
> a little more than 20 years. First in the Javasoft group at Sun
> Microsystems, and later in the Java Platform Group at Oracle. After
> spending much of my working life as a "producer of Java", I'm now with
> Elastic and looking forward to seeing what it is like as a "user of Java”.
> There is so much exciting and interesting work happening in this space, I
> hope to be able to make some positive contributions, even in a small way.
>
> -Chris.
>
> > On 1 Jun 2022, at 08:04, Adrien Grand  wrote:
> >
> > I'm pleased to announce that Chris Hegarty has accepted the PMC's
> > invitation to become a committer.
> >
> > Chris, the tradition is that new committers introduce themselves with a
> > brief bio.
> >
> > Congratulations and welcome!
> >
> > --
> > Adrien
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-05-30 Thread Mikhail Khludnev
Hello, Tomoko.

+0

Thanks for moving it forward.

On Mon, May 30, 2022 at 6:40 PM Tomoko Uchida 
wrote:

> Hi everyone!
>
> As we had previous discussion thread [1], I propose migration to GitHub
> issue from Jira.
> It'd be technically possible (see [2] for details) and I think it'd be
> good for the project - not only for welcoming new developers who are not
> familiar with Jira, but also for improving the experiences of long-term
> committers/contributors by consolidating the conversation platform.
>
> You can see a short summary of the discussion, some stats on current Jira
> issues, and a draft migration plan in [2].
> Please review [2] if you haven't seen it and vote for this proposal.
>
> The vote will be open until 2022-06-06 16:00 UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>
> *IMPORTANT NOTE*
> I set a local protocol for this vote.
> There are 95 committers on this project [3] - the vote will be effective
> if it successfully gains more than 15% of voters (>= 15) from committers
> (including PMC members). This means, that although only PMC member votes
> are counted for the final result, the votes from all committers are
> important to make the vote result effective.
>
> If there are less than 15 votes at 2022-06-06 16:00 UTC, I will expand the
> term to 2022-06-13 16:00 UTC. If this fails to get sufficient voters after
> the expanded time limit, I'll cancel this vote regardless of the result.
> But why do I set such an extra bar? My fear is that if such things are
> decided by the opinions of a few members, the result shouldn't yield a good
> outcome for the future. It isn't my goal to just pass the vote [4].
>
> [1] https://lists.apache.org/thread/78wj0vll73sct065m5jjm4z8gqb5yffk
> [2] https://issues.apache.org/jira/browse/LUCENE-10557
> [3] https://projects.apache.org/committee.html?lucene
> [4] I'm sorry for being overly cautious, but I have never met in person or
> virtually any of the committers (with a very few exceptions), therefore
> cannot assess if the vote result is reliable or not unless there is certain
> explicit feedback.
>
> Tomoko
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to see test case logs in Intellij

2022-05-18 Thread Mikhail Khludnev
Hi, Shah.
I think I saw something like this recently. When I run a Lucene test in
IntelliJ via Gradle, the debug view has two tabs: TestFooBar and :test.
The stack and variables are shown in the latter, but the console output is
shown in the former.

On Wed, May 18, 2022 at 9:27 PM Rushabh Shah
 wrote:

> Hi Lucene devs,
> I am pretty new to the Lucene project and to the Gradle build tool also.
> When I run any test case via Intellij, I am not able to see any logs
> related to that test case. Do I need to set some special property in some
> config file to view the logs ? Please help. Thank you.
>
>
> Rushabh Shah
>


-- 
Sincerely yours
Mikhail Khludnev


XML retrieval with Intervals

2022-05-06 Thread Mikhail Khludnev
Hi Devs!

I found intervals quite nice and natural for retrieving scoped data
(thanks, Alan!):
foo stuff bar
I.containing(I.ordered(I.term(""), I.term("")),
   I.unordered(I.term("bar"), I.term("foo")));
It works like a charm until it encounters ill-nested tags:
foo bug bar
Due to intrinsic minimization it picks the inner tag. I feel like
plain intervals backed by positions lack tag-scoping information.
Do you know any approaches for retrieving XML in Lucene?

-- 
Sincerely yours
Mikhail Khludnev


Re: FST codec for *infix* queries. No luck so far.

2022-04-26 Thread Mikhail Khludnev
Hi, Michael.

On Tue, Apr 26, 2022 at 10:45 PM Michael Sokolov  wrote:

> I'm not sure under which scenario ngrams (edgengrams) would not be an
> option?

Edge n-grams multiply index size several times, since postings are repeated
for every derived term. Some systems can't afford such a big footprint.


> Another to try maybe would be something like BPE (byte pair
> encoding). In this encoding, you train a set of tokens from a
> vocabulary based on frequency of occurrence, and agglomerate them
> iteratively until you have the vocabulary at a size you like. You tend
> to end up with commonly-ocurring subwords (morphemes) that can
> possibly be good indexing choices for this sort of thing?
>
It's a productive idea, but there will be some queries which yield
no results due to this pruning.
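As a reference point, the BPE training loop Michael describes fits in a few lines: repeatedly merge the most frequent adjacent symbol pair until the vocabulary reaches the desired size. (Toy sketch over a word-count corpus; unrelated to any Lucene API.)

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merge rules from a word->count corpus; returns merged pairs."""
    # Each word starts as a tuple of single characters.
    corpus = {tuple(w): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, count in corpus.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the corpus with the merged pair as one symbol.
        new_corpus = {}
        for sym, count in corpus.items():
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_corpus[tuple(out)] = count
        corpus = new_corpus
    return merges

merges = bpe_train({"lower": 5, "lowest": 3, "newer": 6, "wider": 2}, 4)
# Frequent subword units such as ('w', 'e') emerge first and could serve
# as indexing units for partial-match lookup.
```
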


>
> On Tue, Apr 26, 2022 at 9:07 AM Michael McCandless
>  wrote:
> >
> > One small datapoint: Amazon's customer facing product search now
> includes some infix suggestions (using Lucene's AnalyzingInfixSuggester),
> but only in fallback cases when the prefix suggesters didn't find
> compelling options.
> >
> > And I think Netflix's suggester used to be primarily infix, but now when
> I tested it, I get no suggestions at all, only live search results, which I
> like less :)
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Tue, Apr 26, 2022 at 8:13 AM Dawid Weiss 
> wrote:
> >>
> >> Hi Mikhail,
> >>
> >> I don't have any spectacular suggestions but something stemming from
> experience.
> >>
> >> 1) While the problem is intellectually interesting, I rarely found
> >> anybody who'd be comfortable with using infix suggestions - people are
> >> very used to "completions" happening on a prefix of one or multiple
> >> words (see my note below, though).
> >>
> >> 2) Wouldn't it be better/ more efficient to maintain an fst/ index of
> >> word suffix(es) -> complete word instead of offsets within the block?
> >> This can be combined with term frequency to limit the number of
> >> suggested words to just certain categories (or most frequent terms)
> >> which would make the fst smaller still.
> >>
> >> 3) I'd never try to store infixes shorter than 2, 3 characters (you
> >> said you did it - "I even limited suffixes length to reduce their
> >> number"). This requires folks to type in longer input but prevents fst
> >> bloat and in general leads to higher-quality suggestions (since
> >> there'll be so many of them).
> >>
> >> > Otherwise, with many smaller segments fully scanning term
> dictionaries is comparable to seeking suffixes FST and scanning certain
> blocks.
> >>
> >> Yeah, I'd expect the automaton here to be huge. The complexity of the
> >> vocabulary and number of characters in the language will also play a
> >> key role.
> >>
> >> 4) IntelliJ idea has this kind of "search everywhere" functionality
> >> which greps for infixes (it is really nice). I recall looking at the
> >> (open source engine) to see how it was done and my conclusion from
> >> glancing over the code was that it's a fixed, coarse, n-gram based
> >> index of consecutive letters pointing at potential matches, which are
> >> then revalidated against the query. So you have a super-simple index,
> >> with a very fast lookup and the cost of verifying and finding exact
> >> matches is shifted to once you have a candidate list. While this
> >> doesn't help with Lucene indexes, perhaps it's a sign that for this
> >> particular task a different index/search paradigm is needed?
> >>
> >>
> >> Dawid
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: FST codec for *infix* queries. No luck so far.

2022-04-26 Thread Mikhail Khludnev
Hello, David.
Thanks for your answers. Let me comment below.

On Tue, Apr 26, 2022 at 3:13 PM Dawid Weiss  wrote:

> Hi Mikhail,
>
> I don't have any spectacular suggestions but something stemming from
> experience.
>
> 1) While the problem is intellectually interesting, I rarely found
> anybody who'd be comfortable with using infix suggestions - people are
> very used to "completions" happening on a prefix of one or multiple
> words (see my note below, though).
>
It's interesting that I asked about generic search for *foo* queries, but you
read it as a question about infix suggestions.
It's a little odd, but I often meet customers who ask about generic *infix*
search: find me everything containing the letters 'foo'.
I usually try to convince them that they are focusing on the positive results,
while such a high-recall search is prone to false positives, which makes it
quite useless.
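Such grep-style high-recall *infix* lookup can still be made fast with a coarse fixed n-gram candidate index plus re-verification, the approach Dawid attributes to IntelliJ elsewhere in this thread. A toy trigram sketch (illustrative only, not a Lucene codec):

```python
from collections import defaultdict

def build_trigram_index(terms):
    """Map each trigram to the set of terms containing it."""
    index = defaultdict(set)
    for t in terms:
        for i in range(len(t) - 2):
            index[t[i:i + 3]].add(t)
    return index

def infix_search(index, terms, query):
    """Find terms containing `query` via trigram candidates + verification."""
    if len(query) < 3:
        return {t for t in terms if query in t}  # too short: fall back to scan
    grams = [query[i:i + 3] for i in range(len(query) - 2)]
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return {t for t in candidates if query in t}  # re-verify the exact infix

terms = ["foobar", "rebar", "barista", "football", "crowbar"]
idx = build_trigram_index(terms)
print(infix_search(idx, terms, "bar"))  # all terms containing 'bar'
```

The index stays small and the expensive exact matching runs only on the candidate list.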


>
> 2) Wouldn't it be better/ more efficient to maintain an fst/ index of
> word suffix(es) -> complete word instead of offsets within the block?
> This can be combined with term frequency to limit the number of
> suggested words to just certain categories (or most frequent terms)
> which would make the fst smaller still.
>
Well, I did a prototype that uses the infix suggester for query expansion. It
looks quite good. But it's a small Lucene index, not an FST with term outputs.
Also, for such odd requirements pruning is undesirable: find me everything,
you know.


>
> 3) I'd never try to store infixes shorter than 2, 3 characters (you
> said you did it - "I even limited suffixes length to reduce their
> number"). This requires folks to type in longer input but prevents fst
> bloat and in general leads to higher-quality suggestions (since
> there'll be so many of them).
>
Good spot. Short infixes are out of use.


>
> > Otherwise, with many smaller segments fully scanning term dictionaries
> is comparable to seeking suffixes FST and scanning certain blocks.
>
> Yeah, I'd expect the automaton here to be huge. The complexity of the
> vocabulary and number of characters in the language will also play a
> key role.
>
> 4) IntelliJ idea has this kind of "search everywhere" functionality
> which greps for infixes (it is really nice). I recall looking at the
> (open source engine) to see how it was done and my conclusion from
> glancing over the code was that it's a fixed, coarse, n-gram based
> index of consecutive letters pointing at potential matches, which are
> then revalidated against the query. So you have a super-simple index,
> with a very fast lookup and the cost of verifying and finding exact
> matches is shifted to once you have a candidate list. While this
> doesn't help with Lucene indexes, perhaps it's a sign that for this
> particular task a different index/search paradigm is needed?
>
>
> Dawid
>
> -----
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


FST codec for *infix* queries. No luck so far.

2022-04-22 Thread Mikhail Khludnev
Hello, Devs!
I tried to introduce a custom index to speed up *infix* queries. Note: I'm
interested in cases where EdgeNGram is not an option. For example, if the
term 'foobar' is stored in a block at position 200, and 'bar' at 100, I try
to put the following suffixes in the FST:
foobar->[200]
oobar->[200]
obar->[200]
bar->[100,200]
ar->[100,200]
r->[100,200]
The idea is to seekCeil(oba) (when querying *oba*), get the lists of offsets,
read those term blocks, filter the terms by *oba*, and expand *oba* to the
term foobar. Gotcha!
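In miniature, that lookup path can be sketched with plain dicts standing in for the FST and the on-disk blocks (everything here is illustrative; the real prototype used an FST keyed by suffixes):

```python
import bisect
from collections import defaultdict

# Terms stored in blocks at known offsets, as in the codec prototype.
blocks = {100: ["bar", "bard"], 200: ["foobar", "footman"]}

# Index every suffix of every term, pointing at its block offset.
suffix_to_offsets = defaultdict(set)
for offset, terms in blocks.items():
    for term in terms:
        for i in range(len(term)):
            suffix_to_offsets[term[i:]].add(offset)

def infix_lookup(infix):
    """Emulate seekCeil(infix) + scan of matching suffixes, then block filter."""
    keys = sorted(suffix_to_offsets)       # FST keys are sorted too
    offsets = set()
    pos = bisect.bisect_left(keys, infix)  # seekCeil
    while pos < len(keys) and keys[pos].startswith(infix):
        offsets |= suffix_to_offsets[keys[pos]]
        pos += 1
    # Read only the referenced blocks and filter their terms by the infix.
    return sorted(t for off in offsets for t in blocks[off] if infix in t)

print(infix_lookup("oba"))  # → ['foobar']
print(infix_lookup("bar"))  # → ['bar', 'bard', 'foobar']
```
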
The standard blocktree codec seemed too complex for hacking.
I took blockterms with VariableGapTermsIndexWriter and added terms index
file with FST of all terms suffixes mapped to offsets to blocks in
blockterms' .tib. Also, I have to write term heads in .tib blocks, since
blocks store only term tails.
The first problem is the size of the suffixes FST: the FST in .tiv stores
only a fraction of the terms. Since suffix terms don't have their own blocks
and just point to the original term blocks, their FST has to contain all of
them. The original terms index relies on a really important property: output
offsets increase in input term order, which allows storing only a fraction of
the terms. The outputs of the suffixes FST are unordered, so it has to store
all suffixes.
For a 5M-doc enwiki index the .tiv size is 1.9 MB, while the full suffixes
FST takes 3.3 GB. I even limited suffix length to reduce their number.
The benchmark shows only a 10% gain, so it's a failure. I can only say that it
might show better improvement with fewer huge segments. Otherwise, with
many smaller segments fully scanning term dictionaries is comparable to
seeking suffixes FST and scanning certain blocks.
WDYT? Do you have a better idea, or a pointer to some whitepaper?
Are there any codec examples of writing top-level data structures like the
GlobalOrds index or the livedocs bitmask?
--
Sincerely yours
Mikhail Khludnev


Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Mikhail Khludnev
Welcome, Haoyu!

On Sun, Dec 19, 2021 at 8:14 PM David Smiley  wrote:

> Congratulations Haoyu!
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Dec 19, 2021 at 4:12 AM Dawid Weiss  wrote:
>
>> Hello everyone!
>>
>> Please welcome Haoyu Zhai as the latest Lucene committer. You may also
>> know Haoyu as Patrick - this is perhaps his kind gesture to those of
>> us whose tongues are less flexible in pronouncing difficult first
>> names. :)
>>
>> It's a tradition to briefly introduce yourself to the group, Patrick.
>> Welcome and thank you!
>>
>> Dawid
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Julie Tibshirani to the Lucene PMC

2021-11-30 Thread Mikhail Khludnev
Congratulations, Julie!

On Wed, Dec 1, 2021 at 12:49 AM Adrien Grand  wrote:

> I'm pleased to announce that Julie Tibshirani has accepted an invitation
> to join the Lucene PMC!
>
> Congratulations Julie, and welcome aboard!
>
> --
> Adrien
>


-- 
Sincerely yours
Mikhail Khludnev


Re: jira patch "precommit" jenkins jobs? (don't seem to be running lately)

2020-12-12 Thread Mikhail Khludnev
Hello, Christine.
I don't think it runs tests at all, I suppose it just does
https://github.com/apache/lucene-solr/blob/ccf3e604537e884e25d33dc9d921cc5e5e1fa284/.github/workflows/gradle-precommit.yml#L36

On Wed, Dec 9, 2020 at 5:15 AM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Am not familiar with the yetus setup but from the
> https://github.com/apache/lucene-solr/tree/master/.github/workflows
> content and observation of github checks on pull requests my understanding
> is that github runs some tests but not all tests and what gets run depends
> on the content of the pull request.
>
> Hope that helps.
>
> Christine
>
> From: dev@lucene.apache.org At: 12/05/20 18:44:40
> To: dev@lucene.apache.org
> Cc: hossman_luc...@fucit.org
> Subject: Re: jira patch "precommit" jenkins jobs? (don't seem to be
> running lately)
>
> Uwe, thank you for your response. I remember that Yetus ran tests via
> Jira's precommit before, but the GitHub checks don't run tests. Is that
> correct?
>
> --
> Mikhail
>
> On Mon, Oct 12, 2020 at 9:18 AM Uwe Schindler  wrote:
>
>> It no longer works since jenkins was moved to new hardware.
>>
>> IMHO, you should use pull requests in GitHub. There we have full support
>> for automatic precommit. We use it every day, much easier than Jira. I'd
>> not spend much time in reactivating it. It's dead.
>>
>> On Jira it's disabled since longer time.
>>
>> Uwe
>>
>> Am October 12, 2020 3:50:26 PM UTC schrieb Chris Hostetter <
>> hossman_luc...@fucit.org>:
>>>
>>>
>>> Does anyone know / understand the current status of the "PreCommit"
>>> jenkins jobs that are supposed to run against jira issues in the "Patch
>>> Available" status?
>>>
>>> For example: I noticed this AM that even though SOLR-14870 was in the
>>> "Patch Available" status all weekend (I didn't want to commit build changes
>>> on a friday afternoon) it never got a comment from the jenkins build bot
>>> regarding the patch -- when I went looking for the "PreCommit" build jobs
>>> in jenkins I found that they are marked "N/A" for last
>>> success/failure/duration -- which I believe means they haven't run at all
>>> since all the jenkins jobs were moved to ci-builds.apache.org?
>>>
>>> https://ci-builds.apache.org/job/Lucene/
>>> https://ci-builds.apache.org/job/Lucene/job/PreCommit-LUCENE-Build/
>>> https://ci-builds.apache.org/job/Lucene/job/PreCommit-SOLR-Build/
>>>
>>> ...the descriptions of those jobs say they are run by the
>>> "PreCommit-Admin" job, but the link is a 404.  Searching jenkins jobs
>>> turns up a few other "PreCommit" jobs in other projects -- most are
>>> disabled, except for this "Atlas" one which has run somewhat recently --
>>> but it looks like people were manually triggering it?
>>>
>>> https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/
>>>
>>> Do any of the jenkins admins/experts know what's the status of the
>>> infra/jira hooks to get these jobs working again?
>>>
>>>
>>>
>>> -Hoss
>>> http://www.lucidworks.com/
>>>
>>> --
>>>
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: jira patch "precommit" jenkins jobs? (don't seem to be running lately)

2020-12-05 Thread Mikhail Khludnev
Uwe, thank you for your response. I remember that Yetus ran tests via
Jira's precommit before, but the GitHub checks don't run tests. Is that
correct?

--
Mikhail

On Mon, Oct 12, 2020 at 9:18 AM Uwe Schindler  wrote:

> It no longer works since jenkins was moved to new hardware.
>
> IMHO, you should use pull requests in GitHub. There we have full support
> for automatic precommit. We use it every day, much easier than Jira. I'd
> not spend much time in reactivating it. It's dead.
>
> On Jira it's disabled since longer time.
>
> Uwe
>
> Am October 12, 2020 3:50:26 PM UTC schrieb Chris Hostetter <
> hossman_luc...@fucit.org>:
>>
>>
>> Does anyone know / understand the current status of the "PreCommit"
>> jenkins jobs that are supposed to run against jira issues in the "Patch
>> Available" status?
>>
>> For example: I noticed this AM that even though SOLR-14870 was in the
>> "Patch Available" status all weekend (I didn't want to commit build changes
>> on a friday afternoon) it never got a comment from the jenkins build bot
>> regarding the patch -- when I went looking for the "PreCommit" build jobs
>> in jenkins I found that they are marked "N/A" for last
>> success/failure/duration -- which I believe means they haven't run at all
>> since all the jenkins jobs were moved to ci-builds.apache.org?
>>
>> https://ci-builds.apache.org/job/Lucene/
>> https://ci-builds.apache.org/job/Lucene/job/PreCommit-LUCENE-Build/
>> https://ci-builds.apache.org/job/Lucene/job/PreCommit-SOLR-Build/
>>
>> ...the descriptions of those jobs say they are run by the
>> "PreCommit-Admin" job, but the link is a 404.  Searching jenkins jobs
>> turns up a few other "PreCommit" jobs in other projects -- most are
>> disabled, except for this "Atlas" one which has run somewhat recently --
>> but it looks like people were manually triggering it?
>>
>> https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/
>>
>> Do any of the jenkins admins/experts know what's the status of the
>> infra/jira hooks to get these jobs working again?
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>> --
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>


-- 
Sincerely yours
Mikhail Khludnev


Re: 8.6 release

2020-07-02 Thread Mikhail Khludnev
So the plan is to cut the release branch next Tuesday, June 30th. If
>>> you anticipate a problem with the date, please reply.
>>> >
>>> >
>>> >
>>> > Is there any JIRA issue that must be committed before the release is
>>> made and that has not already the appropriate "Fix Version"?
>>> >
>>> >
>>> >
>>> > Currently there 3 unresolved issues flagged as Fix Version = 8.6:
>>> >
>>> > Add tests for corruptions caused by byte flips LUCENE-9356
>>> >
>>> > Fix linefiledocs compression or replace in tests LUCENE-9191
>>> >
>>> > Can we merge small segments during refresh, for faster searching?
>>> LUCENE-8962
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, Jun 24, 2020 at 9:05 PM David Smiley 
>>> wrote:
>>> >
>>> > Thanks starting this discussion, Cassandra.
>>> >
>>> >
>>> >
>>> > I reviewed the issues I was involved with and I don't quite see
>>> something worth noting.
>>> >
>>> >
>>> >
>>> > I plan to add a note about a change in defaults within
>>> UnifiedHighlighter that could be a significant perf regression.  This
>>> wasn't introduced in 8.6 but introduced in 8.5 and it's significant enough
>>> to bring attention to.  I could add it in 8.5's section but then add a
>>> short pointer to it in 8.6.
>>> >
>>> >
>>> >
>>> > ~ David
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, Jun 24, 2020 at 2:52 PM Cassandra Targett <
>>> casstarg...@gmail.com> wrote:
>>> >
>>> > I started looking at the Ref Guide for 8.6 to get it ready, and notice
>>> there are no Upgrade Notes in `solr-upgrade-notes.adoc` for 8.6. Is it
>>> really true that none are needed at all?
>>> >
>>> > I’ll add what I usually do about new features/changes that maybe
>>> wouldn’t normally make the old Upgrade Notes section, I just find it
>>> surprising that there weren’t any devs who thought any of the 100 or so
>>> Solr changes warrant any user caveats.
>>> >
>>> > On Jun 17, 2020, 12:27 PM -0500, Tomás Fernández Löbbe <
>>> tomasflo...@gmail.com>, wrote:
>>> >
>>> >
>>> > +1. Thanks Bruno
>>> >
>>> >
>>> >
>>> > On Wed, Jun 17, 2020 at 6:22 AM Mike Drob  wrote:
>>> >
>>> > +1
>>> >
>>> >
>>> >
>>> > The release wizard python script should be sufficient for everything.
>>> If you run into any issues with it, let me know, I used it for 8.5.2 and
>>> think I understand it pretty well.
>>> >
>>> >
>>> >
>>> > On Tue, Jun 16, 2020 at 8:31 AM Bruno Roustant <
>>> bruno.roust...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > It’s been a while since we released Lucene/Solr 8.5.
>>> >
>>> > I’d like to volunteer to be a release manager for an 8.6 release. If
>>> there's agreement, then I plan to cut the release branch two weeks today,
>>> on June 30th, and then to build the first RC two days later.
>>> >
>>> >
>>> >
>>> > This will be my first time as release manager so I'll probably need
>>> some guidance. Currently I have two resource links on this subject:
>>> >
>>> > https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo
>>> >
>>> >
>>> https://github.com/apache/lucene-solr/tree/master/dev-tools/scripts#releasewizardpy
>>> >
>>> > If you have more, please share with me.
>>> >
>>> >
>>> >
>>> > Bruno
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>

-- 
Sincerely yours
Mikhail Khludnev


Re: FunctionScoreQuery how to use it

2020-07-01 Thread Mikhail Khludnev
Hi, Vincenzo.

Discussed earlier
https://www.mail-archive.com/java-user@lucene.apache.org/msg50255.html

On Wed, Jul 1, 2020 at 8:36 PM Vincenzo D'Amore  wrote:

> Hi all,
>
> I'm struggling with an old class that extends CustomScoreQuery.
> I was trying to port to solr 8.5.2 and I'm looking for an example on how to
> implement it using FunctionScoreQuery.
>
> Do you know if there are examples that explain how to port the code to the
> new implementation?
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Re: [JENKINS] Lucene-Solr-8.x-Windows (64bit/jdk-13.0.2) - Build # 1341 - Failure!

2020-07-01 Thread Mikhail Khludnev
> (at line 30)
>  [ecj-lint] import
> org.apache.solr.client.solrj.response.SolrResponseBase;
>  [ecj-lint]
> ^^
>  [ecj-lint] The import
> org.apache.solr.client.solrj.response.SolrResponseBase is never used
>  [ecj-lint] --
>  [ecj-lint] 23. ERROR in
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\core\src\test\org\apache\solr\search\facet\TestJsonFacetsWithNestedObjects.java
> (at line 33)
>  [ecj-lint] import org.apache.solr.common.util.NamedList;
>  [ecj-lint]^
>  [ecj-lint] The import org.apache.solr.common.util.NamedList is never used
>  [ecj-lint] --
>  [ecj-lint] --
>  [ecj-lint] 24. WARNING in
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\core\src\test\org\apache\solr\update\DirectUpdateHandlerTest.java
> (at line 231)
>  [ecj-lint] DirectUpdateHandler2 duh2 = (DirectUpdateHandler2)updater;
>  [ecj-lint]  
>  [ecj-lint] Resource leak: 'duh2' is never closed
>  [ecj-lint] --
>  [ecj-lint] 25. WARNING in
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\core\src\test\org\apache\solr\update\DirectUpdateHandlerTest.java
> (at line 291)
>  [ecj-lint] DirectUpdateHandler2 duh2 = (DirectUpdateHandler2)updater;
>  [ecj-lint]  
>  [ecj-lint] Resource leak: 'duh2' is never closed
>  [ecj-lint] --
>  [ecj-lint] --
>  [ecj-lint] 26. WARNING in
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\core\src\test\org\apache\solr\update\UpdateParamsTest.java
> (at line 45)
>  [ecj-lint] UpdateRequestHandler handler = new UpdateRequestHandler();
>  [ecj-lint]  ^^^
>  [ecj-lint] Resource leak: 'handler' is never closed
>  [ecj-lint] --
>  [ecj-lint] --
>  [ecj-lint] 27. WARNING in
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\core\src\test\org\apache\solr\update\processor\SignatureUpdateProcessorFactoryTest.java
> (at line 331)
>  [ecj-lint] UpdateRequestHandler h = new UpdateRequestHandler();
>  [ecj-lint]  ^
>  [ecj-lint] Resource leak: 'h' is never closed
>  [ecj-lint] --
>  [ecj-lint] --
>  [ecj-lint] 28. WARNING in
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\core\src\test\org\apache\solr\update\processor\UniqFieldsUpdateProcessorFactoryTest.java
> (at line 113)
>  [ecj-lint] UpdateRequestHandler handler = new UpdateRequestHandler();
>  [ecj-lint]  ^^^
>  [ecj-lint] Resource leak: 'handler' is never closed
>  [ecj-lint] --
>  [ecj-lint] 28 problems (9 errors, 19 warnings)
>
> BUILD FAILED
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\build.xml:634: The
> following error occurred while executing this line:
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\build.xml:101: The
> following error occurred while executing this line:
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\build.xml:644: The
> following error occurred while executing this line:
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\lucene\common-build.xml:2133:
> The following error occurred while executing this line:
> C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\lucene\common-build.xml:2166:
> Compile failed; see the compiler error output for details.
>
> Total time: 70 minutes 56 seconds
> Build step 'Invoke Ant' marked build as failure
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Archiving artifacts
> [Java] Skipping execution of recorder since overall result is 'FAILURE'
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Recording test results
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
> Setting
> ANT_1_8_2_HOME=C:\Users\jenkins\tools\hudson.tasks.Ant_AntInstallation\ANT_1.8.2
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org



-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Ilan Ginzburg as Lucene/Solr committer

2020-06-21 Thread Mikhail Khludnev
Welcome,  Ilan!

On Sun, Jun 21, 2020 at 2:44 AM Noble Paul  wrote:

> Hi all,
>
> Please join me in welcoming Ilan Ginzburg as the latest Lucene/Solr
> committer.
> Ilan, it's tradition for you to introduce yourself with a brief bio.
>
> Congratulations and Welcome!
> Noble
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to build SolrCloud collection from instance dirs after ZK is lost?

2020-06-20 Thread Mikhail Khludnev
Hello, Colleagues.
Thank you for sharing your experience and ideas.
I'm aiming for the simplest scenario with default routing (is it
'composite'?), so I assume that if I set the same number of shards, the
hash ranges will match. I plan to create a new collection with
replicationFactor=1 and shuffle=false, and to specify the nodeset in the same
order as the remaining shard dirs are distributed across nodes; that lets me
collocate old and new cores and just exec $mv. Also, replicas can be placed
on certain nodes explicitly.

On Sat, Jun 20, 2020 at 1:59 AM Shawn Heisey  wrote:

> On 6/18/2020 1:35 AM, Mikhail Khludnev wrote:
> > I'm challenged with cluster recovery. Think about total failure: ZK
> > state is lost, however instanceDirs survived since they are mounted via
> > EBS. Let's say collection is read/only and/or it doesn't have
> > replicas, just leaders.
> > Is there a way to create a new empty collection and say, hey here's
> > shard1 instance, shard2 instance is there etc?
> >
> > Customer says that the old version of solr does it automatically: when
> > empty zk is connected, collection's shards just appear there. Right now
> > due to https://issues.apache.org/jira/browse/SOLR-12066 Cleanup deleted
> > core when node start - if instances with data dirs connect to empty ZK
> > it just wipes dirs away.
>
> I think that SOLR-12066 was a mistake.  See SOLR-13396, which is linked
> to SOLR-12066.  There are some interesting ideas outlined in SOLR-13396.
>
> There is info in the clusterstate that is currently not recorded
> anywhere but zookeeper, making it impossible to fully reconstruct a
> collection from existing cores when ZK data is lost.
>
> A quick look at the cloud example on version 8.5.1 tells me that for
> such reconstruction to be possible, in addition to what it currently
> contains, core.properties would need to record the shard hash range, the
> router, maxShardsPerNode, and autoAddReplicas.  And there may be other
> things related to features that the cloud example does not use.
>
> If both properties and clusterstate in ZK are available, any mismatches
> between the two should generate a WARN log, and ZK info should probably
> be preferred over properties.  A Collections API action should probably
> be created to force mismatches back into agreement.
>
> Alternately, the new info could be recorded in a new file, with
> cloud.properties being one possibility for the filename.  I can think of
> reasons to prefer this approach, but I worry about the stability of
> adding a whole new file to the config mechanisms.
>
> If the capability does not already exist, I think there should be some
> combination of Collections API actions that will allow somebody to
> manually reconstruct the collection clusterstate in ZK.
>
> Side note:  While playing with examples on 8.5.1 so I could be accurate
> on this message, I discovered that the "Files" tab in the admin UI has
> issues, in both cloud and standalone mode.  The following screenshot has
> some red lines added to problems I found.  Subdirectories do not work
> correctly, the column for filenames is not wide enough for the example
> configs, and the filenames do not have mouseover expansion which would
> be an alternate way to deal with really long filenames.
>
>
> https://www.dropbox.com/s/4lm3uad2uv53630/SolrAdminFilesTabProblems.png?dl=0
>
> That's probably worthy of an issue, but I don't want to open one without
> discussion.
>
> Thanks,
> Shawn
>
> -----
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


How to build SolrCloud collection from instance dirs after ZK is lost?

2020-06-18 Thread Mikhail Khludnev
Hello,
I'm challenged with cluster recovery. Think about total failure: ZK state
is lost, however instanceDirs survived since they are mounted via EBS.
Let's say collection is read/only and/or it doesn't have replicas, just
leaders.
Is there a way to create a new empty collection and say, hey here's shard1
instance, shard2 instance is there etc?

Customer says that the old version of solr does it automatically: when
empty zk is connected, collection's shards just appear there. Right now due
to https://issues.apache.org/jira/browse/SOLR-12066 Cleanup deleted core
when node start - if instances with data dirs connect to empty ZK it just
wipes dirs away.

Thanks
-- 
Sincerely yours
Mikhail Khludnev


Re: [VOTE] Lucene logo contest

2020-06-16 Thread Mikhail Khludnev
A. Submitted by Dustin Haver [2]

On Tue, Jun 16, 2020 at 1:08 AM Ryan Ernst  wrote:

> Dear Lucene and Solr developers!
>
> In February a contest was started to design a new logo for Lucene [1].
> That contest concluded, and I am now (admittedly a little late!) calling a
> vote.
>
> The entries are labeled as follows:
>
> A. Submitted by Dustin Haver [2]
>
> B. Submitted by Stamatis Zampetakis [3] Note that this has several
> variants. Within the linked entry there are 7 patterns and 7 color
> palettes. Any vote for B should contain the pattern number, like B1 or B3.
> If a B variant wins, we will have a followup vote on the color palette.
>
> C. The current Lucene logo [4]
>
> Please vote for one of the three (or nine depending on your perspective!)
> above choices. Note that anyone in the Lucene+Solr community is invited to
> express their opinion, though only Lucene+Solr PMC cast binding votes
> (indicate non-binding votes in your reply, please). This vote will close
> one week from today, Mon, June 22, 2020.
>
> Thanks!
>
> [1] https://issues.apache.org/jira/browse/LUCENE-9221
> [2]
> https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
> [3]
> https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf
> [4]
> https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Mayya Sharipova as Lucene/Solr committer

2020-06-09 Thread Mikhail Khludnev
Welcome, Mayya!

On Mon, Jun 8, 2020 at 7:58 PM jim ferenczi  wrote:

> Hi all,
>
> Please join me in welcoming Mayya Sharipova as the latest Lucene/Solr
> committer.
> Mayya, it's tradition for you to introduce yourself with a brief bio.
>
> Congratulations and Welcome!
>
> Jim
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Constant Scoring Queries; inconsistent approaches

2020-05-17 Thread Mikhail Khludnev
Hello, David.
Having 1 as a score is a good idea.
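A toy illustration of the arithmetic (plain Java, not Lucene API): boosts
multiply the constant, so a base score of 1 scales while a base of 0 swallows
any boost.

```java
public class Main {
    // Toy model of boosting a constant-scoring query: the boost multiplies
    // the query's base score, the way ConstantScoreWeight propagates "boost".
    static float boosted(float constantScore, float boost) {
        return constantScore * boost;
    }

    public static void main(String[] args) {
        System.out.println(boosted(1f, 2.5f)); // base 1: boost is effective
        System.out.println(boosted(0f, 2.5f)); // base 0: boost is lost
    }
}
```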

On Sun, May 17, 2020 at 7:25 PM David Smiley  wrote:

> I see some inconsistency in what some "constant scoring queries" return
> for a score.  Ultimately, I'm arguing here for consistency in 9.0, and
> perhaps a bit of documentation clarity on such Queries.  It's an edge-case
> because typical constant scoring queries are used
> with org.apache.lucene.search.BooleanClause.Occur#FILTER or similar where
> it simply doesn't matter.  But nonetheless it's possible to combine it with
> other queries in a BooleanQuery and the choice has an impact.
>
> Lucene's ConstantScoreQuery yields 1 but it can be boosted (e.g.
> multiplied by whatever).  Many other constant scoring queries in Lucene do
> likewise by using ConstantScoreWeight and propagating the "boost"
> parameter.  Picking one query at random doing this
> is DocValuesFieldExistsQuery.
>
> However, I found some that choose 0, which isn't boostable (because zero
> times anything is zero). ToParentBlockJoinQuery (and Child equivalent) use
> 0.  Solr's Filter.java (formerly in Lucene), which I'm slowly removing,
> chooses 0 as well.
>
> Shall we standardize on a score of 1 for Lucene/Solr 9.0?  Or 0?  Or do
> some queries break with the norm for a good reason?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


-- 
Sincerely yours
Mikhail Khludnev


Re: DenseNumericDocValues corner case issue

2020-05-06 Thread Mikhail Khludnev
Hello, John.
I'd tend to say you are calling the API in a slightly unusual way; see the
contract at
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/DocValuesIterator.java#L29
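In other words, advanceExact's contract only covers valid target doc ids;
DocIdSetIterator.NO_MORE_DOCS (which is Integer.MAX_VALUE) is a sentinel, not
a doc id, so the caller should filter it out first. A Lucene-free toy of that
guard (the map and method names below are stand-ins, not Lucene API):

```java
import java.util.Map;

public class Main {
    // Stand-in for DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE in Lucene).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Stand-in for per-document numeric doc values: only docs 0 and 3 have one.
    static final Map<Integer, Long> VALUES = Map.of(0, 42L, 3, 7L);

    // The guard pattern: reject the iterator sentinel before "advancing",
    // since advanceExact is only defined for valid doc ids.
    static long readOrDefault(int docid, long def) {
        if (docid != NO_MORE_DOCS && VALUES.containsKey(docid)) {
            return VALUES.get(docid);
        }
        return def;
    }

    public static void main(String[] args) {
        System.out.println(readOrDefault(0, -1L));            // doc with a value
        System.out.println(readOrDefault(1, -1L));            // doc without one
        System.out.println(readOrDefault(NO_MORE_DOCS, -1L)); // sentinel: guarded
    }
}
```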

Thanks.

On Wed, May 6, 2020 at 11:04 PM John Wang  wrote:

> Hi folks:
>
> We ran into a problem with DenseNumericDocValues in Lucene 8.0 codec where
> advanceExact returns true even when advancing to an invalid docid.
>
> Our code looks like:
>
> if (docval.advanceExact(docid)) {
> var myVal = docVal.get(docid);
> }
>
> when docid == DocIdSetIterator.NO_MORE_DOCS, the docVal.get() call barfs.
>
> I am not sure if that is a bug in Lucene or is it the way I am calling the
> API.
>
> Following is the link to the code for context. Any advice is appreciated.
>
>
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java#L413
>
> Thanks
>
> -John
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Eric Pugh as a Lucene/Solr committer

2020-04-07 Thread Mikhail Khludnev
Welcome, Eric.

On Tue, Apr 7, 2020 at 4:57 PM Eric Pugh 
wrote:

> Thank you everyone!  I’ll keep it short, otherwise this will be a very
> long email… ;-).
>
> I was first introduced to Solr and Lucene by Erik Hatcher, and today I
> wonder what my life would be like if he hadn’t taken the time to show me
> some cool code he was working on and explained to me the way to change the
> world was through open source contributions!
>
> I co-founded OpenSource Connections (http://o19s.com) along with Scott
> Stults and Jason Hull in 2005.  We found our niche in Solr consulting after
> I went to the first LuceneRevolution and got inspired (complete with Jerry
> Maguire style manifesto shared with the company). Through consulting, I get
> to help onboard organizations into the Solr community - a thriving, healthy
> ASF is very near & dear to my heart.
>
> I’ve been around this community for a long time, with my first JIRA being
> three digits: SOLR-284.  Today, I’m still contributing to Apache Tika. I’ve
> gotten to meet and spend some significant time with Tim Allison from that
> project and learned a LOT about text!
>
> I was in the right place at the right time and was able to join David
> Smiley as co-author on the first Solr book, we went on and did a total of
> three editions of that book.  Phew!
>
> Once I got to sit on stage as a judge for Stump the Chump, it was Erick,
> Erik, and Eric ;-)
>
> After doing Solr for a good while, I got lucky and met Doug Turnbull on
> the sidewalk one day because he had on a t-shirt that said “My code doesn’t
> have bugs, it has unexpected features”.   Couple of years later he and
> fellow colleague John Berryman published Relevant Search and today I’m
> working in the fascinating intersection of people, Search, and Data Science
> helping build smarter search experiences as a Relevance Strategist. I'm
> excited about bringing relevance use cases 'down to earth'. I also steward
> OSC's contributions to the open source tool Quepid to help fulfill that
> goal.
>
> Oh, and I’ve got a stack of LuceneRevolution and related conference
> t-shirts that my mother turned into a fantastic quilt ;-).
>
> Eric
>
>
>
> On Apr 6, 2020, at 9:39 PM, Shalin Shekhar Mangar 
> wrote:
>
> Congratulations and welcome Eric!
>
> On Mon, Apr 6, 2020 at 5:51 PM Jan Høydahl  wrote:
>
>> Hi all,
>>
>> Please join me in welcoming Eric Pugh as the latest Lucene/Solr committer!
>>
>> Eric has been part of the Solr community for over a decade, as a code
>> contributor, book author, company founder, blogger and mailing list
>> contributor! We look forward to his future contributions!
>>
>> Congratulations and welcome! It is a tradition to introduce yourself with
>> a brief bio, Eric.
>>
>> Jan Høydahl
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
> ___
> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: 8.5 release

2020-03-16 Thread Mikhail Khludnev
PM
>>>>>>> *To:* dev@lucene.apache.org
>>>>>>> *Subject:* Re: 8.5 release
>>>>>>>
>>>>>>> I’ve created a branch for the 8.5 release (`branch_8_5`) and pushed
>>>>>>> it to the apache repository.  We’re now at feature freeze, so only bug
>>>>>>> fixes should be pushed to the branch.
>>>>>>>
>>>>>>> I can see from
>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20in%20(SOLR%2C%20LUCENE)%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%208.5%20ORDER%20BY%20priority%20DESC
>>>>>>> <https://issues.apache.org/jira/issues/?jql=project%20in%20(SOLR,%20LUCENE)%20AND%20status%20in%20(Open,%20Reopened)%20AND%20priority%20=%20Blocker%20AND%20fixVersion%20=%208.5%20ORDER%20BY%20priority%20DESC>
>>>>>>>  that
>>>>>>> we have 4 tickets marked as Blockers for this release.  I plan to build a
>>>>>>> first release candidate next Monday, which gives us a few days to resolve
>>>>>>> these.  If that’s not going to be long enough, please let me know.
>>>>>>>
>>>>>>> Uwe, Steve, can one of you start the Jenkins tasks for the new
>>>>>>> branch?
>>>>>>>
>>>>>>> Thanks, Alan
>>>>>>>
>>>>>>>
>>>>>>> On 3 Mar 2020, at 14:50, Alan Woodward  wrote:
>>>>>>>
>>>>>>> PSA: I’ve had to generate a new GPG key for this release, and it
>>>>>>> takes a while for it to get mirrored to the lucene KEYS file.  I’ll hold
>>>>>>> off cutting the branch until everything is ready, so it will probably now
>>>>>>> be tomorrow UK time before I start the release proper.
>>>>>>>
>>>>>>>
>>>>>>> On 25 Feb 2020, at 07:49, Noble Paul  wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Wed, Feb 19, 2020 at 9:35 PM Ignacio Vera 
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, Feb 18, 2020 at 7:26 PM Jan Høydahl 
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> That should give us time to update release docs for the new website
>>>>>>> too.
>>>>>>>
>>>>>>> Jan Høydahl
>>>>>>>
>>>>>>> 18. feb. 2020 kl. 18:28 skrev Adrien Grand :
>>>>>>>
>>>>>>> 
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, Feb 18, 2020 at 4:58 PM Alan Woodward 
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> It’s been a while since we released lucene-solr 8.4, and we’ve
>>>>>>> accumulated quite a few nice new features since then.  I’d like to
>>>>>>> volunteer to be a release manager for an 8.5 release.  If there's
>>>>>>> agreement, then I plan to cut the release branch two weeks today, on
>>>>>>> Tuesday 3rd March.
>>>>>>>
>>>>>>> - Alan
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>> 
>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>>> 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Adrien
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -
>>>>>>> Noble Paul
>>>>>>>
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>> 
>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>>> 
>>>>>>>
>>>>>>>
>>>>>>>
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>>>> additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: CHANGES.txt and issue categorization

2020-03-04 Thread Mikhail Khludnev
I'm ok with it. Thank you, David. Will you put it somewhere on the wiki?

On Mon, Mar 2, 2020 at 10:07 AM David Smiley 
wrote:

> I'd like us to reflect on how we categorize issues in CHANGES.txt.  We
> have these categories:
> (Lucene) 'API Changes', 'New Features', 'Improvements', 'Optimizations',
> 'Bug Fixes', 'Other'
> (Solr) 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes',
> 'Other Changes'
> (I lifted these from dev-tools/scripts/addVersion.py line 215)
>
> In particular, I'm often surprised at how some of us categorize New
> Features or Improvements that should better be categorized as something
> else.  I think the root cause of these problems may be that we don't have
> JIRA categories that directly align.  Furthermore, our dev practices will
> typically result in a CHANGES.txt being added out of band from the
> code-review process, and thus no peer-review on ideal placement.
> Furthermore the message itself is often not code reviewed but should be.
> Perhaps we can simply get in the habit of adding a JIRA comment (or GH code
> review) what we propose the category & issue summary should be.
>
> Here is my attempt at a definition for _some_ of these categories.  I
> don't pretend to think we all agree 100% but it's up for discussion:
> 
> * New Features:  A user-visible new capability.  Usually opt-in.
>
> * Improvements:  A user-visible improvement to an existing capability that
> somehow expands its ability or improves its behavior.  Not a
> refactoring, not an optimization.
>
> * Optimizations: Something is now more efficient.  Usually automatic (not
> opt-in).
>
> * Other:  Anything else: Refactorings, tests, build, docs, etc.  And
> adding log statements.
> 
>
> I recommend the following changes to Lucene 8.5:
>
> These are "Improvements" that I think are better categorized as
> "Optimizations"
> * LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
> * LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
> * LUCENE-9228: Sort dvUpdates in the term order before applying if they
> all update a
>   single field to the same value. This optimization can reduce the flush
> time by around
>   20% for the docValues update user cases. (Nhat Nguyen, Adrien Grand,
> Simon Willnauer)
> * LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant,
> Robert Muir)
> * LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)
>
> These "Improvements" I think are better categorized as "Other":
> * LUCENE-9109: Backport some changes from master (except StackWalker) to
> improve
>   TestSecurityManager (Uwe Schindler)
> * LUCENE-9110: Backport refactored stack analysis in tests to use
> generalized
>   LuceneTestCase methods (Uwe Schindler)
> * LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract
> class called LatLonGeometry. Queries are
>   executed with input objects that extend such interface. (Ignacio Vera)
> * LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class
> called XYGeometry. Queries are
>   executed with input objects that extend such interface. (Ignacio Vera)
>
> Maybe this "Other" item should be  "Optimization"? (not sure):
> * LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward,
> Mike Drob)
>
> Solr:
>
> "New Features" that maybe should be "Improvements":
>  * SOLR-13892: New "top-level" docValues join implementation (Jason
> Gerlowski, Joel Bernstein)
>  * SOLR-14242: HdfsDirectory now supports indexing geo-points, ranges or
> shapes. (Adrien Grand)
>
> "Improvements" that maybe should be "Optimizations":
> * SOLR-13808: filter in BoolQParser and {"bool":{"filter":..}} in Query
> DSL are cached by default (Mikhail Khludnev)
>
> "Improvements" that maybe should be "Other":
> * SOLR-14114: Add WARN to Solr log that embedded ZK is not supported in
> production (janhoy)
>
> Thoughts?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


-- 
Sincerely yours
Mikhail Khludnev


Re: [JENKINS] Lucene-Solr-Tests-master - Build # 4353 - Still Failing

2020-02-27 Thread Mikhail Khludnev
Hey, Whatsup?

On Thu, Feb 27, 2020 at 9:35 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build: https://builds.apache.org/job/Lucene-Solr-Tests-master/4353/
>
> All tests passed
>
> Build Log:
> [...truncated 68080 lines...]
> BUILD FAILED
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/build.xml:635:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/build.xml:508:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/build.xml:496:
> Source checkout is modified!!! Offending files:
> * solr/licenses/jaeger-core-1.1.0.jar.sha1
> * solr/licenses/libthrift-0.13.0.jar.sha1
>
> Total time: 72 minutes 3 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org



-- 
Sincerely yours
Mikhail Khludnev


Re: [JENKINS] Lucene-Solr-Tests-8.x - Build # 1051 - Still Failing

2020-02-09 Thread Mikhail Khludnev
Pushed 8x fix SOLR-14209. Watching.
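For context: URLEncoder.encode(String, Charset) only appeared in JDK 10, so
it cannot compile at branch_8x's Java 8 source level, while the
(String, String) overload exists on every supported JDK. A minimal
illustration of the portable form (not necessarily the exact committed fix;
the path value is made up):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String path = "/configs/my conf";
        // Passing the charset *name* selects the (String, String) overload,
        // which compiles on Java 8 and later alike.
        String encoded = URLEncoder.encode(path, StandardCharsets.UTF_8.name());
        System.out.println(encoded);
    }
}
```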


On Sun, Feb 9, 2020 at 4:07 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build: https://builds.apache.org/job/Lucene-Solr-Tests-8.x/1051/
>
> All tests passed
>
> Build Log:
> [...truncated 12146 lines...]
> [javac] Compiling 1326 source files to
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/solr/build/solr-core/classes/java
> [javac]
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperInfoHandler.java:537:
> error: incompatible types: Charset cannot be converted to String
> [javac]   String href = "admin/zookeeper?detail=true&path=" +
> URLEncoder.encode(path, StandardCharsets.UTF_8);
> [javac]
>^
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] Note: Some input files use unchecked or unsafe operations.
> [javac] Note: Recompile with -Xlint:unchecked for details.
> [javac] Note: Some messages have been simplified; recompile with
> -Xdiags:verbose to get full output
> [javac] 1 error
>
> BUILD FAILED
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/build.xml:634:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/build.xml:578:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/build.xml:59:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/solr/build.xml:231:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/solr/common-build.xml:550:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/solr/common-build.xml:498:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/solr/common-build.xml:393:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/lucene/common-build.xml:580:
> The following error occurred while executing this line:
> /home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-8.x/lucene/common-build.xml:2078:
> Compile failed; see the compiler error output for details.
>
> Total time: 11 minutes 4 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -----
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org



-- 
Sincerely yours
Mikhail Khludnev


Re: analytics.adoc validation does not pass

2020-01-26 Thread Mikhail Khludnev
Looking into.

On Sun, Jan 26, 2020 at 12:56 PM Dawid Weiss  wrote:

> Hi Cassandra!
>
> I get this when running precommit on gradle build (gradlew precommit):
>
> > Task :validateSourcePatterns
> [ant:groovy] Unescaped symbol "->" on line #43:
> solr/solr-ref-guide/src/analytics.adoc
> [ant:groovy] Unescaped symbol "->" on line #52:
> solr/solr-ref-guide/src/analytics.adoc
>
> I don't know if these are legitimate errors though or something that
> needs to be improved in validation code.
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Congratulations to the new Lucene/Solr PMC Chair, Anshum Gupta!

2020-01-16 Thread Mikhail Khludnev
Congratulations, Anshum!

On Thu, Jan 16, 2020 at 2:23 AM Anshum Gupta  wrote:

> Thank you everyone! :)
>
> On Wed, Jan 15, 2020 at 2:54 PM Uwe Schindler  wrote:
>
>> Moin,
>>
>>
>>
>> Congratulations from one of the previous chairs! You’ll have a hard time
>> (just joking!).
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Tomás Fernández Löbbe 
>> *Sent:* Wednesday, January 15, 2020 11:28 PM
>> *To:* dev@lucene.apache.org
>> *Subject:* Re: Congratulations to the new Lucene/Solr PMC Chair, Anshum
>> Gupta!
>>
>>
>>
>> Congrats Anshum!!
>>
>>
>>
>> On Wed, Jan 15, 2020 at 2:25 PM David Smiley 
>> wrote:
>>
>> Congrats Anshum!
>>
>>
>> ~ David Smiley
>>
>> Apache Lucene/Solr Search Developer
>>
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>>
>>
>>
>> On Wed, Jan 15, 2020 at 4:15 PM Cassandra Targett 
>> wrote:
>>
>> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
>> President position.
>>
>> This year we have nominated and elected Anshum Gupta as the Chair, a
>> decision that the board approved in its January 2020 meeting.
>>
>>
>>
>> Congratulations, Anshum!
>>
>>
>>
>> Cassandra
>>
>>
>>
>>
>
> --
> Anshum Gupta
>


-- 
Sincerely yours
Mikhail Khludnev


Re: State of DIH, concurrency

2020-01-07 Thread Mikhail Khludnev
Hello, aanno2.

Don't start it. Threading was fixed to a certain level as of 3.6.1 under
https://issues.apache.org/jira/browse/SOLR-3360
but right after that, threads were dropped from DIH for overall sanity
under https://issues.apache.org/jira/browse/SOLR-3262
If you really need a certain level of concurrency, declare multiple
DataImportHandlers in solrconfig.xml and submit multiple subrequests, sharded
with explicit filters, in parallel.
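That fan-out can be sketched in solrconfig.xml; the handler names, config
file names, and modulo filters below are hypothetical, just to show the
shape:

```xml
<!-- Hypothetical sketch: one DIH instance per slice of the source data. -->
<requestHandler name="/dataimport-a"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-a.xml</str> <!-- query: ... WHERE id % 2 = 0 -->
  </lst>
</requestHandler>
<requestHandler name="/dataimport-b"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-b.xml</str> <!-- query: ... WHERE id % 2 = 1 -->
  </lst>
</requestHandler>
```

Each handler then gets its own request, e.g.
/dataimport-a?command=full-import&clean=false, fired concurrently; clean=false
keeps one import from wiping the other's documents.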

Good luck. You'd be better off trying a full-fledged ETL tool rather than
band-aiding DIH.


On Tue, Jan 7, 2020 at 1:50 PM aanno.trash  wrote:

> Hello,
>
> I looked a bit into the code of DIH (solr dataimporthandler and
> dataimporthandler-extra). I wonder what is the state of this code. It is
> in a 'contrib' folder and seems to work (and maintained). But is there
> ongoing development (e.g. additional features)?
>
> The reason I'm asking is that I'm in a project where DIH is used.
> However, the import is very slow, especially into a solr cluster. I
> glanced over the code for my case and it looks like DIH is only
> single-threaded. I guess that changing DIH to support multi-threading on
> the 'root' (top level) entity should result in a dramatic performance
> boost.
>
> Hence I hacked DIH a bit. To get started, I concentrated on the 'tika'
> example case with a bunch of private PDFs and only for a 'full-import'.
> From this (dirty) experiment, a multi-threaded DIH seems to be possible.
> However, some bigger code changes are needed. This is a incomplete list:
>
> * Make VariableResolver immutable and change its interface/contract
> * All EntityProcessors seem to be written with only a single thread in
> mind. I circumvented the problem by (a) supporting a clone operation and
> (b) cloning the EntityProcessors for each thread.
> * To get the code more handy, I introduced several interfaces where only
> complete abstract classes had been around before (Context, DataSource,
> DIHProperties, EntityProcessor, ...). Perhaps this is not absolutely
> needed, but it has simplified the refactoring substantially.
>
> So this is my question: Would you consider the contribution of a BIG DIH
> change for merging into the project? Or is DIH just dead and should go
> away soon? And if you would consider the contribution, would it be best
> with several small changes or with a 'big-bang' pull request? Would you
> consider the contribution even if some features of DIH are dropped?
> (From my experiment, a very hot candidate to drop is the
> XPathEntityProcessor.)
>
> Kind regards,
>
> aanno2
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: [JENKINS] Lucene-Solr-8.x-Linux (64bit/jdk-13.0.1) - Build # 1684 - Failure!

2019-12-28 Thread Mikhail Khludnev
ns/workspace/Lucene-Solr-8.x-Linux/lucene/test-framework/lib/randomizedtesting-runner-2.7.2.jar:/home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java9:/home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java:/home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/test:/home/jenkins/.ivy2/cache/com.carrotsearch.randomizedtesting/junit4-ant/jars/junit4-ant-2.7.2.jar
> com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe -eventsfile
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/test/temp/junit4-J0-20191228_095417_8917447844543437384674.events
> @/home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/test/temp/junit4-J0-20191228_095417_891226391883297247733.suites
> -stdin
>[junit4] ERROR: JVM J0 ended with an exception: Forked process returned
> with error code: 134. Very likely a JVM crash.  See process stdout at:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/test/temp/junit4-J0-20191228_095417_89118188758387783302429.sysout
>[junit4] at
> com.carrotsearch.ant.tasks.junit4.JUnit4.executeSlave(JUnit4.java:1542)
>[junit4] at
> com.carrotsearch.ant.tasks.junit4.JUnit4.access$000(JUnit4.java:123)
>[junit4] at
> com.carrotsearch.ant.tasks.junit4.JUnit4$2.call(JUnit4.java:997)
>[junit4] at
> com.carrotsearch.ant.tasks.junit4.JUnit4$2.call(JUnit4.java:994)
>[junit4] at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>[junit4] at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>[junit4] at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>[junit4] at java.base/java.lang.Thread.run(Thread.java:830)
>
> BUILD FAILED
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/build.xml:634: The following
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/build.xml:578: The following
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/build.xml:59: The following
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build.xml:50: The
> following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/common-build.xml:1590:
> The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/common-build.xml:1117:
> At least one slave process threw an exception, first: Forked process
> returned with error code: 134. Very likely a JVM crash.  See process stdout
> at:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/test/temp/junit4-J0-20191228_095417_89118188758387783302429.sysout
>
> Total time: 15 minutes 31 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Setting
> ANT_1_8_2_HOME=/home/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> [WARNINGS] Skipping publisher since build result is FAILURE
> Recording test results
> Setting
> ANT_1_8_2_HOME=/home/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org



-- 
Sincerely yours
Mikhail Khludnev


Re: [VOTE] Release Lucene/Solr 8.4.0 RC1

2019-12-17 Thread Mikhail Khludnev
I've got

  2> NOTE: reproduce with: ant test
-Dtestcase=DimensionalRoutedAliasUpdateProcessorTest
-Dtests.method=testTimeCat -Dtests.seed=D05700662AF3B95B
-Dtests.locale=en-GB -Dtests.timezone=Australia/North
-Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1

[00:42:59.083] FAILURE 29.4s J1 |
DimensionalRoutedAliasUpdateProcessorTest.testTimeCat <<<

   > Throwable #1: java.lang.AssertionError: expected:<10> but was:<9>

   >at
__randomizedtesting.SeedInfo.seed([D05700662AF3B95B:E9AF41D56AD2F530]:0)

   >at org.junit.Assert.fail(Assert.java:88)

   >at org.junit.Assert.failNotEquals(Assert.java:834)

   >at org.junit.Assert.assertEquals(Assert.java:645)

   >at org.junit.Assert.assertEquals(Assert.java:631)

   >at
org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:678)

   >at
org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:196)

which didn't reproduce for me when I retried.

+0

On Tue, Dec 17, 2019 at 9:23 PM Adrien Grand  wrote:

> Please vote for release candidate 1 for Lucene/Solr 8.4.0
>
> The artifacts can be downloaded from:
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.4.0-RC1-revc91d36f50efb62e55bcc3a1adc0442b207018670
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.4.0-RC1-revc91d36f50efb62e55bcc3a1adc0442b207018670
>
> The vote will be open for at least 3 working days, i.e. until 2019-12-20
> 19:00 UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>
> --
> Adrien
>


-- 
Sincerely yours
Mikhail Khludnev


Re: [JENKINS] Lucene-Solr-8.x-Linux (64bit/jdk-12.0.1) - Build # 1434 - Failure!

2019-11-01 Thread Mikhail Khludnev
ss
> SpanMultiTermQueryWrapper
> [javac]
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/queries/src/test/org/apache/lucene/queries/payloads/TestPayloadCheckQuery.java:217:
> warning: [rawtypes] found raw type: SpanMultiTermQueryWrapper
> [javac] SpanMultiTermQueryWrapper nin = new
> SpanMultiTermQueryWrapper(new WildcardQuery(new Term("field", "nin*")));
> [javac] ^
> [javac]   missing type arguments for generic class
> SpanMultiTermQueryWrapper
> [javac]   where Q is a type-variable:
> [javac] Q extends MultiTermQuery declared in class
> SpanMultiTermQueryWrapper
> [javac]
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/queries/src/test/org/apache/lucene/queries/payloads/TestPayloadCheckQuery.java:217:
> warning: [unchecked] unchecked call to SpanMultiTermQueryWrapper(Q) as a
> member of the raw type SpanMultiTermQueryWrapper
> [javac] SpanMultiTermQueryWrapper nin = new
> SpanMultiTermQueryWrapper(new WildcardQuery(new Term("field", "nin*")));
> [javac] ^
> [javac]   where Q is a type-variable:
> [javac] Q extends MultiTermQuery declared in class
> SpanMultiTermQueryWrapper
> [javac]
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/queries/src/test/org/apache/lucene/queries/payloads/TestPayloadScoreQuery.java:198:
> warning: [rawtypes] found raw type: SpanMultiTermQueryWrapper
> [javac] SpanMultiTermQueryWrapper xyz = new
> SpanMultiTermQueryWrapper<>(new WildcardQuery(new Term("field", "xyz*")));
> [javac] ^
> [javac]   missing type arguments for generic class
> SpanMultiTermQueryWrapper
> [javac]   where Q is a type-variable:
> [javac] Q extends MultiTermQuery declared in class
> SpanMultiTermQueryWrapper
> [javac] Note:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/queries/src/test/org/apache/lucene/queries/payloads/PayloadHelper.java
> uses or overrides a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 3 errors
>
> [...truncated 1 lines...]
> BUILD FAILED
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/build.xml:634: The following
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/build.xml:578: The following
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/build.xml:59: The following
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build.xml:481: The
> following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/common-build.xml:2290:
> The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/module-build.xml:67:
> The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/module-build.xml:64:
> The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/common-build.xml:921:
> The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/common-build.xml:933:
> The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/common-build.xml:2074:
> Compile failed; see the compiler error output for details.
>
> Total time: 25 minutes 43 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Setting
> ANT_1_8_2_HOME=/home/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> [WARNINGS] Skipping publisher since build result is FAILURE
> Recording test results
> Setting
> ANT_1_8_2_HOME=/home/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org



-- 
Sincerely yours
Mikhail Khludnev


Re: Issue in consuming Lucene plugin

2019-10-08 Thread Mikhail Khludnev
Hello, Rosa.

You can try to replace them with
org.apache.lucene.queryparser.classic.QueryParser
org.apache.lucene.queryparser.classic.ParseException
org.apache.lucene.search.IndexSearcher
Though, migrating from 2.9 to 7 is a big jump, so your mileage may vary.

On Tue, Oct 8, 2019 at 5:32 AM Rosa Margarita Casillas 
wrote:

> Hi team,
>
> I am from the Data Studio team; we are consuming the
> *org.apache.lucene.core* plugin in our source code, wherein we updated
> the *org.apache.lucene.core* version from *2.9.0 to 7.1.0* *(supported by
> Photon)*.
> But doing that gives us compilation errors in the statements below:
>
>
>
> *import org.apache.lucene.queryParser.ParseException;import
> org.apache.lucene.queryParser.QueryParser;import
> org.apache.lucene.search.Searcher;*
>
>
> *Seems like it can't find them.*
>
> Can you please let us know how to resolve these imports? Has something
> changed about those classes in the new Photon content compared to previous versions?
>
> Thank you,
> Rosa Casillas
>
> - To
> unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



-- 
Sincerely yours
Mikhail Khludnev


Re: 8.3 release

2019-10-02 Thread Mikhail Khludnev
Excuse me. I have to recall this message regarding SOLR-13764.

On Mon, Sep 30, 2019 at 10:56 PM Mikhail Khludnev  wrote:

> Ishan, thanks for update.
> May I propose to hold it for this week; besides the severe issues you
> mentioned, I'd like to land the pretty neat JSON Query parser for Interval
> Queries https://issues.apache.org/jira/browse/SOLR-13764 this week.
>
> пн, 30 сент. 2019 г., 18:04 Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com>:
>
>> * Due to some unfinished and half merged work in SOLR-13661, we are in
>> a difficult position for a branch cutting today (as I had proposed).
>> I have documented the options on how to deal with that immediately in
>> that issue. I'll work to resolve the situation and cut a branch as
>> soon as possible.
>> * SOLR-13677 is also a blocker for release, but I can proceed with the
>> branch cutting.
>>
>> I'll take a look at the ref guide's simultaneous release as we reach
>> closer to building the artifacts.
>> Thanks,
>> Ishan
>>
>> On Wed, Sep 18, 2019 at 9:06 PM Cassandra Targett 
>> wrote:
>> >
>> > As I’ve mentioned to some of you over the past couple of weeks, I want
>> to propose that we don’t “release” the Ref Guide at all the way we have
>> been doing it.
>> >
>> > It deserves a separate thread, which since it’s come up a few times
>> this week I should start now, but in essence, my idea is to no longer treat
>> the PDF as a release artifact that requires a vote, and publish the HTML as
>> our primary version of the Ref Guide in effectively the same way we publish
>> the javadocs (at the same time as the binary artifacts).
>> >
>> > Instead of hijacking this thread with that discussion since it has
>> several aspects, let me send another mail on it where I can flesh it out
>> more and we can discuss there. I have the mail mostly queued up and ready
>> to go already.
>> >
>> > Cassandra
>> > On Sep 18, 2019, 10:23 AM -0500, Gus Heck , wrote:
>> >
>> > I learned recently that it's actually all  documented here:
>> https://lucene.apache.org/solr/guide/8_1/how-to-contribute.html#ref-guide-publication-process
>> >
>> > On Tue, Sep 17, 2019 at 7:31 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>> >>
>> >> Hi Adrien,
>> >> Indeed, meant to write about starting a vote.
>> >>
>> >> @Gus, I'll have to let Cassandra weigh in on this one as I'm not very
>> familiar with the ref guide release process.
>> >>
>> >> Regards,
>> >> Ishan
>> >>
>> >> On Mon, 16 Sep, 2019, 7:28 PM Adrien Grand,  wrote:
>> >>>
>> >>> +1 to start working on 8.3
>> >>>
>> >>> Did you mean "start a vote" when you wrote "release the artifacts"? It
>> >>> got me wondering because I don't think we frequently managed to go
>> >>> from cutting a branch to releasing artifacts in so little time in the
>> >>> past.
>> >>>
>> >>> On Mon, Sep 16, 2019 at 5:52 PM Ishan Chattopadhyaya
>> >>>  wrote:
>> >>> >
>> >>> > Hi all,
>> >>> > We have a lot of unreleased features and fixes. I propose that we
>> cut
>> >>> > a 8.3 branch in two weeks (in order to have sufficient time to bake
>> in
>> >>> > all in-progress features). If there are no objections to doing so, I
>> >>> > can volunteer for the release as an RM and plan for cutting a
>> release
>> >>> > branch on 30 September (and release the artifacts about 3-4 days
>> after
>> >>> > that).
>> >>> >
>> >>> > WDYT?
>> >>> > Regards,
>> >>> > Ishan
>> >>> >
>> >>> >
>> -
>> >>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>> >
>> >>>
>> >>>
>> >>> --
>> >>> Adrien
>> >>>
>> >>> -
>> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>>
>> >
>> >
>> > --
>> > http://www.needhamsoftware.com (work)
>> > http://www.the111shift.com (play)
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
Sincerely yours
Mikhail Khludnev


Re: filter in JSON Query DSL

2019-10-01 Thread Mikhail Khludnev
Raised  https://issues.apache.org/jira/browse/SOLR-13808. Thanks, Jochen!

On Mon, Sep 30, 2019 at 4:26 PM Mikhail Khludnev  wrote:

> Jochen, right! Sorry I didn't get your point earlier. {!bool filter=}
> means a Lucene filter, not Solr's one. I suppose a {!bool cache=true} flag can
> be easily added, but so far there is no concise syntax for it. Don't
> hesitate to raise a JIRA for it.
>
> On Mon, Sep 30, 2019 at 3:18 PM Jochen Barth 
> wrote:
>
>> Here is the corrected equivalent query, giving the same results as
>> JsonQueryDSL (and still much faster):
>>
>> +filter(+((_query_:"{!graph from=parent_ids to=id }(meta_title_txt:muller
>> meta_name_txt:muller meta_subject_txt:muller meta_shelflocator_txt:muller)"
>> _query_:"{!graph from=id to=parent_ids  traversalFilter=\"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)") ) +class_s:meta )
>> -_query_:"{!join to=id from=parent_ids}(filter(+((_query_:\"{!graph
>> from=parent_ids to=id }(meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller meta_shelflocator_txt:muller)\" _query_:\"{!graph
>> from=id to=parent_ids  traversalFilter=\\\"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal\\\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)\") ) +class_s:meta ))"
>>
>> I am querying the "core" of the above query (the string before
>> »-_query_:"{!join«) for faceting;
>> then the next query is the one above [ like »+(a) -{!join...}(a)« ]
>>
>> Now the second query is running in much less time because the result of
>> term "a" is cached.
>>
>> Caching seems not to work with {boolean=>{must=>"*:*", filter=>...}}.
>>
>> Kind regards,
>> Jochen
>>
>>
>>
>>
>>
>>
>> Am 30.09.19 um 11:02 schrieb Jochen Barth:
>>
>> Ooops... Json is returning 48652 docs, StandardQueryParser 827...
>>
>> Must check this.
>>
>> Sorry,
>>
>> Jochen
>>
>> Am 30.09.19 um 10:39 schrieb Jochen Barth:
>>
>> the *:* in JsonQueryDSL appears two times because of the two
>> »filter(...)« occurrences in StandardQueryParser.
>>
>>
>>
>> I've did some System.out.println in FastLRU, LRU, LFUCache,
>> here the logging with JsonQueryDSL (solr 8.1.1):
>>
>> Fast-get +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
>> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
>> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta) valLen=null
>>
>> Fast-get DocValuesFieldExistsQuery [field=id] valLen=38
>>
>> Fast-get DocValuesFieldExistsQuery [field=parent_ids] valLen=38
>>
>> Fast-put +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
>> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
>> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta)
>>
>> ...
>>
>> Fast(LRUCache)-get is called only once, but it should have been called two
>> times:
>> the first time to find out that this filter is not already cached, and the
>> second time for the identical part of the subquery.
>>
>>

Re: 8.3 release

2019-09-30 Thread Mikhail Khludnev
Ishan, thanks for update.
May I propose to hold it for this week; besides the severe issues you
mentioned, I'd like to land the pretty neat JSON Query parser for Interval
Queries https://issues.apache.org/jira/browse/SOLR-13764 this week.

пн, 30 сент. 2019 г., 18:04 Ishan Chattopadhyaya :

> * Due to some unfinished and half merged work in SOLR-13661, we are in
> a difficult position for a branch cutting today (as I had proposed).
> I have documented the options on how to deal with that immediately in
> that issue. I'll work to resolve the situation and cut a branch as
> soon as possible.
> * SOLR-13677 is also a blocker for release, but I can proceed with the
> branch cutting.
>
> I'll take a look at the ref guide's simultaneous release as we reach
> closer to building the artifacts.
> Thanks,
> Ishan
>
> On Wed, Sep 18, 2019 at 9:06 PM Cassandra Targett 
> wrote:
> >
> > As I’ve mentioned to some of you over the past couple of weeks, I want
> to propose that we don’t “release” the Ref Guide at all the way we have
> been doing it.
> >
> > It deserves a separate thread, which since it’s come up a few times this
> week I should start now, but in essence, my idea is to no longer treat the
> PDF as a release artifact that requires a vote, and publish the HTML as our
> primary version of the Ref Guide in effectively the same way we publish the
> javadocs (at the same time as the binary artifacts).
> >
> > Instead of hijacking this thread with that discussion since it has
> several aspects, let me send another mail on it where I can flesh it out
> more and we can discuss there. I have the mail mostly queued up and ready
> to go already.
> >
> > Cassandra
> > On Sep 18, 2019, 10:23 AM -0500, Gus Heck , wrote:
> >
> > I learned recently that it's actually all  documented here:
> https://lucene.apache.org/solr/guide/8_1/how-to-contribute.html#ref-guide-publication-process
> >
> > On Tue, Sep 17, 2019 at 7:31 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >>
> >> Hi Adrien,
> >> Indeed, meant to write about starting a vote.
> >>
> >> @Gus, I'll have to let Cassandra weigh in on this one as I'm not very
> familiar with the ref guide release process.
> >>
> >> Regards,
> >> Ishan
> >>
> >> On Mon, 16 Sep, 2019, 7:28 PM Adrien Grand,  wrote:
> >>>
> >>> +1 to start working on 8.3
> >>>
> >>> Did you mean "start a vote" when you wrote "release the artifacts"? It
> >>> got me wondering because I don't think we frequently managed to go
> >>> from cutting a branch to releasing artifacts in so little time in the
> >>> past.
> >>>
> >>> On Mon, Sep 16, 2019 at 5:52 PM Ishan Chattopadhyaya
> >>>  wrote:
> >>> >
> >>> > Hi all,
> >>> > We have a lot of unreleased features and fixes. I propose that we cut
> >>> > a 8.3 branch in two weeks (in order to have sufficient time to bake
> in
> >>> > all in-progress features). If there are no objections to doing so, I
> >>> > can volunteer for the release as an RM and plan for cutting a release
> >>> > branch on 30 September (and release the artifacts about 3-4 days
> after
> >>> > that).
> >>> >
> >>> > WDYT?
> >>> > Regards,
> >>> > Ishan
> >>> >
> >>> > -
> >>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >>> >
> >>>
> >>>
> >>> --
> >>> Adrien
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>
> >
> >
> > --
> > http://www.needhamsoftware.com (work)
> > http://www.the111shift.com (play)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: filter in JSON Query DSL

2019-09-30 Thread Mikhail Khludnev
t;bool":{"should":[{"bool":{"should":[{"graph":{"from":"parent_ids","query":"meta_title_txt:muller
> meta_name_txt:muller meta_subject_txt:muller
> meta_shelflocator_txt:muller","to":"id"}},{"graph":{"from":"id","query":"meta_title_txt:muller
> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller","to":"parent_ids","traversalFilter":"class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal"}}]}}]}},"class_s:meta"]}},"must":"*:*","must_not":[{"join":{"from":"parent_ids","query":{"bool":{"filter":{"bool":{"must":[{"bool":{"should":[{"bool":{"should":[{"graph":{"from":"parent_ids","query":"meta_title_txt:muller
> meta_name_txt:muller meta_subject_txt:muller
> meta_shelflocator_txt:muller","to":"id"}},{"graph":{"from":"id","query":"meta_title_txt:muller
> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller","to":"parent_ids","traversalFilter":"class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal"}}]}}]}},"class_s:meta"]}},"must":"*:*"}},"to":"id"}}]}}}
>
> Kind regards,
> Jochen
>
>
>
> Am 29.09.19 um 21:28 schrieb Mikhail Khludnev:
>
> On Sun, Sep 29, 2019 at 8:37 PM Barth, Jochen  
> 
> wrote:
>
>
> Thanks for your hint. The documentation does not say if the result of
> filter is cached here (like fq=...) (I could test this).
>
>
> 'filter' implies caching.
>
>
>
> Is *:* more expensive (query time) than filter()? (*:* is not required in
> StandardQueryParser.)
>
>
> I either don't get the question, or it isn't worth worrying about.
>
>
>
> Kind regrads,
> Jochen
>
> 
> Von: Mikhail Khludnev  
> Gesendet: Samstag, 28. September 2019 22:58
> An: solr-user
> Betreff: Re: filter in JSON Query DSL
>
> Giving
> https://lucene.apache.org/solr/guide/8_0/other-parsers.html#boolean-query-parser
> something
> like
> '{"query": { "bool": { "must": ["*:*"] , "filter": [
> "meta_subject_txt:globe" ] } } }'
> I'm not sure why to put filter under must; they should be siblings.
>
> On Fri, Sep 27, 2019 at 4:34 PM Jochen Barth  
> 
> wrote:
>
>
> Dear reader,
>
> this query works as expected:
>
> curl -XGET http://localhost:8982/solr/Suchindex/query -d '
> {"query": { "bool": { "must": "*:*" } },
> "filter": [ "meta_subject_txt:globe" ] }'
>
> this does not (nor without the curley braces around "filter"):
>
> curl -XGET http://localhost:8982/solr/Suchindex/query -d '
> {"query": { "bool": { "must": [ "*:*", { "filter": [
> "meta_subject_txt:globe" ] } ] } } }'
>
> Is "filter" within deeper queries possible?
>
> I've got some complex queries with a "kernel" somewhat below the top
> level...
>
> Is "canonical" json important to match query cache entry?
>
> Would it help to serialize this queries to standard syntax and then use
> filter(...)?
>
> Kind regards,
>
> Jochen
>
>
>
> --
> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221
> 54-2580
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>
> --
> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221 54-2580
>
>
> --
> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221 54-2580
>
>
> --
> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221 54-2580
>
>
> --
> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221 54-2580
>
>

-- 
Sincerely yours
Mikhail Khludnev
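The point settled in the thread above — that "must" and "filter" belong as siblings under "bool", rather than "filter" being nested inside "must" — can be sketched by building the request body in code. The field name and values are just the ones from the mails; this only demonstrates the JSON shape, not a query against a live Solr:

```python
import json

# JSON Query DSL body with "must" and "filter" as siblings under "bool".
# This is the structure suggested in the thread; it mirrors
# '{"query": {"bool": {"must": ["*:*"], "filter": ["meta_subject_txt:globe"]}}}'.
body = {
    "query": {
        "bool": {
            "must": ["*:*"],
            "filter": ["meta_subject_txt:globe"],
        }
    }
}

# Serialize the body as it would be POSTed to /query.
payload = json.dumps(body)
print(payload)
```

The "filter" clauses are the ones that imply caching, so keeping them as siblings of "must" (rather than burying them inside it) preserves that behavior.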


Re: Intervals in Solr json request

2019-09-27 Thread Mikhail Khludnev
Hello, dev.

What do you think about the syntax proposal?
https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON


On Mon, Sep 16, 2019 at 3:05 PM Jason Gerlowski 
wrote:

> Hi Mikhail,
>
> I'm having trouble understanding the exact syntax you're proposing.
> Is there a jira where the syntax is described in a little more detail?
>  If not, would you care to put together a writeup on a jira somewhere?
>  It's hard (for me at least) to weigh in as things are currently.
>
> Best,
>
> Jason
>
> On Sun, Sep 8, 2019 at 3:25 PM Mikhail Khludnev  wrote:
> >
> > Ok. It might be a parser referring to a json object under some new
> property
> >
> > {
> >"query": {
> >"jinterval":"just a name"  // introducing new QPPlugin
> >   },
> >"jparams": {   // introducing new top-level entry
> > "just a name": {
> >   "or":["foo",
> >   "bar",
> >{ "unordered":
> > ["bag",
> >   "baz",
> >   "ban",
> >{ "phrase": ["moo","foo"]}
> >  ]
> > }
> >   ],
> >"field":"text_content"
> >  }
> >  }
> > }
> >
> > Can we consider it as a spec for the new feature?
> >
> > On Sun, Sep 8, 2019 at 12:16 AM Mikhail Khludnev 
> wrote:
> >>
> >> Thanks for your warm responses. I encountered Intervals, and am considering
> introducing them into the Solr JSON Request API.
> >> Following Query DSL approach gives me something like
> >> "interval":{  "or":["foo",
> >>   "bar",
> >>{"interval": { "unordered":
> >> ["bag",
> >>   "baz",
> >>   "ban",
> >>{ "interval":{ "phrase":
> ["moo","foo"]} }
> >>  ]}
> >>   }
> >> ],
> >>"field":"text_content"}
> >> So, it implies creating an {!interval} query parser, which handles local
> params in a certain way, e.g. it shouldn't support "or" and "phrase" at the
> same node.
> >> Not sure how to propagate "field" to term nodes.
> >>
> >> I'd rather want to have more control over syntax and JsonQueryConverter.
> >>  "interval":{  "or":["foo",
> >>   "bar",
> >>{ "unordered":
> >> ["bag",
> >>   "baz",
> >>   "ban",
> >>{ "phrase": ["moo","foo"]}
> >>      ]}
> >>   }
> >>   ],
> >>"field":"text_content"}
> >>
> >> Any ideas, preferences?
> >>
> >> On Sat, Sep 7, 2019 at 12:03 AM Mikhail Khludnev 
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Finally we let users send span queries via the XML (yeah) query parser.
> But I feel awkward invoking XML under JSON. The straightforward approach leads
> us to a bunch of span[Or|And|Not|Etc] QParser plugins. Are there any more
> elegant ideas?
> >>>
> >>> --
> >>> Sincerely yours
> >>> Mikhail Khludnev
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: [POLL] Should notifications of NEW Jira issues go to dev@?

2019-09-18 Thread Mikhail Khludnev
[ ] Leave it as is - I like quiet
[ ] A mail to dev@ for every new JIRA
[v] One daily digest mail per day with a list of new JIRAs
[ ] Other (explain): ___

On Wed, Sep 18, 2019 at 12:10 PM Jan Høydahl  wrote:

> Hi,
>
> The transition to issues@ and builds@ lists (LUCENE-8951) is now
> completed, and I already enjoy a quieter dev@ folder!
>
> I'd like to check with all of you whether there is interest in getting
> notified here at dev@ about NEW Jira issue created. Currently there is an
> average of 4 new issues per day. The main motivation for this would be for
> those who want to follow new development but not all the
> details/discussions. We could easily configure JIRA to send all [Created]
> mails to dev@ in addition to issues@. Or we could try to have one daily
> digest mail of new issues, whether that's a small bot or a feature in JIRA
> (don't know). Let's to a poll:
>
> [ ] Leave it as is - I like quiet
> [ ] A mail to dev@ for every new JIRA
> [ ] One daily digest mail per day with a list of new JIRAs
> [ ] Other (explain): ___
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Atri Sharma as Lucene/Solr committer

2019-09-18 Thread Mikhail Khludnev
Welcome, Atri.

On Wed, Sep 18, 2019 at 10:12 AM Adrien Grand  wrote:

> Hi all,
>
> Please join me in welcoming Atri Sharma as Lucene/ Solr committer!
>
> If you are following activity on Lucene, this name will likely sound
> familiar to you: Atri has been very busy trying to improve Lucene over
> the past months. In particular, Atri recently started improving our
> top-hits optimizations like early termination on sorted indexes and
> WAND, when indexes are searched using multiple threads.
>
> Congratulations and welcome! It is a tradition to introduce yourself
> with a brief bio.
>
> --
> Adrien
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Intervals in Solr json request

2019-09-16 Thread Mikhail Khludnev
Jason,
Here we go https://issues.apache.org/jira/browse/SOLR-13764


On Mon, Sep 16, 2019 at 3:05 PM Jason Gerlowski 
wrote:

> Hi Mikhail,
>
> I'm having trouble understanding the exact syntax you're proposing.
> Is there a jira where the syntax is described in a little more detail?
>  If not, would you care to put together a writeup on a jira somewhere?
>  It's hard (for me at least) to weigh in as things are currently.
>
> Best,
>
> Jason
>
> On Sun, Sep 8, 2019 at 3:25 PM Mikhail Khludnev  wrote:
> >
> > Ok. It might be a parser referring to a json object under some new
> property
> >
> > {
> >"query": {
> >"jinterval":"just a name"  // introducing new QPPlugin
> >   },
> >"jparams": {   // introducing new top-level entry
> > "just a name": {
> >   "or":["foo",
> >   "bar",
> >{ "unordered":
> > ["bag",
> >   "baz",
> >   "ban",
> >{ "phrase": ["moo","foo"]}
> >  ]
> > }
> >   ],
> >"field":"text_content"
> >  }
> >  }
> > }
> >
> > Can we consider it as a spec for the new feature?
> >
> > On Sun, Sep 8, 2019 at 12:16 AM Mikhail Khludnev 
> wrote:
> >>
> >> Thanks for your warm responses. I encountered Intervals, and am
> considering introducing them in the Solr JSON Request API.
> >> Following Query DSL approach gives me something like
> >> "interval":{  "or":["foo",
> >>   "bar",
> >>{"interval": { "unordered":
> >> ["bag",
> >>   "baz",
> >>   "ban",
> >>{ "interval":{ "phrase":
> ["moo","foo"]} }
> >>  ]}
> >>   }
> >> ],
> >>"field":"text_content"}
> >> So, it implies creating an {!interval} query parser, which handles local
> params in a certain way, e.g. it shouldn't support "or" and "phrase" at the
> same node.
> >> Not sure how to propagate "field" to term nodes.
> >>
> >> I'd rather have more control over syntax and JsonQueryConverter.
> >>  "interval":{  "or":["foo",
> >>   "bar",
> >>{ "unordered":
> >> ["bag",
> >>   "baz",
> >>   "ban",
> >>{ "phrase": ["moo","foo"]}
> >>      ]}
> >>   }
> >>   ],
> >>"field":"text_content"}
> >>
> >> Any ideas, preferences?
> >>
> >> On Sat, Sep 7, 2019 at 12:03 AM Mikhail Khludnev 
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Finally we let users send span queries via the XML (yeah) query parser.
> But I feel awkward invoking XML under JSON. The straightforward approach
> leads us to a bunch of span[Or|And|Not|Etc] QParser plugins. Are there any
> more elegant ideas?
> >>>
> >>> --
> >>> Sincerely yours
> >>> Mikhail Khludnev
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Sincerely yours
Mikhail Khludnev


[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2019-09-11 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927490#comment-16927490
 ] 

Mikhail Khludnev commented on LUCENE-5189:
--

Given that LUCENE-8585 and LUCENE-8374 optimize for absent values, shouldn't we 
have an IndexWriter method to nuke DV at certain docs? 

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Major
> Fix For: 4.6, 6.0
>
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
> LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
> LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
> LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12490) Query DSL supports for further referring and exclusion in JSON facets

2019-09-10 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-12490:

Summary: Query DSL supports for further referring and exclusion in JSON 
facets   (was: introduce json.queries supports DSL for further referring and 
exclusion in JSON facets )

> Query DSL supports for further referring and exclusion in JSON facets 
> --
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>    Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
>
> It's a spin-off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only separate params in {{domain.filter}}, it's not 
> possible to refer separate clauses
> see the first comment



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12490) introduce json.queries supports DSL for further referring and exclusion in JSON facets

2019-09-10 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926538#comment-16926538
 ] 

Mikhail Khludnev commented on SOLR-12490:
-

I think we'd rather continue by adding yet another small cut. 

{code}
{
"query" : {...}, 
"params":{
 "childFq":[{ "#color" :"color:black" },
{ "#size" : "size:L" }]
},
"facet":{
   "sku_colors_in_prods":{ "type" : "terms", "field" : "color",
  "domain" : {
   "excludeTags":["top",   "color"],   
   "filter":[ 
  "{!json_param}childFq"  
   ]
   }
   }
}
}
{code}

Ideas are: 
* put the JSON as a param value; the parser garbles it to a meaningless string, 
but it's still available via {{req.getJSON()}}. 
* the filter string invokes a new query parser which converts the JSON param via 
Query DSL; need to decide how to keep the {{JsonQueryConverter}} counter. 

Shouldn't be a big deal. Right?  
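As a thought experiment, the resolution step could be sketched like this (illustrative Python only, not Solr internals; the function name and the tagging convention `{"#tag": "query"}` are assumptions based on the snippet above):

```python
# Hypothetical sketch of resolving a {!json_param}childFq reference:
# each entry is a single-key object {"#tag": "<query>"}, mirroring the
# tagged-clause idea discussed above. Entries whose tag appears in
# excludeTags are dropped, the rest become filter queries.
def resolve_json_param(params, name, exclude_tags=()):
    filters = []
    for entry in params.get(name, []):
        (key, query), = entry.items()          # exactly one key per object
        tag = key[1:] if key.startswith("#") else None
        if tag in exclude_tags:
            continue                           # excluded, like domain.excludeTags
        filters.append(query)
    return filters

params = {"childFq": [{"#color": "color:black"}, {"#size": "size:L"}]}
filtered = resolve_json_param(params, "childFq", exclude_tags=("color",))
print(filtered)  # only the clause not tagged "color" survives
```

The sketch assumes the parameter survives as structured JSON (via `req.getJSON()`), which is exactly the first bullet above.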

> introduce json.queries supports DSL for further referring and exclusion in 
> JSON facets 
> ---
>
> Key: SOLR-12490
>     URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
>
> It's a spin-off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only separate params in {{domain.filter}}, it's not 
> possible to refer separate clauses
> see the first comment



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-12490) introduce json.queries supports DSL for further referring and exclusion in JSON facets

2019-09-10 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev reassigned SOLR-12490:
---

Assignee: (was: Mikhail Khludnev)

> introduce json.queries supports DSL for further referring and exclusion in 
> JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
>
> It's a spin-off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only separate params in {{domain.filter}}, it's not 
> possible to refer separate clauses
> see the first comment



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13748) mm (min should match) param for {!bool} query parser

2019-09-09 Thread Mikhail Khludnev (Jira)
Mikhail Khludnev created SOLR-13748:
---

 Summary: mm (min should match) param for {!bool} query parser
 Key: SOLR-13748
 URL: https://issues.apache.org/jira/browse/SOLR-13748
 Project: Solr
  Issue Type: Sub-task
  Components: query parsers
Reporter: Mikhail Khludnev






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Intervals in Solr json request

2019-09-08 Thread Mikhail Khludnev
Ok. It might be a parser referring to a json object under some new property

{
   "query": {
   "jinterval":"just a name"  // introducing new QPPlugin
  },
   "jparams": {   // introducing new top-level entry
"just a name": {
  "or":["foo",
  "bar",
   { "unordered":
["bag",
  "baz",
  "ban",
   { "phrase": ["moo","foo"]}
 ]
}
  ],
       "field":"text_content"
 }
 }
}

Can we consider it as a spec for the new feature?
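To make the proposed spec concrete, here is a minimal Python sketch (not Solr code; the operator set and the rendered expression format are illustrative assumptions) of how a parser could walk the "jparams" tree above into a nested interval expression:

```python
# Hypothetical sketch: walk the proposed "jparams" JSON tree and render it
# as a nested interval expression string. Not actual Solr code.
COMPOSITE_OPS = {"or", "unordered", "phrase"}  # assumed node types

def render(node):
    if isinstance(node, str):                  # bare string -> term interval
        return node
    if isinstance(node, dict):
        # exactly one operator key per node ("or" and "phrase" must not
        # appear at the same node, per the discussion above)
        ops = [k for k in node if k in COMPOSITE_OPS]
        if len(ops) != 1:
            raise ValueError("exactly one operator per node: %s" % node)
        op = ops[0]
        return "%s(%s)" % (op, ",".join(render(c) for c in node[op]))
    raise TypeError("unexpected node: %r" % node)

request = {
    "query": {"jinterval": "just a name"},
    "jparams": {"just a name": {
        "or": ["foo", "bar",
               {"unordered": ["bag", "baz", "ban",
                              {"phrase": ["moo", "foo"]}]}],
        "field": "text_content",
    }},
}
spec = request["jparams"][request["query"]["jinterval"]]
expr = render({"or": spec["or"]})
print(spec["field"], expr)
```

Note how "field" stays at the top of the named object, so it propagates to all term nodes without being repeated, which addresses the propagation question raised earlier in the thread.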

On Sun, Sep 8, 2019 at 12:16 AM Mikhail Khludnev  wrote:

> Thanks for your warm responses. I encountered Intervals, and am considering
> introducing them in the Solr JSON Request API.
> Following Query DSL approach gives me something like
> "interval":{  "or":["foo",
>   "bar",
>{"interval": { "unordered":
> ["bag",
>   "baz",
>   "ban",
>{ "interval":{ "phrase":
> ["moo","foo"]} }
>  ]}
>   }
> ],
>"field":"text_content"}
> So, it implies creating an {!interval} query parser, which handles local
> params in a certain way, e.g. it shouldn't support "or" and "phrase" at the
> same node.
> Not sure how to propagate "field" to term nodes.
>
> I'd rather have more control over syntax and JsonQueryConverter.
>  "interval":{  "or":["foo",
>   "bar",
>        { "unordered":
> ["bag",
>   "baz",
>   "ban",
>{ "phrase": ["moo","foo"]}
>  ]}
>   }
>   ],
>"field":"text_content"}
>
> Any ideas, preferences?
>
> On Sat, Sep 7, 2019 at 12:03 AM Mikhail Khludnev  wrote:
>
>> Hello,
>>
>> Finally we let users send span queries via the XML (yeah) query parser.
>> But I feel awkward invoking XML under JSON. The straightforward approach
>> leads us to a bunch of span[Or|And|Not|Etc] QParser plugins. Are there any
>> more elegant ideas?
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Intervals in Solr json request

2019-09-07 Thread Mikhail Khludnev
Thanks for your warm responses. I encountered Intervals, and am considering
introducing them in the Solr JSON Request API.
Following Query DSL approach gives me something like
"interval":{  "or":["foo",
  "bar",
   {"interval": { "unordered":
["bag",
  "baz",
  "ban",
   { "interval":{ "phrase":
["moo","foo"]} }
 ]}
  }
],
   "field":"text_content"}
So, it implies creating an {!interval} query parser, which handles local
params in a certain way, e.g. it shouldn't support "or" and "phrase" at the
same node.
Not sure how to propagate "field" to term nodes.

I'd rather have more control over syntax and JsonQueryConverter.
 "interval":{  "or":["foo",
  "bar",
   { "unordered":
["bag",
  "baz",
  "ban",
       { "phrase": ["moo","foo"]}
 ]}
  }
  ],
   "field":"text_content"}

Any ideas, preferences?

On Sat, Sep 7, 2019 at 12:03 AM Mikhail Khludnev  wrote:

> Hello,
>
> Finally we let users send span queries via the XML (yeah) query parser. But
> I feel awkward invoking XML under JSON. The straightforward approach leads
> us to a bunch of span[Or|And|Not|Etc] QParser plugins. Are there any more
> elegant ideas?
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


[jira] [Resolved] (SOLR-3666) DataImportHandler status command in SolrCloud does not work properly

2019-09-07 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev resolved SOLR-3666.

Resolution: Won't Fix

There's nothing like this in DIH now.

> DataImportHandler status command in SolrCloud does not work properly 
> -
>
> Key: SOLR-3666
> URL: https://issues.apache.org/jira/browse/SOLR-3666
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler, SolrCloud
>Affects Versions: 4.0-ALPHA
>Reporter: Sauvik Sarkar
>Priority: Major
>
> The dataimport?command=status command does not work correctly when invoked on 
> the node not running the DIH in a SolrCloud configuration.
> The expectation is that no matter which node is importing, any other node 
> should be able to get the import status information.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



(Oh my) Spans in Solr json request

2019-09-06 Thread Mikhail Khludnev
Hello,

Finally we let users send span queries via the XML (yeah) query parser. But
I feel awkward invoking XML under JSON. The straightforward approach leads us
to a bunch of span[Or|And|Not|Etc] QParser plugins. Are there any more
elegant ideas?

-- 
Sincerely yours
Mikhail Khludnev


[jira] [Updated] (SOLR-13738) UnifiedHighlighter

2019-09-04 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13738:

Summary: UnifiedHighlighter  (was: RequestHandlerBase ... 
ClassCastException: class ... .lucene.search.IndexSearcher cannot be cast to 
class ... .solr.search.SolrIndexSearcher ...)

> UnifiedHighlighter
> --
>
> Key: SOLR-13738
> URL: https://issues.apache.org/jira/browse/SOLR-13738
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Jochen Barth
>    Priority: Major
>
> Mikhail Khludnev said this is a bug.
> Here is the complete error message for the query below.
> Just tested with 8.1.1: it works.
> {quote}
> 2019-08-30 12:40:40.476 ERROR (qtp2116511124-65) [   x:Suchindex] 
> o.a.s.h.RequestHandlerBase java.lang.ClassCastException: class 
> org.apache.lucene.search.IndexSearcher cannot be cast to class 
> org.apache.solr.search.SolrIndexSearcher (or
> g.apache.lucene.search.IndexSearcher and 
> org.apache.solr.search.SolrIndexSearcher are in unnamed module of loader 
> org.eclipse.jetty.webapp.WebAppClassLoader @5ed190be)
> at 
> org.apache.solr.search.join.GraphQuery.createWeight(GraphQuery.java:115)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:137)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
> at 
> org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:110)
> at 
> org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:641)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:510)
> at 
> org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
> at 
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
> at 
> org.ecl

[jira] [Updated] (SOLR-13738) UnifiedHighlighter can't highlight GraphQuery

2019-09-04 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13738:

Summary: UnifiedHighlighter can't highlight GraphQuery  (was: 
UnifiedHighlighter)

> UnifiedHighlighter can't highlight GraphQuery
> -
>
> Key: SOLR-13738
> URL: https://issues.apache.org/jira/browse/SOLR-13738
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Jochen Barth
>    Priority: Major
>
> Mikhail Khludnev said this is a bug.
> Here is the complete error message for the query below.
> Just tested with 8.1.1: it works.
> {quote}
> 2019-08-30 12:40:40.476 ERROR (qtp2116511124-65) [   x:Suchindex] 
> o.a.s.h.RequestHandlerBase java.lang.ClassCastException: class 
> org.apache.lucene.search.IndexSearcher cannot be cast to class 
> org.apache.solr.search.SolrIndexSearcher (or
> g.apache.lucene.search.IndexSearcher and 
> org.apache.solr.search.SolrIndexSearcher are in unnamed module of loader 
> org.eclipse.jetty.webapp.WebAppClassLoader @5ed190be)
> at 
> org.apache.solr.search.join.GraphQuery.createWeight(GraphQuery.java:115)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:137)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
> at 
> org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:110)
> at 
> org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:641)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:510)
> at 
> org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
> at 
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1

[jira] [Created] (SOLR-13740) Assert returning ExtendedFileField

2019-09-04 Thread Mikhail Khludnev (Jira)
Mikhail Khludnev created SOLR-13740:
---

 Summary: Assert returning ExtendedFileField
 Key: SOLR-13740
 URL: https://issues.apache.org/jira/browse/SOLR-13740
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Schema and Analysis
Reporter: Mikhail Khludnev


It works, commit sometimes later 
{code}
diff --git 
a/solr/core/src/test/org/apache/solr/schema/ExternalFileFieldSortTest.java 
b/solr/core/src/test/org/apache/solr/schema/ExternalFileFieldSortTest.java
index 632b413..4106e15 100644
--- a/solr/core/src/test/org/apache/solr/schema/ExternalFileFieldSortTest.java
+++ b/solr/core/src/test/org/apache/solr/schema/ExternalFileFieldSortTest.java
@@ -48,8 +48,9 @@
 
 addDocuments();
 assertQ("query",
-req("q", "*:*", "sort", "eff asc"),
+req("q", "*:*", "sort", "eff asc", "fl", "id,field(eff)"),
 "//result/doc[position()=1]/str[.='3']",
+"//result/doc[position()=1]/float[@name='field(eff)' and .='0.001']",
 "//result/doc[position()=2]/str[.='1']",
 "//result/doc[position()=10]/str[.='8']");
   }
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13727) V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" instead of using regex pattern

2019-09-03 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13727:

Status: Patch Available  (was: Open)

> V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" 
> instead of using regex pattern
> 
>
> Key: SOLR-13727
> URL: https://issues.apache.org/jira/browse/SOLR-13727
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java, v2 API
>Affects Versions: 8.2
>Reporter: Megan Carey
>Priority: Major
>  Labels: easyfix, patch
> Attachments: SOLR-13727.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the HttpSolrClient is formatting a V2Request, it needs to change the 
> endpoint from the default "/solr/..." to "/api/...". It does so by simply 
> calling String.replace, which replaces the first instance of "/solr" in the 
> URL with "/api".
>  
> In the case where the host's address starts with "solr" and the HTTP protocol 
> is appended, this call changes the address for the request. Example:
> if baseUrl is "http://solr-host.com:8983/solr", this call will change it to 
> "http://api-host.com:8983/solr"
>  
> We should use a regex pattern to ensure that we're replacing the correct 
> portion of the URL.
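The failure mode described above can be illustrated outside of Java (HttpSolrClient itself is Java; this is a Python sketch of the same first-match substitution and of an anchored-regex alternative, not the client's actual code):

```python
import re

base_url = "http://solr-host.com:8983/solr"

# Buggy approach: replacing the first occurrence of "/solr" also matches
# inside the host name "solr-host.com", so the host gets rewritten.
buggy = base_url.replace("/solr", "/api", 1)

# Anchored alternative: only rewrite "/solr" when it is a whole path
# segment, i.e. followed by "/" or the end of the string.
fixed = re.sub(r"/solr(?=/|$)", "/api", base_url, count=1)

print(buggy)  # host name mangled
print(fixed)  # only the context path rewritten
```

The lookahead keeps `/solr-host` intact because there `/solr` is followed by `-`, not by a path separator or the end of the URL.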



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13727) V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" instead of using regex pattern

2019-09-03 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13727:

Attachment: SOLR-13727.patch
Status: Open  (was: Open)

> V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" 
> instead of using regex pattern
> 
>
> Key: SOLR-13727
> URL: https://issues.apache.org/jira/browse/SOLR-13727
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java, v2 API
>Affects Versions: 8.2
>Reporter: Megan Carey
>Priority: Major
>  Labels: easyfix, patch
> Attachments: SOLR-13727.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the HttpSolrClient is formatting a V2Request, it needs to change the 
> endpoint from the default "/solr/..." to "/api/...". It does so by simply 
> calling String.replace, which replaces the first instance of "/solr" in the 
> URL with "/api".
>  
> In the case where the host's address starts with "solr" and the HTTP protocol 
> is appended, this call changes the address for the request. Example:
> if baseUrl is "http://solr-host.com:8983/solr", this call will change it to 
> "http://api-host.com:8983/solr"
>  
> We should use a regex pattern to ensure that we're replacing the correct 
> portion of the URL.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9505) Extra tests to confirm Atomic Update remove behaviour

2019-09-03 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-9505:
---
Status: Patch Available  (was: Open)

> Extra tests to confirm Atomic Update remove behaviour
> -
>
> Key: SOLR-9505
> URL: https://issues.apache.org/jira/browse/SOLR-9505
> Project: Solr
>  Issue Type: Test
>Affects Versions: 7.0
>Reporter: Tim Owen
>Priority: Minor
> Attachments: SOLR-9505.patch
>
>
> The behaviour of the Atomic Update {{remove}} operation in the code doesn't 
> match the description in the Confluence documentation, which has been 
> questioned already. From looking at the source code, and using curl to 
> confirm, the {{remove}} operation only removes the first occurrence of a 
> value from a multi-valued field, it does not remove all occurrences. The 
> {{removeregex}} operation does remove all, however.
> There are unit tests for Atomic Updates, but they didn't assert this 
> behaviour, so I've added some extra assertions to confirm that, and a couple 
> of extra tests including one that checks that {{removeregex}} does a Regex 
> match of the whole value, not just a find-anywhere operation.
> I think it's the documentation that needs clarifying - the code behaves as 
> expected (assuming {{remove}} was intended to work that way?)






[jira] [Commented] (SOLR-13735) DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout expired: 300000/300000 ms

2019-09-03 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921416#comment-16921416
 ] 

Mikhail Khludnev commented on SOLR-13735:
-

{{2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_member_lots_a s:shard1 r:core_node3 x:c_collection_shard1_replica_n1] o.a.s.h.RequestHandlerBase}}
{{java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms}}
{{        at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1080)}}
{{        at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)}}
{{        at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)}}
{{        at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:100)}}
{{        at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)}}
{{        at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)}}
{{        at org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:60)}}
{{        at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)}}
{{        at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)}}
{{        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)}}

> DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout 
> expired: 300000/300000 ms
> -
>
> Key: SOLR-13735
> URL: https://issues.apache.org/jira/browse/SOLR-13735
> Project: Solr
>  Issue Type: Sub-task
>  Components: contrib - DataImportHandler
>Reporter: Mikhail Khludnev
>Priority: Minor
>
> see mail thread linked.






[jira] [Commented] (SOLR-13735) DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout expired: 300000/300000 ms

2019-09-03 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921417#comment-16921417
 ] 

Mikhail Khludnev commented on SOLR-13735:
-

SOLR-9908 has a test stub to start with. 

> DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout 
> expired: 300000/300000 ms
> -
>
> Key: SOLR-13735
> URL: https://issues.apache.org/jira/browse/SOLR-13735
> Project: Solr
>  Issue Type: Sub-task
>  Components: contrib - DataImportHandler
>Reporter: Mikhail Khludnev
>Priority: Minor
>
> see mail thread linked.






[jira] [Updated] (SOLR-13735) DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout expired: 300000/300000 ms

2019-09-03 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13735:

Description: see mail thread linked.

> DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout 
> expired: 300000/300000 ms
> -
>
> Key: SOLR-13735
> URL: https://issues.apache.org/jira/browse/SOLR-13735
> Project: Solr
>  Issue Type: Sub-task
>  Components: contrib - DataImportHandler
>Reporter: Mikhail Khludnev
>Priority: Minor
>
> see mail thread linked.






[jira] [Created] (SOLR-13735) DIH on SolrCloud more than 5 mins causes TimeoutException: Idle timeout expired: 300000/300000 ms

2019-09-03 Thread Mikhail Khludnev (Jira)
Mikhail Khludnev created SOLR-13735:
---

 Summary: DIH on SolrCloud more than 5 mins causes 
TimeoutException: Idle timeout expired: 300000/300000 ms
 Key: SOLR-13735
 URL: https://issues.apache.org/jira/browse/SOLR-13735
 Project: Solr
  Issue Type: Sub-task
  Components: contrib - DataImportHandler
Reporter: Mikhail Khludnev









[jira] [Commented] (SOLR-5498) Allow DIH to report its state to ZooKeeper

2019-09-03 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921280#comment-16921280
 ] 

Mikhail Khludnev commented on SOLR-5498:


Isn't it covered by ZkPropertiesWriter? 

> Allow DIH to report its state to ZooKeeper
> --
>
> Key: SOLR-5498
> URL: https://issues.apache.org/jira/browse/SOLR-5498
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.5
>Reporter: Rafał Kuć
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.9, 6.0
>
> Attachments: SOLR-5498.patch, SOLR-5498_version.patch
>
>
> I thought it may be good to be able for DIH to be fully controllable by Solr 
> in SolrCloud. So when once instance fails another could be automatically 
> started and so on. This issue is the first small step there - it makes 
> SolrCloud report DIH state to ZooKeeper once it is started and remove its 
> state once it is stopped or indexing job failed. In non-cloud mode that 
> functionality is not used. 






[jira] [Assigned] (SOLR-13720) Impossible to create effective ToParenBlockJoinQuery in custom QParser

2019-08-29 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev reassigned SOLR-13720:
---

Assignee: Mikhail Khludnev

> Impossible to create effective ToParenBlockJoinQuery in custom QParser
> --
>
> Key: SOLR-13720
> URL: https://issues.apache.org/jira/browse/SOLR-13720
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Stanislav Livotov
>    Assignee: Mikhail Khludnev
>Priority: Minor
>  Labels: noob
> Fix For: 8.3
>
> Attachments: SOLR-13720.patch
>
>
> According to Solr [documentation|#SolrPlugins-QParserPlugin] QParser is 
> treated as a legal plugin.
>  
> However, it is impossible to create an effective ToParentBlockJoin query 
> without copy-pasting(BitDocIdSetFilterWrapper class and getCachedFilter 
> method from BlockJoinParentQParser) or dirty hacks(like creating 
> org.apache.solr.search.join package with some accessor method to 
> package-private methods in plugin code and adding it in WEB-INF/lib directory 
> in order to be loaded by the same ClassLoader).
> I don't see a truly clean way how to fix it, but at least we can help custom 
> plugin developers to create it a little bit easier by making 
> BlockJoinParentQParser#getCachedFilter public and 
> BlockJoinParentQParser#BitDocIdSetFilterWrapper and providing getter for 
> BitDocIdSetFilterWrapper#filter. 
>  
>  
> In order to create 






[jira] [Updated] (SOLR-13720) Impossible to create effective ToParenBlockJoinQuery in custom QParser

2019-08-29 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13720:

Description: 
According to Solr [documentation|#SolrPlugins-QParserPlugin]  QParser is 
treated as a legal plugin.

 

However, it is impossible to create an effective ToParentBlockJoin query 
without copy-pasting(BitDocIdSetFilterWrapper class and getCachedFilter method 
from BlockJoinParentQParser) or dirty hacks(like creating 
org.apache.solr.search.join package with some accessor method to 
package-private methods in plugin code and adding it in WEB-INF/lib directory 
in order to be loaded by the same ClassLoader).

I don't see a truly clean way how to fix it, but at least we can help custom 
plugin developers to create it a little bit easier by making 
BlockJoinParentQParser#getCachedFilter public and 
BlockJoinParentQParser#BitDocIdSetFilterWrapper and providing getter for 
BitDocIdSetFilterWrapper#filter. 

 

 

In order to create 

  was:
According to Solr [ducumentation|#SolrPlugins-QParserPlugin]  QParser is 
treated as a legal plugin.

 

However, it is impossible to create an effective ToParentBlockJoin query 
without copy-pasting(BitDocIdSetFilterWrapper class and getCachedFilter method 
from BlockJoinParentQParser) or dirty hacks(like creating 
org.apache.solr.search.join package with some accessor method to 
package-private methods in plugin code and adding it in WEB-INF/lib directory 
in order to be loaded by the same ClassLoader).

I don't see a truly clean way how to fix it, but at least we can help custom 
plugin developers to create it a little bit easier by making 
BlockJoinParentQParser#getCachedFilter public and 
BlockJoinParentQParser#BitDocIdSetFilterWrapper and providing getter for 
BitDocIdSetFilterWrapper#filter. 

 

 

In order to create 


> Impossible to create effective ToParenBlockJoinQuery in custom QParser
> --
>
> Key: SOLR-13720
> URL: https://issues.apache.org/jira/browse/SOLR-13720
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Stanislav Livotov
>    Assignee: Mikhail Khludnev
>Priority: Minor
>  Labels: noob
> Fix For: 8.3
>
> Attachments: SOLR-13720.patch
>
>
> According to Solr [documentation|#SolrPlugins-QParserPlugin]  QParser is 
> treated as a legal plugin.
>  
> However, it is impossible to create an effective ToParentBlockJoin query 
> without copy-pasting(BitDocIdSetFilterWrapper class and getCachedFilter 
> method from BlockJoinParentQParser) or dirty hacks(like creating 
> org.apache.solr.search.join package with some accessor method to 
> package-private methods in plugin code and adding it in WEB-INF/lib directory 
> in order to be loaded by the same ClassLoader).
> I don't see a truly clean way how to fix it, but at least we can help custom 
> plugin developers to create it a little bit easier by making 
> BlockJoinParentQParser#getCachedFilter public and 
> BlockJoinParentQParser#BitDocIdSetFilterWrapper and providing getter for 
> BitDocIdSetFilterWrapper#filter. 
>  
>  
> In order to create 






[jira] [Updated] (SOLR-13720) Impossible to create effective ToParenBlockJoinQuery in custom QParser

2019-08-29 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13720:

Component/s: query parsers

> Impossible to create effective ToParenBlockJoinQuery in custom QParser
> --
>
> Key: SOLR-13720
> URL: https://issues.apache.org/jira/browse/SOLR-13720
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Stanislav Livotov
>Priority: Minor
> Fix For: 8.3
>
> Attachments: SOLR-13720.patch
>
>
> According to Solr [documentation|#SolrPlugins-QParserPlugin] QParser is 
> treated as a legal plugin.
>  
> However, it is impossible to create an effective ToParentBlockJoin query 
> without copy-pasting(BitDocIdSetFilterWrapper class and getCachedFilter 
> method from BlockJoinParentQParser) or dirty hacks(like creating 
> org.apache.solr.search.join package with some accessor method to 
> package-private methods in plugin code and adding it in WEB-INF/lib directory 
> in order to be loaded by the same ClassLoader).
> I don't see a truly clean way how to fix it, but at least we can help custom 
> plugin developers to create it a little bit easier by making 
> BlockJoinParentQParser#getCachedFilter public and 
> BlockJoinParentQParser#BitDocIdSetFilterWrapper and providing getter for 
> BitDocIdSetFilterWrapper#filter. 
>  
>  
> In order to create 






[jira] [Updated] (SOLR-13720) Impossible to create effective ToParenBlockJoinQuery in custom QParser

2019-08-29 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13720:

Labels: noob  (was: )

> Impossible to create effective ToParenBlockJoinQuery in custom QParser
> --
>
> Key: SOLR-13720
> URL: https://issues.apache.org/jira/browse/SOLR-13720
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.2
>Reporter: Stanislav Livotov
>Priority: Minor
>  Labels: noob
> Fix For: 8.3
>
> Attachments: SOLR-13720.patch
>
>
> According to Solr [documentation|#SolrPlugins-QParserPlugin] QParser is 
> treated as a legal plugin.
>  
> However, it is impossible to create an effective ToParentBlockJoin query 
> without copy-pasting(BitDocIdSetFilterWrapper class and getCachedFilter 
> method from BlockJoinParentQParser) or dirty hacks(like creating 
> org.apache.solr.search.join package with some accessor method to 
> package-private methods in plugin code and adding it in WEB-INF/lib directory 
> in order to be loaded by the same ClassLoader).
> I don't see a truly clean way how to fix it, but at least we can help custom 
> plugin developers to create it a little bit easier by making 
> BlockJoinParentQParser#getCachedFilter public and 
> BlockJoinParentQParser#BitDocIdSetFilterWrapper and providing getter for 
> BitDocIdSetFilterWrapper#filter. 
>  
>  
> In order to create 





