Re: [JENKINS] Lucene » Lucene-Check-9.2 - Build # 65 - Unstable!

2022-05-26 Thread Greg Miller
I was able to repro this locally and will try to see if I can put
together a fix in the next couple days (or will at least create a Jira
to track once I figure out what's going on).

Cheers,
-g

On Mon, May 23, 2022 at 4:02 AM Apache Jenkins Server wrote:
>
> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.2/65/
>
> 1 tests failed.
> FAILED:  org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom
>
> Error Message:
> java.lang.IndexOutOfBoundsException
>
> Stack Trace:
> java.lang.IndexOutOfBoundsException
> at __randomizedtesting.SeedInfo.seed([91EC8BE9DE2A5BAB:E3A0AEE66F4AEDD8]:0)
> at java.base/java.nio.Buffer.checkBounds(Buffer.java:714)
> at java.base/java.nio.HeapByteBuffer.get(HeapByteBuffer.java:179)
> at org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.store.ByteBuffersDataInput.readBytes(ByteBuffersDataInput.java:155)
> at org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.store.ByteBuffersIndexInput.readBytes(ByteBuffersIndexInput.java:85)
> at org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.store.MockIndexInputWrapper.readBytes(MockIndexInputWrapper.java:149)
> at org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.decompressBlock(Lucene90DocValuesProducer.java:1234)
> at org.apache.lucene.core@9.2.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.next(Lucene90DocValuesProducer.java:1092)
> at org.apache.lucene.search.grouping.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:438)
> at org.apache.lucene.search.grouping.GroupFacetCollector.mergeSegmentResults(GroupFacetCollector.java:97)
> at org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom(TestGroupFacetCollector.java:429)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.test_framework@9.2.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at randomizedtesting.runner@2.7.6/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at 
> 
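For readers following the trace: the two innermost JDK frames (Buffer.checkBounds called from HeapByteBuffer.get) appear to be the bounds check on the destination array of a bulk read, which is a different failure from running out of source bytes. The following is a JDK-only sketch of that distinction, not a diagnosis of the Lucene bug; the class and method names (HeapBufferBoundsDemo, bulkGet) are made up for illustration.

```java
import java.nio.ByteBuffer;

final class HeapBufferBoundsDemo {

    /**
     * Performs a bulk get from a heap buffer of sourceSize bytes into a
     * destination array of destSize bytes and reports which exception
     * (if any) the JDK throws.
     */
    static String bulkGet(int sourceSize, int destSize, int offset, int length) {
        ByteBuffer source = ByteBuffer.wrap(new byte[sourceSize]);
        byte[] dest = new byte[destSize];
        try {
            source.get(dest, offset, length);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Destination large enough, source large enough: the read succeeds.
        System.out.println(bulkGet(16, 8, 0, 8));
        // Destination too small for (offset, length): the bounds check fails.
        System.out.println(bulkGet(16, 4, 0, 8));
        // Destination fine, but the source has too few bytes remaining.
        System.out.println(bulkGet(4, 8, 0, 8));
    }
}
```

If that reading is right, the requested offset and length did not fit the array being filled, so the length value handed to readBytes in decompressBlock would be a reasonable first thing to inspect when reproducing with the seed above.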

Re: Adding a new PointDocValuesField

2022-05-26 Thread Greg Miller
> Users don't deal with low level docvalues codec APIs, so I see this
> "as a user" as irrelevant, sorry. Higher-level classes (e.g. Field
> class) could impl it this way as implementation detail.

Hmm, that's a different perspective than I had, but I understand where
you're coming from and I think I agree. I'm so used to interacting
directly with doc values that I hadn't considered this point of view
(that users shouldn't commonly be interacting with DVs directly). As
long as we provide a higher-level Field class that abstracts the
implementation details, I think I'm on the same page with you here.

> +1 to build a field class in sandbox, using BDV behind the scenes. I
> don't want to add any new DV types, trust me. I am just especially
> opinionated against multidimensional stuff pushed down to docvalues
> level, when it makes no sense from a DV perspective (column stride
> fields). If you have 3 dimensions of numbers, at a low level it would
> just make 3 columns at the end of the day anyway: IMO it would only
> make codec code more complicated with no benefit. So that's why I was
> listing out other alternatives.

Got it. +1 from me as well. I think we're in agreement. Thanks for the
discussion!

Cheers,
-g

On Thu, May 26, 2022 at 9:04 AM Robert Muir wrote:
>
> On Thu, May 26, 2022 at 11:49 AM Greg Miller wrote:
> >
> > I agree that technically it's just as good. I also think it's less
> > clear for a user. The concept of "points" is something we've
> > established in Lucene, so I think it makes sense for users to think
> > about indexing points as a doc value as opposed to having to manage
> > multiple fields for all their dimensions in this sort of unsorted
> > field. But that's just my opinion as a user. But that's maybe a bit
> > philosophical at this point and I think we can "agree to disagree" for
> > now because...
>
> Users don't deal with low level docvalues codec APIs, so I see this
> "as a user" as irrelevant, sorry. Higher-level classes (e.g. Field
> class) could impl it this way as implementation detail.
>
> >
> > ... just to be clear, I'm _not_ suggesting we add a new doc value type
> > at this time. I'm not even necessarily advocating that we ever add it.
> > I think it's perfectly reasonable to define a new Field class that
> > builds on top of BDV (as Marc has done in his PR) that allows users to
> > add "point" fields to their documents that get indexed as doc values
> > (using BDV). This is very similar to LatLonDocValuesField,
> > LongRangeDocValuesField, etc. Is that an acceptable approach to you,
> > or are you advocating that we shouldn't do that and should instead
> > create these new "unsorted" numeric fields now? I'm even fine if we
> > put this in the sandbox module for now while we "kick the tires." In
> > fact, I think I'd advocate for that.
>
> +1 to build a field class in sandbox, using BDV behind the scenes. I
> don't want to add any new DV types, trust me. I am just especially
> opinionated against multidimensional stuff pushed down to docvalues
> level, when it makes no sense from a DV perspective (column stride
> fields). If you have 3 dimensions of numbers, at a low level it would
> just make 3 columns at the end of the day anyway: IMO it would only
> make codec code more complicated with no benefit. So that's why I was
> listing out other alternatives.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>




Re: Adding a new PointDocValuesField

2022-05-26 Thread Robert Muir
On Thu, May 26, 2022 at 11:49 AM Greg Miller wrote:
>
> I agree that technically it's just as good. I also think it's less
> clear for a user. The concept of "points" is something we've
> established in Lucene, so I think it makes sense for users to think
> about indexing points as a doc value as opposed to having to manage
> multiple fields for all their dimensions in this sort of unsorted
> field. But that's just my opinion as a user. But that's maybe a bit
> philosophical at this point and I think we can "agree to disagree" for
> now because...

Users don't deal with low level docvalues codec APIs, so I see this
"as a user" as irrelevant, sorry. Higher-level classes (e.g. Field
class) could impl it this way as implementation detail.

>
> ... just to be clear, I'm _not_ suggesting we add a new doc value type
> at this time. I'm not even necessarily advocating that we ever add it.
> I think it's perfectly reasonable to define a new Field class that
> builds on top of BDV (as Marc has done in his PR) that allows users to
> add "point" fields to their documents that get indexed as doc values
> (using BDV). This is very similar to LatLonDocValuesField,
> LongRangeDocValuesField, etc. Is that an acceptable approach to you,
> or are you advocating that we shouldn't do that and should instead
> create these new "unsorted" numeric fields now? I'm even fine if we
> put this in the sandbox module for now while we "kick the tires." In
> fact, I think I'd advocate for that.

+1 to build a field class in sandbox, using BDV behind the scenes. I
don't want to add any new DV types, trust me. I am just especially
opinionated against multidimensional stuff pushed down to docvalues
level, when it makes no sense from a DV perspective (column stride
fields). If you have 3 dimensions of numbers, at a low level it would
just make 3 columns at the end of the day anyway: IMO it would only
make codec code more complicated with no benefit. So that's why I was
listing out other alternatives.
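To make the "3 columns" and "aligned indices" idea concrete, here is a rough JDK-only sketch. It assumes a document holding several points, stores one column per dimension in insertion order, and shows why the columns would need to be unsorted: the existing multi-valued numeric doc values type sorts values within each document, and sorting each dimension independently breaks the pairing between dimensions. All names here (AlignedColumnsSketch, toColumns, pointAt, sortEachColumn) are hypothetical and are not Lucene APIs.

```java
import java.util.Arrays;

final class AlignedColumnsSketch {

    /** Splits row-major points into one column per dimension, keeping insertion order. */
    static long[][] toColumns(long[][] points, int numDims) {
        long[][] columns = new long[numDims][points.length];
        for (int i = 0; i < points.length; i++) {
            for (int d = 0; d < numDims; d++) {
                columns[d][i] = points[i][d];
            }
        }
        return columns;
    }

    /** Rebuilds the i-th point by reading index i from every column. */
    static long[] pointAt(long[][] columns, int i) {
        long[] point = new long[columns.length];
        for (int d = 0; d < columns.length; d++) {
            point[d] = columns[d][i];
        }
        return point;
    }

    /** Returns a copy in which every column is sorted on its own. */
    static long[][] sortEachColumn(long[][] columns) {
        long[][] sorted = new long[columns.length][];
        for (int d = 0; d < columns.length; d++) {
            sorted[d] = columns[d].clone();
            Arrays.sort(sorted[d]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        long[][] points = {{5, 1}, {2, 9}};
        long[][] columns = toColumns(points, 2);
        // Unsorted columns keep index 0 pointing at the original first point.
        System.out.println(Arrays.toString(pointAt(columns, 0)));
        // Per-column sorting mixes coordinates from different points.
        System.out.println(Arrays.toString(pointAt(sortEachColumn(columns), 0)));
    }
}
```

With the two points (5, 1) and (2, 9), the unsorted columns return (5, 1) at index 0, while the independently sorted columns return (2, 1), which was never indexed.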




Re: Adding a new PointDocValuesField

2022-05-26 Thread Greg Miller
I agree that technically it's just as good. I also think it's less
clear for a user. The concept of "points" is something we've
established in Lucene, so I think it makes sense for users to think
about indexing points as a doc value as opposed to having to manage
multiple fields for all their dimensions in this sort of unsorted
field. But that's just my opinion as a user. But that's maybe a bit
philosophical at this point and I think we can "agree to disagree" for
now because...

... just to be clear, I'm _not_ suggesting we add a new doc value type
at this time. I'm not even necessarily advocating that we ever add it.
I think it's perfectly reasonable to define a new Field class that
builds on top of BDV (as Marc has done in his PR) that allows users to
add "point" fields to their documents that get indexed as doc values
(using BDV). This is very similar to LatLonDocValuesField,
LongRangeDocValuesField, etc. Is that an acceptable approach to you,
or are you advocating that we shouldn't do that and should instead
create these new "unsorted" numeric fields now? I'm even fine if we
put this in the sandbox module for now while we "kick the tires." In
fact, I think I'd advocate for that.
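As a rough illustration of what "builds on top of BDV" could mean, the sketch below packs the dimensions of one point into a single byte array and unpacks them again. It is only an assumption about the general shape, not the encoding used in Marc's PR; the class and method names (PackedPointSketch, pack, unpack) are hypothetical. In Lucene, the resulting bytes would presumably be wrapped in a BytesRef and handed to a BinaryDocValuesField.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class PackedPointSketch {

    /** Packs each dimension as 8 big-endian bytes, in dimension order. */
    static byte[] pack(long... dims) {
        ByteBuffer buf = ByteBuffer.allocate(dims.length * Long.BYTES);
        for (long dim : dims) {
            buf.putLong(dim);
        }
        return buf.array();
    }

    /** Reverses pack(): reads one long per 8 bytes. */
    static long[] unpack(byte[] packed) {
        if (packed.length % Long.BYTES != 0) {
            throw new IllegalArgumentException(
                "length must be a multiple of " + Long.BYTES + ", got " + packed.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(packed);
        long[] dims = new long[packed.length / Long.BYTES];
        for (int i = 0; i < dims.length; i++) {
            dims[i] = buf.getLong();
        }
        return dims;
    }

    public static void main(String[] args) {
        byte[] packed = pack(3L, -7L, 42L);
        // A 3-dimensional point occupies 24 bytes in this layout.
        System.out.println(packed.length);
        System.out.println(Arrays.toString(unpack(packed)));
    }
}
```

The user-facing Field class would hide this layout, so callers would only ever pass in and read back the dimension values.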

Thanks again for the feedback. It forced a deep examination of this
idea, which I appreciate.

Cheers,
-g

On Wed, May 25, 2022 at 11:41 AM Robert Muir wrote:
>
> On Wed, May 25, 2022 at 2:08 PM Greg Miller wrote:
> >
> >
> > I guess with an “unsorted” numeric DV type we could get there with aligned 
> > indices, as you describe, but that seems less appealing than supporting 
> > multi-dim points directly.
> >
>
> Name one technical reason why?
> Unsorted would be exactly just as good, except also more general
> purpose. The number of docvalues types should be kept to a strict
> minimum, and should be generally useful to a variety of common
> use-cases. Each type has a huge maintenance cost, and never goes away.
> Every codec must implement every type.
>



Re: [Spam] Please unsubscribe

2022-05-26 Thread Ladislav Macoun
Hi Martine,

I would recommend that you unsubscribe via the correct mailing list.

You can find the correct address in the mail headers, under
List-Unsubscribe.

Best regards,
Ladislav

> On 26. 5. 2022, at 7:09, mwable wrote:
> 
> Hi,
> 
> could someone please unsubscribe me? I tried unsubscribing through 
> java-user-unsubscr...@lucene.apache.org several times.
> 
> Thanks!
> 
> Martine

