Re: 8.7 Release Blockers

2020-10-26 Thread Adrien Grand
Are there any blockers left?

On Thu, Oct 22, 2020 at 11:14 PM Eric Pugh wrote:

> If we get down to just SOLR-14067 holding us up, I think it could be moved
> to the next release.  I’ve made a lot of progress, but it may still take a
> few more days to get to done done.
>
> On Oct 22, 2020, at 4:35 PM, Adrien Grand  wrote:
>
> Can someone help review this PR to get the above blocker resolved?
> https://github.com/apache/lucene-solr/pull/2019
>
> On Thu, Oct 22, 2020 at 4:11 PM Atri Sharma  wrote:
>
>> Reminder: This is still a blocker for 8.7:
>>
>> https://issues.apache.org/jira/browse/SOLR-14354
>>
>> On Tue, Oct 20, 2020 at 1:02 PM Atri Sharma  wrote:
>> >
>> > Hi All,
>> >
>> > Below are the issues marked as release blockers for 8.7. Can the
>> > owners please resolve or move at the earliest?
>> >
>> > https://issues.apache.org/jira/browse/SOLR-14354
>> > https://issues.apache.org/jira/browse/SOLR-13973
>> > https://issues.apache.org/jira/browse/SOLR-14067
>> >
>> > Atri
>> >
>> > --
>> > Regards,
>> >
>> > Atri
>> > Apache Concerted
>>
>>
>>
>> --
>> Regards,
>>
>> Atri
>> Apache Concerted
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> --
> Adrien
>
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com
> 
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> 
>
>

-- 
Adrien


[Learning To Rank] Interleaving Support

2020-10-26 Thread Alessandro Benedetti
Hi all,
I implemented support for Interleaving in Learning To Rank a few
months ago, and the Pull Request has been open for review for a while now:

https://issues.apache.org/jira/browse/SOLR-14560

https://github.com/apache/lucene-solr/pull/1571

Christine has been very kind with her input and I really appreciate that.

I am going to commit the changes soon unless anyone objects or provides
additional review.
I'll wait one more week and then proceed.

Cheers
--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io


BadApple report

2020-10-26 Thread Erick Erickson
Still working through the failures on the reference impl, so AFAIK, the tests 
failing large percentages of the time are on that branch.

Processing file (History bit 3): HOSS-2020-10-26.csv
Processing file (History bit 2): HOSS-2020-10-19.csv
Processing file (History bit 1): HOSS-2020-10-12.csv
Processing file (History bit 0): HOSS-2020-10-05.csv


Number of AwaitsFix: 31 Number of BadApples: 3


**Annotated tests that didn't fail in the last 4 weeks.

  **Tests removed from the next two lists because they were specified in 
'doNotEnable' in the properties file
 MoveReplicaHDFSTest.testNormalFailedMove

  **Annotations can be removed from the following tests because they haven't 
failed in the last 4 rollups.

  **Methods: 0


Raw fail count by week totals, most recent week first (corresponds to bits):
Week: 0  had  150 failures
Week: 1  had  174 failures
Week: 2  had  142 failures
Week: 3  had  153 failures


Failures in Hoss' reports in every one of the last 4 rollups.

There were 397 unannotated tests that failed in Hoss' rollups. Ordered by the
date I downloaded the rollup file, newest->oldest. See above for the dates the
files were collected.
These tests were NOT BadApple'd or AwaitsFix'd.

Failures in the last 4 reports..
Report    Pct   runs  fails  test
  0123  100.0     35     35  AssignTest.classMethod
  0123  100.0    255    255  AsyncCallRequestStatusResponseTest.classMethod
  0123    0.8   1916     13  CachingDirectoryFactoryTest.stressTest
  0123  100.0    155    155  CollectionsAPIDistributedZkTest.classMethod
  0123    3.8   1960     51  CollectionsAPIDistributedZkTest.testBadActionNames
  0123    3.8   1960     51  CollectionsAPIDistributedZkTest.testMissingNumShards
  0123    3.8   1960     51  CollectionsAPIDistributedZkTest.testMissingRequiredParameters
  0123    3.8   1960     51  CollectionsAPIDistributedZkTest.testNoConfigSetExist
  0123    3.8   1960     51  CollectionsAPIDistributedZkTest.testZeroNumShards
  0123  100.0    205    205  CollectionsAPISolrJTest.classMethod
  0123  100.0    250    250  ConcurrentUpdateSolrClientMultiCollectionTest.classMethod
  0123  100.0    205    205  DeleteNodeTest.classMethod
  0123    1.8   1565     56  HttpPartitionOnCommitTest.test
  0123    1.1   1546     33  HttpPartitionTest.test
  0123  100.0    250    250  JsonRequestApiHeatmapFacetingTest.classMethod
  0123  100.0    250    250  JsonRequestApiTest.classMethod
  0123    2.2   1576     53  MultiThreadedOCPTest.test
  0123  100.0    205    205  OverseerModifyCollectionTest.classMethod
  0123    1.7    988     11  TestCircuitBreaker.testResponseWithCBTiming
  0123    0.7   1521      6  TestCustomStream.testDynamicLoadingCustomStream
  0123    1.3   1256     25  TestHdfsCloudBackupRestore.test
  0123    1.1   1509     26  TestLocalFSCloudBackupRestore.test
  0123    1.4   1513     25  TestPackages.testPluginLoading
  0123   13.7   1982    233  TestSTUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes
  0123   26.0   1981    451  TestSynonymFilterFactory.testFormat
  0123   26.0   1981    451  TestSynonymFilterFactory.testSynonyms
  0123   26.1   1983    452  TestSysoutsLimits.OverHardLimit
  0123   26.1   1983    452  TestSysoutsLimits.testOverSoftLimit
  0123    0.4   1519      6  TestSystemCollAutoCreate.testAutoCreate
  0123   13.7   1982    233  TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes
  0123  100.0    250    250  UsingSolrJRefGuideExamplesTest.classMethod
  0123  100.0    250    250  ZkConfigFilesTest.classMethod


DO NOT ENABLE LIST:
MoveReplicaHDFSTest.testFailedMove
MoveReplicaHDFSTest.testNormalFailedMove
TestControlledRealTimeReopenThread.testCRTReopen
TestICUNormalizer2CharFilter.testRandomStrings
TestICUTokenizerCJK
TestImpersonationWithHadoopAuth.testForwarding
TestLTRReRankingPipeline.testDifferentTopN
TestRandomChains


DO NOT ANNOTATE LIST
CdcrBidirectionalTest.testBiDir
IndexSizeTriggerTest.testMergeIntegration
IndexSizeTriggerTest.testMixedBounds
IndexSizeTriggerTest.testSplitIntegration
IndexSizeTriggerTest.testTrigger
InfixSuggestersTest.testShutdownDuringBuild
ShardSplitTest.test
ShardSplitTest.testSplitMixedReplicaTypes
ShardSplitTest.testSplitWithChaosMonkey
Test2BPostings.test
TestLatLonShapeQueries.testRandomBig
TestPackedInts.testPackedLongValues
TestRandomChains.testRandomChainsWithLargeStrings
TestTriggerIntegration.testSearchRate

SuppressWarnings count: last week: 4,484, this week: 4,484, delta 0


*** Files with increased @SuppressWarnings annotations:

Suppress count increase in: 

Re: Payloads for each term

2020-10-26 Thread Bruno Roustant
Hi Ankur,
Indeed, payloads are the standard way to solve this problem. For light
queries with a few top-N results, that should be efficient. For multi-term
queries, it could become costly if you need to access the payloads of
too many terms.
Also, there is an experimental PostingsFormat called
SharedTermsUniformSplit (class named STUniformSplitPostingsFormat) that
would allow you to effectively share the overlapping terms in the index
while having 50 fields. This would solve the index bloat issue, but would
not fully solve the seeks issue. You might want to benchmark this approach
too.

Bruno

Le ven. 23 oct. 2020 à 02:48, Ankur Goel  a écrit :

> Hi Lucene Devs,
> I need to store a sparse feature vector on a per-term basis. The total
> number of possible dimensions is small (~50) and known at indexing time.
> The feature values will be used in scoring along with corpus statistics.
> It looks like payloads were created for this exact purpose, but some
> workaround is needed to minimize the performance penalty, as mentioned
> on the wiki.
>
> An alternative is to override *term frequency* to be a *pointer* in a
> *Map Feature_Vector>* serialized and stored in *BinaryDocValues*. At query
> time, the matching *docId* will be used to advance the pointer to the
> starting offset of this map. The term frequency will be used to perform
> a lookup into the serialized map to retrieve the *Feature_Vector*. That's
> my current plan but I haven't benchmarked it.
>
> The problem that I am trying to solve is to *reduce the index bloat* and
> *eliminate* *unnecessary seeks* as currently these ~50 dimensions are
> stored as separate fields in the index with very high term overlap and
> Lucene does not share Terms dictionary across different fields. This itself
> can be a new feature for Lucene but will require lots of work I imagine.
>
> Any ideas are welcome :-)
>
> Thanks
> -Ankur
>
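
As a rough illustration of the payload route discussed in this thread, the
sketch below packs a sparse feature vector (a few of the ~50 possible
dimensions) into a small byte array of the kind that could be stored as a
per-term payload. The class name SparseFeatureCodec, the one-byte dimension
id, and the 4-byte float layout are all assumptions made for illustration;
none of this comes from Lucene or from the code under discussion, and
attaching the bytes to a term at index time is left out.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: packs a sparse feature vector into payload-sized bytes. */
final class SparseFeatureCodec {

    private SparseFeatureCodec() {}

    /**
     * Layout (an assumption, not a Lucene format):
     * [count: 1 byte] then, per feature, [dimension: 1 byte][value: 4-byte float].
     */
    static byte[] encode(Map<Integer, Float> features) {
        if (features.size() > 255) {
            throw new IllegalArgumentException("too many features: " + features.size());
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + features.size() * 5);
        buf.put((byte) features.size());
        // Copy into a TreeMap so dimensions are written in ascending order.
        for (Map.Entry<Integer, Float> e : new TreeMap<>(features).entrySet()) {
            int dim = e.getKey();
            if (dim < 0 || dim > 255) {
                throw new IllegalArgumentException("dimension out of range: " + dim);
            }
            buf.put((byte) dim);
            buf.putFloat(e.getValue());
        }
        return buf.array();
    }

    /** Reverses encode(): reads the count, then each (dimension, value) pair. */
    static Map<Integer, Float> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.get() & 0xFF;
        Map<Integer, Float> features = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            int dim = buf.get() & 0xFF;
            features.put(dim, buf.getFloat());
        }
        return features;
    }
}
```

With this layout a term carrying k non-zero features costs 1 + 5k bytes, so
only the dimensions actually present are paid for. Whether this beats the
BinaryDocValues plan above would still need benchmarking.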


Re: two 8x branches?

2020-10-26 Thread Atri Sharma
And, done.

On Sun, Oct 25, 2020 at 10:30 PM Atri Sharma  wrote:
>
> Yes, I will remove it tomorrow morning my time.
>
> On Sun, 25 Oct 2020 at 10:16 PM, Erick Erickson wrote:
>>
>> We have two 8x branches,
>>
>>
>> origin/branch_8_x
>> origin/branch_8x
>>
>> Can someone remove/rename branch_8_x? I’ve managed to completely screw up my 
>> fork when trying to delete branches, so I’m not feeling confident when it 
>> comes to deleting branches on master...
>>
> --
> Regards,
>
> Atri
> Apache Concerted



-- 
Regards,

Atri
Apache Concerted




Fixing 8.7 Section In solr/CHANGES.txt

2020-10-26 Thread Atri Sharma
Hi All,

I have raised a pull request to fix the merge issues in the 8.7
section of branch_8_7.

Please review asap and edit in the PR if any issues are seen (or let
me know and I will fix them).

https://github.com/apache/lucene-solr/pull/2030

-- 
Regards,

Atri
Apache Concerted
