Re: Recent and upcoming deprecations

2020-07-17 Thread Gus Heck
Deprecation announces an intention to remove. One of the main reasons given
in the Jira tickets I saw for deprecating sooner rather than later is to
ensure that discussions happen, and that replacements, migrations, and I
assume possibly even decisions not to deprecate after all, can be sorted out
and the community well prepared before action is required. So the goal as I've
understood it (I'm only a committer, not a PMC member, so this is just my
opinion/observation) is precisely to create some "splash", get people's
attention, and make sure this is discussed thoroughly. Definitely feel free
to contribute your thoughts and concerns in a constructive manner. This is
a community and your voice matters, and it's open source, so you also have
the power to contribute.

On Fri, Jul 17, 2020 at 2:56 AM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> At first glance I see many Deprecations but also many TBD at Package
> location. :-(
>
> To understand this right: you kicked out the code and are now waiting
> for the community to take over and reinvent the wheel?
>
> Or are there any recent plans of the PMC members to at least start
> Package locations on github and then let the community take over?
>
> Or what?
>
> I've been following this list for many years, but where were the
> discussions about these decisions?
>
> Regards
> Bernd
>
>
> On 17.07.20 at 07:47, Ishan Chattopadhyaya wrote:
> > Hi Solr Users,
> > Here is a list of recent and upcoming deprecations in Solr 8.x.
> > https://cwiki.apache.org/confluence/display/SOLR/Deprecations
> >
> > Please feel free to chime in if you have any questions. You can comment
> > here or in the specific JIRA issues.
> >
> > Thanks and regards,
> > Ishan Chattopadhyaya
> >
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread Issei Nishigata
I have the same problem with my Solr 8.
I think it's because, with the first approach,
TrimFieldUpdateProcessorFactory and RemoveBlankFieldUpdateProcessorFactory
are not taking effect.

On SolrCloud, TrimFieldUpdateProcessorFactory,
RemoveBlankFieldUpdateProcessorFactory and other processors
run only on the first node that receives an update request.
Consequently, TrimFieldUpdateProcessorFactory and
RemoveBlankFieldUpdateProcessorFactory need to be executed
after the DistributedUpdateProcessor has passed the document to the
replica node,
so we need to use the second approach he described; otherwise it won't
work properly.
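
To make the ordering argument concrete, here is a toy Python model (not Solr code). It assumes that processors placed before the distributed step see the atomic-update operation as a map such as {"set": ...}, while processors placed after it see the merged document. All function names are hypothetical stand-ins for the processors discussed above.

```python
from functools import partial

# Toy model only: these functions imitate the ordering effect discussed
# above; they are not Solr classes or Solr API.

def trim_fields(doc):
    """Trim string values; non-string values (e.g. maps) pass through."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in doc.items()}

def remove_blank_fields(doc):
    """Drop fields whose value is an empty string."""
    return {k: v for k, v in doc.items() if v != ""}

def merge_atomic_update(stored, update):
    """Stand-in for the distributed step: resolve {"set": value} operations
    against the stored document to produce a full document."""
    merged = dict(stored)
    for field, value in update.items():
        if isinstance(value, dict) and "set" in value:
            merged[field] = value["set"]
        else:
            merged[field] = value
    return merged

def run_chain(chain, doc):
    """Apply each step of the chain in order."""
    for step in chain:
        doc = step(doc)
    return doc

stored = {"id": "1", "title": "old", "note": "keep me"}
update = {"id": "1", "title": {"set": "  new title  "}, "note": {"set": ""}}
distrib = partial(merge_atomic_update, stored)

# Mutating processors before the distributed step: they only see maps.
before = run_chain([trim_fields, remove_blank_fields, distrib], update)
# Mutating processors after the distributed step: they see real values.
after = run_chain([distrib, trim_fields, remove_blank_fields], update)
print(before)
print(after)
```

In this model the first ordering leaves the untrimmed title and the blank field in place, while the second ordering cleans both. Whether real Solr behaves the same way depends on the actual chain, which is not visible in this thread.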

But even with this approach, both he and I are worried about whether it
could trigger SOLR-8030.
I would also like to know about this; does anyone have any comments?


Best,
Issei

On Fri, Jul 17, 2020 at 18:34, Jörn Franke wrote:

> What does "not work correctly" mean?
>
> Have you checked that all fields are stored or have docValues?
>
> > On 17.07.2020 at 11:26, yo tomi wrote:
> >
> > Hi All
> >
> > Sorry, the settings above were the wrong way round.
> > Actually, the following setting does not work properly.
> > ---
> > 
> > 
> > 
> > 
> > 
> > 
> > ---
> > And the following works as expected.
> > ---
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > ---
> >
> > Thanks,
> > Yoshiaki
> >
> >
> > On Fri, Jul 17, 2020 at 16:32, yo tomi wrote:
> >
> >> Hi all,
> >> When I perform an AtomicUpdate on SolrCloud with the following setting,
> >> it does not work properly.
> >>
> >> ---
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> ---
> >> When I changed it as follows, it worked as expected.
> >> ---
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> ---
> >> I thought the latter setting and the post-processor approach would give
> >> the same result, but the SOLR-8030 bug makes me reluctant to use
> >> post-processors.
> >> Even with the latter setting, is there any possibility of hitting
> >> SOLR-8030? From reading the source code, the tlog that comes from the
> >> leader to the replica seems to be processed correctly by the
> >> UpdateRequestProcessor, so I thought the latter setting would not be
> >> affected by the bug. Does anyone know the most appropriate way to
> >> configure AtomicUpdate on SolrCloud?
> >>
> >> Thanks,
> >> Yoshiaki
> >>
> >>
>


Re: CDCR stress-test issues

2020-07-17 Thread Jörn Franke
Instead of CDCR, you may simply duplicate the pipeline across both data
centers. Then there is no need to replicate at each step of the pipeline
(storage to storage, index to index, etc.).
Instead, both pipelines run in parallel, one in each data center.
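
As a rough Python sketch of that idea: the same batch is handed to one sender per data center, and each result is tracked separately. The function name and the sender callables are hypothetical; a real pipeline would post to each cluster's update endpoint where the stand-ins below merely collect documents.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]
Sender = Callable[[List[Doc]], None]

def index_everywhere(batch: Iterable[Doc], senders: Dict[str, Sender]) -> Dict[str, str]:
    """Send the same batch to every data center independently and return a
    per-data-center status, so a failure in one does not hide the other."""
    docs = list(batch)
    status = {}
    for name, send in senders.items():
        try:
            send(docs)
            status[name] = "ok"
        except Exception as exc:  # keep going; the other data center is independent
            status[name] = f"failed: {exc}"
    return status

# Stand-in senders that just collect documents in memory.
dc_a, dc_b = [], []
result = index_everywhere(
    [{"id": "1"}, {"id": "2"}],
    {"dc_a": dc_a.extend, "dc_b": dc_b.extend},
)
print(result)
```

Failed batches would still need a retry or replay strategy; that part is outside this sketch.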

> On 24.06.2020 at 15:46, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> 
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
> 
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seems to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of 
> thousands of tlog files, and Solr fails to start (complaining that there are 
> too many open files).
> 
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> 5
> 2
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulate tlog files (the Target SolrCloud does seem to have a tendency to
> clean up the tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and if I then start adding more data, the tlogs on the new Leader sometimes 
> will go away, but then the old Leader begins accumulating tlog files. I am 
> dubious whether frequent reassignment of Leadership would be a practical 
> solution.
> 
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
> 
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
> 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
> 
> Are there any suggestions?
> 
> Thanks
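
One way to pin down which records never arrived, sketched in Python under the assumption that the document IDs from both clusters have already been exported into lists. The "rec-<n>" ID scheme is hypothetical and only mimics the sequential numbering mentioned in the quoted message.

```python
def missing_on_target(source_ids, target_ids):
    """Return IDs present on the source but absent from the target, sorted."""
    return sorted(set(source_ids) - set(target_ids))

# Hypothetical ID scheme: "rec-<n>" with sequential n.
source = [f"rec-{n:05d}" for n in range(1, 21)]
target = [doc_id for doc_id in source if doc_id not in {"rec-00007", "rec-00008"}]

gaps = missing_on_target(source, target)
print(gaps)
```

Knowing exactly which IDs are missing, and when they were inserted, may help correlate the gaps with leader changes or tlog rollovers.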


RE: CDCR stress-test issues

2020-07-17 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Yes, I saw that yesterday.

I guess that I was not the only one who noticed the unreliability after all.

-Original Message-
From: Ishan Chattopadhyaya  
Sent: Friday, July 17, 2020 1:17 AM
To: solr-user 
Subject: Re: CDCR stress-test issues

FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws that allow the kind of problems you
observe. While there may be workarounds, it is advisable not to
rely on CDCR in production.

Thanks,
Ishan

On Thu, 2 Jul, 2020, 1:12 am Oakley, Craig (NIH/NLM/NCBI) [C],
 wrote:

> For the record, it is not just Solr7.4 which has the problem. When I start
> afresh with Solr8.5.2, both symptoms persist.
>
> With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the
> Source SolrCloud and are never released, regardless of the
> maxNumLogsToKeep setting.
>
> And with Solr8.5.2, if four scripts run simultaneously for a few minutes,
> each script running a loop each iteration of which adds batches of 6
> records to the Source SolrCloud, a couple dozen records wind up on the
> Source without ever arriving at the Target SolrCloud (although the Target
> does have records which were added after the missing records).
>
> Does anyone yet have any suggestion how to get CDCR to work properly?
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Wednesday, June 24, 2020 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: CDCR stress-test issues
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes in the Source SolrCloud. No
> quantity of hard commits seems to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of
> thousands of tlog files, and Solr fails to start (complaining that there
> are too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of
> SolrClouds until I added these two lines to the solrconfig.xml file (for
> testing purposes, using numbers much lower than in the examples):
> 5
> 2
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud
> which accumulate tlog files (the Target SolrCloud does seem to have a
> tendency to clean up the tlog files, as does the Leader of the Source
> SolrCloud). If I use ADDREPLICAPROP and REBALANCELEADERS to change which
> node is the Leader, and if I then start adding more data, the tlogs on the
> new Leader sometimes will go away, but then the old Leader begins
> accumulating tlog files. I am dubious whether frequent reassignment of
> Leadership would be a practical solution.
>
> I also have several times attempted to simulate a production environment
> by running several loops simultaneously, each of which inserts multiple
> records on each iteration of the loop. Several times, I end up with a dozen
> records on (both replicas of) the Source which never make it to (either
> replica of) the Target. The Target has thousands of records which were
> inserted before the missing records, and thousands of records which were
> inserted after the missing records (and all these records, the replicated
> and the missing, were inserted by curl commands which only differed in
> sequential numbers incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says
> that the fix for Solr 7.3 had a problem; and the header says "Affects
> Version/s: 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are there any suggestions?
>
> Thanks
>
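
To watch the accumulation described above before it reaches the "too many open files" stage, a small Python check along these lines could run per node. It assumes that transaction log files have names starting with "tlog." and that the tlog directories are known; both are assumptions, and the demo uses a temporary directory rather than a real Solr data directory.

```python
import tempfile
from pathlib import Path

def count_tlogs(tlog_dir):
    """Count regular files directly inside tlog_dir whose name starts with
    'tlog.' (assumed naming convention)."""
    return sum(
        1
        for p in Path(tlog_dir).iterdir()
        if p.is_file() and p.name.startswith("tlog.")
    )

def over_limit(tlog_dirs, limit):
    """Return {directory: count} for directories holding more than limit tlogs."""
    counts = {str(d): count_tlogs(d) for d in tlog_dirs}
    return {d: n for d, n in counts.items() if n > limit}

# Demo against a throwaway directory with seven fake tlog files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(7):
        (Path(tmp) / f"tlog.{i:019d}").touch()
    (Path(tmp) / "other.txt").touch()
    demo_count = count_tlogs(tmp)
    demo_flagged = over_limit([tmp], limit=5)
print(demo_count)
```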


Re: Elevation with distributed search causes NPE

2020-07-17 Thread Marc Linden
There it is: https://issues.apache.org/jira/browse/SOLR-14662. I'd love to dig 
deeper into that and provide a patch for it, but I'm fully occupied with our 
own project.
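
For anyone who wants to reproduce the issue from a script, the distributed request quoted further down in this thread can be assembled in Python roughly as follows. The helper name is hypothetical; host, port, core names and parameters are taken from the quoted URLs.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base, core, params):
    """Build a /select URL, leaving [ ] , : / unescaped for readability."""
    return f"{base}/{core}/select?" + urlencode(params, safe="[],:/")

params = {
    "q": "elevatedTerm",
    "lowercaseOperators": "false",
    "df": "text_en",
    "defType": "edismax",
    "fq": "lang:en",
    "shards": "localhost:9983/solr/core1,localhost:9983/solr/core2",
    "fl": "[elevated],[shard],area,id",
    "rows": 10,
    "start": 0,
    "sort": "area asc",
    "forceElevation": "true",
}
url = build_select_url("http://localhost:9983/solr", "core1", params)
print(url)
```

Dropping the sort and forceElevation entries gives the request from the original report.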

Thanks and best regards,
Marc

-Original Message-
From: Erick Erickson
Sent: Thursday, July 16, 2020 15:06
To: solr-user@lucene.apache.org
Subject: Re: Elevation with distributed search causes NPE

Can you raise a JIRA? If you’re ambitious, you can add a patch too ;)

> On Jul 16, 2020, at 2:52 AM, Marc Linden  
> wrote:
>
> Thanks Erick for your fast response.
>
> I tried adding the sort param and yes, that made the problem go away, but
> it also disables elevation if I'm not mistaken. After I then added
> forceElevation=true to the query, a ClassCastException was thrown:
>
> http://localhost:9983/solr/core1/select?q=elevatedTerm&lowercaseOperators=false&df=text_en&defType=edismax&fq=lang:en&shards=localhost:9983/solr/core1,localhost:9983/solr/core2&fl=[elevated],[shard],area,id&rows=10&start=0&sort=area%20asc&forceElevation=true
>
> java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
>  at org.apache.solr.schema.FieldType.unmarshalStringSortValue(FieldType.java:1229)
>  at org.apache.solr.schema.StrField.unmarshalSortValue(StrField.java:122)
>  at org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1092)
>  at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:917)
>  at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
>  at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
>  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
>  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
>  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
>  ...
>
> Best regards,
> Marc
>
> -Original Message-
> From: Erick Erickson
> Sent: Wednesday, July 15, 2020 14:32
> To: solr-user@lucene.apache.org
> Subject: Re: Elevation with distributed search causes NPE
>
> Hmmm, looking at the code that line looks like this:
>
> sortSpec.getSort().getSort();
>
> I’m curious what happens if you specify a sort on the query? If that makes 
> the problem go away, it’s a smoking gun.
>
> Whether or not adding sorting makes the problem go away, this looks like 
> something that’s a legitimate JIRA, please go ahead and raise one.
>
> Best,
> Erick
>
>> On Jul 15, 2020, at 4:34 AM, Marc Linden  
>> wrote:
>>
>> Hi all,
>>
>> I'm facing the problem that Solr is throwing a NullPointerException when 
>> performing a distributed search with multiple shards having elevation 
>> configured where one or more shards do have elevated results but others do 
>> not.
>>
>> We are using Solr 8.2 and have the QueryElevationComponent configured with 
>> "last-components" of the default search handler "/select". But the problem 
>> also occurs when using the explicit "/elevate" search handler.
>>  ...
>> 
>> elevator
>> 
>> 
>> ...
>> >>
>>  > name="queryFieldType">string
>> elevate.xml
>> 
>>
>> ### Steps to reproduce:
>>
>> (1) Add entries to the elevate.xml of each core to elevate a specific 
>> document for the text "searchTerm":
>>
>>  core1:
>>  
>> ...
>>   
>>  core2:
>>  
>> ...
>>   
>>
>> (2) Execute query (we use port 9983):
>> http://localhost:9983/solr/web/select?q=elevatedTerm&lowercaseOperators=false&df=text_en&defType=edismax&fq=lang:en&shards=localhost:9983/solr/core1,localhost:9983/solr/core2&fl=[elevated],[shard],area,id&rows=10&start=0
>>
>> Now as both shards have elevated documents for the requested "searchTerm" 
>> the search results are as expected:
>>
>> response: {
>> numFound: 5192,
>> start: 0,
>> maxScore: 1.9032197,
>> docs: [{
>> area: "press",
>> id: "core1docId1",
>> [elevated]: true,
>> [shard]: "localhost:9983/solr/core1"
>> }, {
>> area: "products",
>> id: "core2docId1",
>> [elevated]: true,
>> [shard]: "localhost:9983/solr/core2"
>> }, {
>> area: "press",
>> id: "core1docId2",
>> [elevated]: false,
>> [shard]: "localhost:9983/solr/core1"
>> },
>> ...
>>
>> (3) Remove the elevation entry for that "searchTerm" from one of the
>> cores, e.g. via comment
>>
>>  core2:
>>  
>> ...
>> 
>>  
>>
>>
>> (4) Reload the modified core:
>> http://localhost:9983/solr/admin/cores?action=RELOAD&core=core2
>>
>> (5) Request same query again and you get the NPE:
>>
>> error: {
>> trace: "java.lang.NullPointerException
>>  at org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1068)
>>  at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:917)
>>  at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryCo

Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread Jörn Franke
What does "not work correctly" mean?

Have you checked that all fields are stored or have docValues?
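
A quick way to run that check over a list of field definitions, sketched in Python. The dictionaries below are hypothetical and only loosely shaped like schema entries; the sketch treats a missing "stored" or "docValues" key as false, which may not match the defaults of a real field type, and it skips copyField destinations.

```python
def fields_blocking_atomic_update(fields, copy_field_dests=()):
    """Return names of fields that are neither stored nor docValues,
    ignoring copyField destinations."""
    skip = set(copy_field_dests)
    return [
        f["name"]
        for f in fields
        if f["name"] not in skip
        and not f.get("stored", False)
        and not f.get("docValues", False)
    ]

# Hypothetical field definitions for illustration only.
schema_fields = [
    {"name": "id", "stored": True},
    {"name": "title", "stored": True},
    {"name": "price", "stored": False, "docValues": True},
    {"name": "body", "stored": False, "docValues": False},
    {"name": "text_all", "stored": False},  # copyField destination
]
problems = fields_blocking_atomic_update(schema_fields, copy_field_dests=["text_all"])
print(problems)
```

Any field reported here would lose its value when the document is rebuilt during an atomic update.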

> On 17.07.2020 at 11:26, yo tomi wrote:
> 
> Hi All
> 
> Sorry, the settings above were the wrong way round.
> Actually, the following setting does not work properly.
> ---
> 
> 
> 
> 
> 
> 
> ---
> And the following works as expected.
> ---
> 
> 
> 
> 
> 
> 
> 
> ---
> 
> Thanks,
> Yoshiaki
> 
> 
> On Fri, Jul 17, 2020 at 16:32, yo tomi wrote:
> 
>> Hi all,
>> When I perform an AtomicUpdate on SolrCloud with the following setting,
>> it does not work properly.
>> 
>> ---
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ---
>> When I changed it as follows, it worked as expected.
>> ---
>> 
>> 
>> 
>> 
>> 
>> 
>> ---
>> I thought the latter setting and the post-processor approach would give
>> the same result, but the SOLR-8030 bug makes me reluctant to use
>> post-processors.
>> Even with the latter setting, is there any possibility of hitting
>> SOLR-8030? From reading the source code, the tlog that comes from the
>> leader to the replica seems to be processed correctly by the
>> UpdateRequestProcessor, so I thought the latter setting would not be
>> affected by the bug. Does anyone know the most appropriate way to
>> configure AtomicUpdate on SolrCloud?
>> 
>> Thanks,
>> Yoshiaki
>> 
>> 


Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread yo tomi
Hi All

Sorry, the settings above were the wrong way round.
Actually, the following setting does not work properly.
---

 
 
 
 

---
And the following works as expected.
---

 
 
 
 
 

---

Thanks,
Yoshiaki


On Fri, Jul 17, 2020 at 16:32, yo tomi wrote:

> Hi all,
> When I perform an AtomicUpdate on SolrCloud with the following setting,
> it does not work properly.
>
> ---
> 
>  
>  
>  
>  
>  
> 
> ---
> When I changed it as follows, it worked as expected.
> ---
> 
>  
>  
>  
>  
> 
> ---
> I thought the latter setting and the post-processor approach would give
> the same result, but the SOLR-8030 bug makes me reluctant to use
> post-processors.
> Even with the latter setting, is there any possibility of hitting
> SOLR-8030? From reading the source code, the tlog that comes from the
> leader to the replica seems to be processed correctly by the
> UpdateRequestProcessor, so I thought the latter setting would not be
> affected by the bug. Does anyone know the most appropriate way to
> configure AtomicUpdate on SolrCloud?
>
> Thanks,
> Yoshiaki
>
>


Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-17 Thread Noble Paul
Matthew,
In the future, too, it's still going to be
"curl solr:8983/collection/dataimport?blah"

It's just that you will have to run two or three extra commands once
before you run that curl command.
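
For those who would rather follow the advice quoted below and move the pull out of Solr, here is a minimal Python sketch of a separate indexing program. An in-memory SQLite table stands in for the real database; the table, columns and batch size are made up for illustration. Sending is deliberately left out: each body would be POSTed as JSON to the collection's update handler.

```python
import json
import sqlite3

def rows_to_docs(conn, query):
    """Run the query and turn each row into a dict keyed by column name."""
    cur = conn.execute(query)
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur]

def batches(docs, size):
    """Yield consecutive slices of at most size documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def to_update_body(batch):
    """Serialize a batch as a JSON array of documents."""
    return json.dumps(batch).encode("utf-8")

# In-memory database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(str(n), f"item {n}") for n in range(1, 6)],
)

docs = rows_to_docs(conn, "SELECT id, name FROM products ORDER BY id")
bodies = [to_update_body(b) for b in batches(docs, size=2)]
print(len(bodies))
```

Run from cron, a script like this keeps the scheduling as simple as the curl one-liner while leaving the database access outside the search engine.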

On Fri, Jul 17, 2020 at 3:50 PM Ishan Chattopadhyaya
 wrote:
>
> For any further discussion on the deprecations, please find a thread
> "Recent and upcoming deprecations" [0] and we can discuss there.
> Thanks,
> Ishan
>
> [0] -
> https://www.mail-archive.com/solr-user@lucene.apache.org/msg151762.html
>
> On Fri, Jul 17, 2020 at 8:50 AM matthew sporleder 
> wrote:
>
> > I hear all of that and agree, obviously, but "curl
> > solr:8983/collection/dataimport?blah" in cron was *pretty freaking
> > easy* ;)
> >
> > Not sure why "pull" is elevated to "anti-pattern"; data is data is data
> >
> > On Thu, Jul 16, 2020 at 8:49 PM Ishan Chattopadhyaya
> >  wrote:
> > >
> > > Thanks Aroop for your feedback. We shall try to ensure continuity of
> > > functionality via packages. Your help in those efforts would be greatly
> > > appreciated as well. Let us take this discussion to SOLR-14660.
> > >
> > > > Is there a replacement for DIH?
> > > > DIH is available as a community supported package. However, it is an
> > > > anti-pattern for a search engine to be pulling data from outside.
> > > > Instead, please consider writing separate indexing programs that pull
> > > > data from the database systems and index into Solr. It is not only a
> > > > good practice, but also more efficient in terms of throughput. For
> > > > more information on this, please start another thread on the
> > > > solr-users@ list, where more people can suggest the best alternatives.
> > >
> > >
> > > On Fri, Jul 17, 2020 at 5:50 AM matthew sporleder 
> > > wrote:
> > >
> > > > Is there a replacement for DIH?
> > > >
> > > > On Wed, Jul 15, 2020 at 10:08 AM Ishan Chattopadhyaya
> > > >  wrote:
> > > > >
> > > > > Dear Solr Users,
> > > > >
> > > > > In this release (Solr 8.6), we have deprecated the following:
> > > > >
> > > > >   1. Data Import Handler
> > > > >
> > > > >   2. HDFS support
> > > > >
> > > > >   3. Cross Data Center Replication (CDCR)
> > > > >
> > > > >
> > > > >
> > > > > All of these are scheduled to be removed in a future 9.x release.
> > > > >
> > > > > > It was decided that these components did not meet the standards of
> > > > > > quality and support that we wish to ensure for all components we
> > > > > > ship. Some of these also relied on design patterns that we no
> > > > > > longer recommend for use in critical production environments.
> > > > > >
> > > > > > If you rely on these features, you are encouraged to try out
> > > > > > community supported versions of these, where available [0]. Where
> > > > > > such community support is not available, we encourage you to
> > > > > > participate in the migration of these components into community
> > > > > > supported packages and help continue the development. We envision
> > > > > > that using packages for these components via package manager will
> > > > > > actually make it easier for users to use such features.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Ishan Chattopadhyaya
> > > > >
> > > > > (On behalf of the Apache Lucene/Solr PMC)
> > > > >
> > > > > [0] -
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/SOLR/Community+supported+packages+for+Solr
> > > > >
> > > > > On Wed, Jul 15, 2020 at 2:30 PM Bruno Roustant <
> > bruno.roust...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > > The Lucene PMC is pleased to announce the release of Apache Solr
> > > > > > > 8.6.0.
> > > > > > >
> > > > > > >
> > > > > > > Solr is the popular, blazing fast, open source NoSQL search
> > > > > > > platform from the Apache Lucene project. Its major features
> > > > > > > include powerful full-text search, hit highlighting, faceted
> > > > > > > search, dynamic clustering, database integration, rich document
> > > > > > > handling, and geospatial search. Solr is highly scalable,
> > > > > > > providing fault tolerant distributed search and indexing, and
> > > > > > > powers the search and navigation features of many of the world's
> > > > > > > largest internet sites.
> > > > > >
> > > > > >
> > > > > > Solr 8.6.0 is available for immediate download at:
> > > > > >
> > > > > >
> > > > > >   
> > > > > >
> > > > > >
> > > > > > ### Solr 8.6.0 Release Highlights:
> > > > > >
> > > > > >
> > > > > > >  * Cross-Collection Join Queries: Join queries can now work
> > > > > > > cross-collection, even when sharded or when spanning nodes.
> > > > > > >
> > > > > > >  * Search: Performance improvement for some types of queries when
> > > > > > > exact hit count isn't needed by using BlockMax WAND algorithm.
> > > > > > >
> > > > > > >  * Streaming Expression: Percentiles and standard deviation
> > > > > > > aggregations added to stats, facet and time series.  Streaming
> > > > > > > expressions added to /export
AtomicUpdate on SolrCloud is not working

2020-07-17 Thread yo tomi
Hi all,
When I perform an AtomicUpdate on SolrCloud with the following setting, it
does not work properly.

---

 
 
 
 
 

---
When I changed it as follows, it worked as expected.
---

 
 
 
 

---
I thought the latter setting and the post-processor approach would give
the same result, but the SOLR-8030 bug makes me reluctant to use
post-processors.
Even with the latter setting, is there any possibility of hitting
SOLR-8030? From reading the source code, the tlog that comes from the
leader to the replica seems to be processed correctly by the
UpdateRequestProcessor, so I thought the latter setting would not be
affected by the bug. Does anyone know the most appropriate way to
configure AtomicUpdate on SolrCloud?
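
For readers following along, this is roughly what an atomic-update request body looks like, built here in Python. The helper name, document ID and field names are hypothetical; only the {"set": ...} and {"add": ...} modifier shapes are meant to reflect the atomic-update format.

```python
import json

def atomic_update(doc_id, set_fields=None, add_fields=None, id_field="id"):
    """Build one atomic-update document: a plain id plus one
    {"set": value} or {"add": value} map per modified field."""
    doc = {id_field: doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}
    for name, value in (add_fields or {}).items():
        doc[name] = {"add": value}
    return doc

payload = json.dumps([
    atomic_update(
        "doc1",
        set_fields={"title": "  New title  "},
        add_fields={"tags": "solrcloud"},
    )
])
print(payload)
```

Note that the field values arrive wrapped in maps rather than as plain strings, which is relevant to where trimming and blank-removal processors sit in the chain.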

Thanks,
Yoshiaki