Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Ishan Chattopadhyaya
For any further discussion on the deprecations, please see the thread
"Recent and upcoming deprecations" [0]; we can discuss there.
Thanks,
Ishan

[0] -
https://www.mail-archive.com/solr-user@lucene.apache.org/msg151762.html

On Fri, Jul 17, 2020 at 8:50 AM matthew sporleder 
wrote:

> I hear all of that and agree, obviously, but "curl
> solr:8983/collection/dataimport?blah" in cron was *pretty freaking
> easy* ;)
>
> Not sure why "pull" is elevated to "anti-pattern"; data is data is data

Recent and upcoming deprecations

2020-07-16 Thread Ishan Chattopadhyaya
Hi Solr Users,
Here is a list of recent and upcoming deprecations in Solr 8.x.
https://cwiki.apache.org/confluence/display/SOLR/Deprecations

Please feel free to chime in if you have any questions. You can comment
here or in the specific JIRA issues.

Thanks and regards,
Ishan Chattopadhyaya


Re: CDCR stress-test issues

2020-07-16 Thread Ishan Chattopadhyaya
FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws, which allow the kinds of problems you
observe to happen. While there may be workarounds, it is advisable not to
rely on CDCR in production.

Thanks,
Ishan

On Thu, 2 Jul, 2020, 1:12 am Oakley, Craig (NIH/NLM/NCBI) [C],
 wrote:

> For the record, it is not just Solr 7.4 which has the problem. When I start
> afresh with Solr 8.5.2, both symptoms persist.
>
> With Solr 8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the
> Source SolrCloud and are never released, regardless of the maxNumLogsToKeep
> setting.
>
> And with Solr 8.5.2, if four scripts run simultaneously for a few minutes,
> each running a loop in which every iteration adds batches of 6
> records to the Source SolrCloud, a couple dozen records wind up on the
> Source without ever arriving at the Target SolrCloud (although the Target
> does have records which were added after the missing records).
>
> Does anyone yet have any suggestion how to get CDCR to work properly?
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Wednesday, June 24, 2020 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: CDCR stress-test issues
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes in the Source SolrCloud. No
> quantity of hard commits seems to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of
> thousands of tlog files, and Solr fails to start (complaining that there
> are too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of
> SolrClouds until I added these two lines to the solrconfig.xml file (for
> testing purposes, using numbers much lower than in the examples):
> 5
> 2
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud
> which accumulate tlog files (the Target SolrCloud does seem to have a
> tendency to clean up the tlog files, as does the Leader of the Source
> SolrCloud). If I use ADDREPLICAPROP and REBALANCELEADERS to change which
> node is the Leader, and if I then start adding more data, the tlogs on the
> new Leader sometimes will go away, but then the old Leader begins
> accumulating tlog files. I am dubious whether frequent reassignment of
> Leadership would be a practical solution.
>
> I also have several times attempted to simulate a production environment
> by running several loops simultaneously, each of which inserts multiple
> records on each iteration of the loop. Several times, I end up with a dozen
> records on (both replicas of) the Source which never make it to (either
> replica of) the Target. The Target has thousands of records which were
> inserted before the missing records, and thousands of records which were
> inserted after the missing records (and all these records, the replicated
> and the missing, were inserted by curl commands which only differed in
> sequential numbers incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says
> that the fix for Solr 7.3 had a problem; and the header says "Affects
> Version/s: 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are there any suggestions?
>
> Thanks
>
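
The missing-record symptom described in the quoted report could be checked with a small script along these lines. This is a minimal sketch, not the reporter's actual test: it assumes the document IDs have already been fetched from both the Source and the Target collection (for example with a query that returns only the id field), and the function name `find_missing` and the sample IDs are hypothetical.

```python
from typing import Iterable, List


def find_missing(source_ids: Iterable[str], target_ids: Iterable[str]) -> List[str]:
    """Return IDs present on the Source but absent from the Target, in Source order."""
    target = set(target_ids)
    return [doc_id for doc_id in source_ids if doc_id not in target]


if __name__ == "__main__":
    # Hypothetical sample data: two records never reached the Target.
    source = ["doc-%d" % n for n in range(1, 11)]
    target = [d for d in source if d not in ("doc-4", "doc-7")]
    print(find_missing(source, target))
```

Running such a comparison after each stress-test round would show whether the same IDs stay missing or eventually arrive.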


How do I use dismax or edismax to rank using 60% tf-idf and 40% a numeric field?

2020-07-16 Thread Russell Jurney
Hello Solarians,

I know how to boost a query and I see the methods for tf and idf in
streaming scripting. What I don’t know is how to incorporate these things
together at a specific percentage of the ranking function.

How do I write a query to use dismax or edismax to rank using 60% tf-idf
score and 40% the value of a numeric field?
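
One possible approach, offered as a sketch rather than an answer confirmed in this thread: make the main query a function query that adds the weighted edismax relevance score to the weighted numeric field. The Solr function queries `sum`, `product`, `query` and `field` are used here; the field name `popularity`, the `qf` value, and the parameter names `qq` and `userq` are assumptions. The raw relevance score and the field value are generally on different scales, so the 0.6/0.4 weights only behave like percentages if both inputs are first normalized to comparable ranges.

```python
from typing import Dict
from urllib.parse import urlencode


def weighted_rank_params(user_query: str, numeric_field: str,
                         text_weight: float = 0.6,
                         field_weight: float = 0.4) -> Dict[str, str]:
    """Build Solr params that score by a weighted sum of text relevance and a field."""
    relevance_query = "{!edismax qf=$qf v=$userq}"
    return {
        # Score = text_weight * score(qq) + field_weight * value of numeric_field.
        "q": "{!func}sum(product(%s,query($qq)),product(%s,field(%s)))"
             % (text_weight, field_weight, numeric_field),
        # The relevance query referenced by query($qq) above.
        "qq": relevance_query,
        # A function query matches all documents, so restrict results to text matches.
        "fq": relevance_query,
        "userq": user_query,
        "qf": "title body",  # assumed field list
        "fl": "id,score",
    }


if __name__ == "__main__":
    print(urlencode(weighted_rank_params("solr ranking", "popularity")))
```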

Thanks,
Russ
-- 

Thanks,
Russell Jurney @rjurney
russell.jur...@gmail.com
datasyndrome.com


Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread matthew sporleder
I hear all of that and agree, obviously, but "curl
solr:8983/collection/dataimport?blah" in cron was *pretty freaking
easy* ;)

Not sure why "pull" is elevated to "anti-pattern"; data is data is data

On Thu, Jul 16, 2020 at 8:49 PM Ishan Chattopadhyaya
 wrote:
>
> Thanks Aroop for your feedback. We shall try to ensure continuity of
> functionality via packages. Your help in those efforts would be greatly
> appreciated as well. Let us take this discussion to SOLR-14660.
>
> > Is there a replacement for DIH?
> DIH is available as a community supported package. However, it is an
> anti-pattern for a search engine to be pulling data from outside. Instead,
> please consider writing separate indexing programs that pull data from the
> database systems and index into Solr. It is not only a good practice, but
> also more efficient in terms of throughput. For more information on this,
> please start another thread in solr-users@ list, and more people can
> suggest best alternatives here.

Post request body not showing in Solr logs

2020-07-16 Thread SayantiGmail
Hi

Good morning

We are unable to view the body of POST requests in the Solr logs; only the
variable name shows up. Is there a way to show the POST body in the Solr
(8.4.1) logs?

Regards 
Sayanti 



Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Ishan Chattopadhyaya
Thanks Aroop for your feedback. We shall try to ensure continuity of
functionality via packages. Your help in those efforts would be greatly
appreciated as well. Let us take this discussion to SOLR-14660.

> Is there a replacement for DIH?
DIH is available as a community supported package. However, it is an
anti-pattern for a search engine to be pulling data from outside. Instead,
please consider writing separate indexing programs that pull data from the
database systems and index into Solr. It is not only a good practice, but
also more efficient in terms of throughput. For more information on this,
please start another thread on the solr-users@ list, where more people can
suggest the best alternatives.
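
A minimal sketch of such a separate indexing program, under stated assumptions: it reads rows from a SQLite database with Python's standard library and posts them in batches as JSON to Solr's `/update` handler. The database file, the `articles` table and its columns, the collection name `mycollection`, and the host `solr:8983` are all hypothetical, and commit handling is left to the Solr configuration.

```python
import json
import sqlite3
import urllib.request
from typing import Iterator, List

# Assumed Solr update endpoint; adjust host and collection name as needed.
SOLR_UPDATE_URL = "http://solr:8983/solr/mycollection/update"


def fetch_docs(conn: sqlite3.Connection) -> Iterator[dict]:
    """Pull rows from the hypothetical 'articles' table and yield Solr documents."""
    query = "SELECT id, title, body FROM articles ORDER BY id"
    for row_id, title, body in conn.execute(query):
        yield {"id": str(row_id), "title": title, "body": body}


def batches(docs: Iterator[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(batch: List[dict]) -> None:
    """Send one batch of documents to Solr as a JSON array."""
    request = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


if __name__ == "__main__":
    connection = sqlite3.connect("source.db")  # assumed database file
    for doc_batch in batches(fetch_docs(connection), 500):
        post_batch(doc_batch)
```

A script like this can be scheduled from cron, and the batch size gives a simple knob for tuning throughput.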


On Fri, Jul 17, 2020 at 5:50 AM matthew sporleder 
wrote:

> Is there a replacement for DIH?


Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread matthew sporleder
Is there a replacement for DIH?

On Wed, Jul 15, 2020 at 10:08 AM Ishan Chattopadhyaya
 wrote:
>
> Dear Solr Users,
>
> In this release (Solr 8.6), we have deprecated the following:
>
>   1. Data Import Handler
>
>   2. HDFS support
>
>   3. Cross Data Center Replication (CDCR)
>
>
>
> All of these are scheduled to be removed in a future 9.x release.
>
> It was decided that these components did not meet the standards of quality
> and support that we wish to ensure for all components we ship. Some of
> these also relied on design patterns that we no longer recommend for use in
> critical production environments.
>
> If you rely on these features, you are encouraged to try out community
> supported versions of these, where available [0]. Where such community
> support is not available, we encourage you to participate in the migration
> of these components into community supported packages and help continue the
> development. We envision that using packages for these components via
> package manager will actually make it easier for users to use such features.
>
> Regards,
>
> Ishan Chattopadhyaya
>
> (On behalf of the Apache Lucene/Solr PMC)
>
> [0] -
> https://cwiki.apache.org/confluence/display/SOLR/Community+supported+packages+for+Solr
>
> On Wed, Jul 15, 2020 at 2:30 PM Bruno Roustant 
> wrote:
>
> > The Lucene PMC is pleased to announce the release of Apache Solr 8.6.0.
> >
> >
> > Solr is the popular, blazing fast, open source NoSQL search platform from
> > the Apache Lucene project. Its major features include powerful full-text
> > search, hit highlighting, faceted search, dynamic clustering, database
> > integration, rich document handling, and geospatial search. Solr is highly
> > scalable, providing fault tolerant distributed search and indexing, and
> > powers the search and navigation features of many of the world's largest
> > internet sites.
> >
> >
> > Solr 8.6.0 is available for immediate download at:
> >
> >
> >   
> >
> >
> > ### Solr 8.6.0 Release Highlights:
> >
> >
> >  * Cross-Collection Join Queries: Join queries can now work
> > cross-collection, even when shared or when spanning nodes.
> >
> >  * Search: Performance improvement for some types of queries when exact
> > hit count isn't needed by using BlockMax WAND algorithm.
> >
> >  * Streaming Expression: Percentiles and standard deviation aggregations
> > added to stats, facet and time series.  Streaming expressions added to
> > /export handler.  Drill Streaming Expression for efficient and accurate
> > high cardinality aggregation.
> >
> >  * Package manager: Support for cluster (CoreContainer) level plugins.
> >
> >  * Health Check: HealthCheckHandler can now require that all cores are
> > healthy before returning OK.
> >
> >  * Zookeeper read API: A read API at /api/cluster/zk/* to fetch raw ZK
> > data and view contents of a ZK directory.
> >
> >  * Admin UI: New panel with security info in admin UI's dashboard.
> >
> >  * Query DSL: Support for {param:ref} and {bool: {excludeTags:""}}
> >
> >  * Ref Guide: Major redesign of Solr's documentation.
> >
> >
> > Please read CHANGES.txt for a full list of new features and changes:
> >
> >
> >   
> >
> >
> > Solr 8.6.0 also includes features, optimizations  and bugfixes in the
> > corresponding Apache Lucene release:
> >
> >
> >   
> >
> >
> > Note: The Apache Software Foundation uses an extensive mirroring network
> > for
> >
> > distributing releases. It is possible that the mirror you are using may
> > not have
> >
> > replicated the release yet. If that is the case, please try another mirror.
> >
> > This also applies to Maven access.
> >


Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Aroop Ganguly
Just to highlight the usage and importance of some of the items here.

1. HDFS Backup/Restore is integral to our system architecture, index 
distribution and Disaster Recovery (system used by 1000+ users internally)
2. HDFS Directory factory, Embedded Solr, these items too are very important 
for offline index generation at large scale (~5-10TB big source data)

We really use these items and they are very relevant for companies that run 
Solr to augment Big Data analysis.

I just wanted to mention this in case these features’ need and value to 
customers/users were not represented as yet.

I do support the cleansing of the core as long as these items are still
available via dedicated modules/plug-ins.

Thanks for the discussion around this. I hope the PMC community will consider 
these things and guide accordingly.

Regards
Aroop

> On Jul 16, 2020, at 1:30 AM, Anshum Gupta  wrote:
> 
> Thanks for the feedback, Colvin. We'll certainly try and do something
> around making the deprecations more visible and easier to track for all
> users. I'm not sure if 'news' is the right section, but I think it might be
> good to have a section on the website for users to look at and get a better
> idea about.
> 
> The PMC and committers are certainly aware of the importance of not
> dropping desirable features but let's not hijack the release announcement
> thread for this discussion.
> 
> On Thu, Jul 16, 2020 at 1:09 AM Colvin Cowie 
> wrote:
> 
>> Perhaps the deprecation notices should feature on
>> https://lucene.apache.org/solr/news.html ? Because right now, they're not
>> *very* visible in the changes.
>>
>> On Thu, 16 Jul 2020 at 01:18, Aroop Ganguly
>> wrote:
>> 
>>> May we ask what in HDFS support is being deprecated? Is HDFS backup and
>>> restore being deprecated?
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Jul 15, 2020, at 3:41 PM, Houston Putman wrote:
>>>>
>>>> To address your concern Bernd,
>>>>
>>>> The goal of this deprecation is not to remove the functionality entirely.
>>>> The primary purpose is to remove the code from Solr Core. Before removing a
>>>> feature we aim to either:
>>>>
>>>>  - Move the code to another repository, and have it be loadable via a
>>>>  plugin
>>>>  - Replace the feature with something more stable and/or scalable.
>>>>  (Likely loadable via a plugin or run in a separate application)
>>>>
>>>> I understand your frustration, but the ultimate goal here is to make Solr
>>>> more customizable and pluggable (and therefore leaner by default). This way
>>>> the base Solr functionality can be as bug-free and performant as possible,
>>>> and any extra features can be added on top as needed.
>>>>
>>>> We would appreciate feedback for how the community would prefer these
>>>> features be provided in the future, so that we make the transition smoother
>>>> and the end product better.
>>>>
>>>> - Houston
>>>>
> On Wed, Jul 15, 2020 at 5:51 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> 
> On Wed, 15 Jul, 2020, 8:37 pm Bernd Fehling, <
> bernd.fehl...@uni-bielefeld.de>
> wrote:
> 
>> 
>> 
>> Am 15.07.20 um 16:07 schrieb Ishan Chattopadhyaya:
>>> Dear Solr Users,
>>> 
>>> In this release (Solr 8.6), we have deprecated the following:
>>> 
>>> 1. Data Import Handler
>>> 
>>> 2. HDFS support
>>> 
>>> 3. Cross Data Center Replication (CDCR)
>>> 
>> 
>> Seriously? :-(
>> 
> 
> Please see SOLR-14022.
> 
> 
>> So next steps will be kicking out Cloud and go back to single node or what?
>> 
> 
> Not something we've considered yet.
> 
> 
>> Why don't you just freeze the whole Solr development and switch to Elastic?
>> 
> 
> Not something we've considered yet.

Querying solr using many QueryParser in one call

2020-07-16 Thread harjags
Hi All,

Below are some questions about querying Solr using several query parsers in
one call.

We need to search by keyword and also include a few specific documents in
the result. We don't want to use the query elevation component, as that would
put those mandatory documents at the top of the result. We would like to mix
those mandatory documents with the organic keyword lookup result set and also
make sure they take part in other scoring mechanisms such as bq's. On top of
this, we also need to distinguish documents matched by the keyword lookup from
the mandatory docs. We ended up using the Solr query params below to achieve
this.

fl=id,title,isTermMatch:exists(query({!type=edismax qf=$qf v=blah})),score
q=({!edismax qf=$qf v=$searchQuery mm=$mm}) OR ({!edismax qf=$qf
v=$docIdQuery mm=0 sow=true})
docIdQuery=5985612 6339445 5357348
searchQuery=blah

Below are my questions.
1. As you can see, we are calling three query parsers in one call. What would
be the performance implication for the search?
2. Two of those queries, the one in q and the one in fl, are the same. Would
the query result cache help?
3. In general, what are the performance implications of calling multiple
query parsers in a single search?
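
For reference, the request described above can be assembled programmatically. The sketch below only builds and URL-encodes the parameters from the post; the `qf` and `mm` values are placeholders because the post does not state them, and the function name is hypothetical. It also passes `$searchQuery` to the sub-query in `fl` instead of the literal `blah`, which is an assumption about the intent.

```python
from typing import Dict, List
from urllib.parse import urlencode


def build_params(search_query: str, doc_ids: List[str],
                 qf: str, mm: str) -> Dict[str, str]:
    """Build params for a keyword search OR-ed with a fixed set of document IDs."""
    return {
        # Keyword matches OR the mandatory documents, each parsed by edismax.
        "q": "({!edismax qf=$qf v=$searchQuery mm=$mm}) OR "
             "({!edismax qf=$qf v=$docIdQuery mm=0 sow=true})",
        # isTermMatch flags the documents that match the keyword query.
        "fl": "id,title,isTermMatch:exists(query("
              "{!type=edismax qf=$qf v=$searchQuery})),score",
        "searchQuery": search_query,
        "docIdQuery": " ".join(doc_ids),
        "qf": qf,
        "mm": mm,
    }


if __name__ == "__main__":
    # Placeholder qf and mm values; the document IDs are the ones from the post.
    params = build_params("blah", ["5985612", "6339445", "5357348"], "title id", "1")
    print(urlencode(params))
```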



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Querying solr using many QueryParser in one call

2020-07-16 Thread harjag...@gmail.com
Hi All,
Below are some questions about querying Solr using several query parsers in
one call.
We need to search by keyword and also include a few specific documents in
the result. We don't want to use the query elevation component, as that would
put those mandatory documents at the top of the result. We would like to mix
those mandatory documents with the organic keyword lookup result set and also
make sure they take part in other scoring mechanisms such as bq's. On top of
this, we also need to distinguish documents matched by the keyword lookup from
the mandatory docs. We ended up using the Solr query params below to achieve
this.

fl=id,title,isTermMatch:exists(query({!type=edismax qf=$qf v=blah})),score
q=({!edismax qf=$qf v=$searchQuery mm=$mm}) OR ({!edismax qf=$qf
v=$docIdQuery mm=0 sow=true})
docIdQuery=5985612 6339445 5357348
searchQuery=blah

Below are my questions.
1. As you can see, we are calling three query parsers in one call. What would
be the performance implication for the search?
2. Two of those queries, the one in q and the one in fl, are the same. Would
the query result cache help?
3. In general, what are the performance implications of calling multiple
query parsers in a single search?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
The solution would probably be a policy file shipped with Solr that allows
the ZK jar to create a LoginContext. I suggest that Solr ship it; otherwise,
one would need to adapt it manually for every Solr update to include the
version of the ZK jar.
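
To illustrate what such a policy entry might look like, the sketch below renders a Java policy `grant` block for a given ZooKeeper jar. `javax.security.auth.AuthPermission` with the target `createLoginContext.<name>` is the standard JAAS permission for creating a login context. Whether this single permission is sufficient for Solr 8.6 is an assumption rather than a tested fix, and the jar path and function name are hypothetical.

```python
def render_zk_policy(zk_jar_path: str, jaas_section: str = "Client") -> str:
    """Render a Java policy grant that lets the given jar create a JAAS login context."""
    return (
        'grant codeBase "file:%s" {\n'
        '  permission javax.security.auth.AuthPermission "createLoginContext.%s";\n'
        "};\n"
    ) % (zk_jar_path, jaas_section)


if __name__ == "__main__":
    # Hypothetical jar location; the version in the file name changes between releases,
    # which is why the entry would need regenerating after each Solr update.
    print(render_zk_policy("/opt/solr/lib/zookeeper-X.Y.Z.jar"))
```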

On Thu, Jul 16, 2020 at 8:15 PM Jörn Franke  wrote:

> I believe it is a bug in Solr because we need to create a policy to allow
> creating a login context:
> See here chapter "Running the Code with a Security Manager"
>
> http://www.informatik.hs-furtwangen.de/doku/java/j2sdk-1_4_1-doc/guide/security/jaas/tutorials/LoginConfigFile.html
>
> Please confirm and I will create a JIRA issue for Solr


Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
I believe it is a bug in Solr, because we need to create a policy that allows
creating a login context. See the chapter "Running the Code with a Security
Manager" here:
http://www.informatik.hs-furtwangen.de/doku/java/j2sdk-1_4_1-doc/guide/security/jaas/tutorials/LoginConfigFile.html

Please confirm and I will create a JIRA issue for Solr.

On Thu, Jul 16, 2020 at 8:06 PM Jörn Franke  wrote:

> Hello,
>
> I am using Solr 8.6.0.
> When the Java security manager is activated, Solr can no longer use the
> JAAS client configuration specified via java.security.auth.login.config
> with ZooKeeper. We have configured Kerberos authentication for ZooKeeper.
> With the Java security manager disabled, it works perfectly fine.
>
> The exact error message is: "No JAAS configuration section named 'Client'
> was found". It seems that the Java security manager somehow blocks access
> to that file.
> The directory containing the file is listed in -Dsolr.allowPaths.
> Could this be a bug, or is it a misconfiguration?
>
>
> Thank you.
>
> Best regards


Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
Hello,

I am using Solr 8.6.0.
When the Java security manager is activated, Solr can no longer use the JAAS
client configuration specified via java.security.auth.login.config with
ZooKeeper. We have configured Kerberos authentication for ZooKeeper.
With the Java security manager disabled, it works perfectly fine.

The exact error message is: "No JAAS configuration section named 'Client' was
found". It seems that the Java security manager somehow blocks access to that
file.
The directory containing the file is listed in -Dsolr.allowPaths.
Could this be a bug, or is it a misconfiguration?


Thank you.

Best regards 

Re: Solr fails to start with G1 GC

2020-07-16 Thread Walter Underwood
Instead of editing bin/solr, you should be able to set GC_TUNE in 
solr.in.sh, as I showed in my post below.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 16, 2020, at 7:52 AM, krishan goyal  wrote:
> 
> The issue was figured out by starting solr with the -f parameter which
> starts solr in foreground and provides the errors if any
> 
> Got an error - "Conflicting collector combinations in option list; please
> refer to the release notes for the combinations allowed"
> 
> Turns out bin/solr file starts with CMS by default and had to disable that
> to resolve the conflict.
> 
> 
> On Wed, Jul 15, 2020 at 10:20 PM Walter Underwood 
> wrote:
> 
>> I don’t see a heap size specified, so it is probably trying to run with
>> a 512 Megabyte heap. That might just not work with the 32M region
>> size.
>> 
>> Here are the options we have been using for 3+ years on about 150 hosts.
>> 
>> SOLR_HEAP=8g
>> # Use G1 GC  -- wunder 2017-01-23
>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> GC_TUNE=" \
>> -XX:+UseG1GC \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=8m \
>> -XX:MaxGCPauseMillis=200 \
>> -XX:+UseLargePages \
>> -XX:+AggressiveOpts \
>> "
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jul 15, 2020, at 4:24 AM, krishan goyal 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> I am using Solr 7.7
>>> 
>>> I am trying to start my solr server with G1 GC instead of the default CMS
>>> but the solr service doesn't get up.
>>> 
>>> The command I use to start solr is
>>> 
>>> bin/solr start -p 25280 -a "-Dsolr.solr.home=
>>> -Denable.slave=true -Denable.master=false -XX:+UseG1GC
>>> -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions
>>> -XX:G1MaxNewSizePercent=30 -XX:G1NewSizePercent=5
>> -XX:G1HeapRegionSize=32M
>>> -XX:InitiatingHeapOccupancyPercent=70"
>>> 
>>> I have tried various permutations of the start command by dropping /
>> adding
>>> other parameters but it doesn't work. However starts up just fine with
>>> just "-Dsolr.solr.home= -Denable.slave=true
>>> -Denable.master=false" and starts up with the default CMS collector
>>> 
>>> I don't get any useful error logs too. It waits for default 180 secs and
>>> then prints
>>> 
>>> Warning: Available entropy is low. As a result, use of the UUIDField,
>> SSL,
>>> or any other features that require
>>> RNG might not work properly. To check for the amount of available
>> entropy,
>>> use 'cat /proc/sys/kernel/random/entropy_avail'.
>>> 
>>> Waiting up to 180 seconds to see Solr running on port 25280 [|]  Still
>> not
>>> seeing Solr listening on 25280 after 180 seconds!
>>> 2020-07-15 07:07:52.042 INFO  (coreCloseExecutor-60-thread-6) [
>>> x:coreName] o.a.s.c.SolrCore [coreName]  CLOSING SolrCore
>>> org.apache.solr.core.SolrCore@7cc638d8
>>> 2020-07-15 07:07:52.099 INFO  (coreCloseExecutor-60-thread-6) [
>>> x:coreName] o.a.s.m.SolrMetricManager Closing metric reporters for
>>> registry=solr.core.coreName, tag=7cc638d8
>>> 2020-07-15 07:07:52.100 INFO  (coreCloseExecutor-60-thread-6) [
>>> x:coreName] o.a.s.m.r.SolrJmxReporter Closing reporter
>>> [org.apache.solr.metrics.reporters.SolrJmxReporter@5216981f: rootName =
>>> null, domain = solr.core.coreName, service url = null, agent id = null]
>> for
>>> registry solr.core.coreName /
>> com.codahale.metrics.MetricRegistry@32988ddf
>>> 2020-07-15 07:07:52.173 INFO  (ShutdownMonitor) [   ]
>>> o.a.s.m.SolrMetricManager Closing metric reporters for
>> registry=solr.node,
>>> tag=null
>>> 2020-07-15 07:07:52.173 INFO  (ShutdownMonitor) [   ]
>>> o.a.s.m.r.SolrJmxReporter Closing reporter
>>> [org.apache.solr.metrics.reporters.SolrJmxReporter@28952dea: rootName =
>>> null, domain = solr.node, service url = null, agent id = null] for
>> registry
>>> solr.node / com.codahale.metrics.MetricRegistry@655f4a3f
>>> 2020-07-15 07:07:52.175 INFO  (ShutdownMonitor) [   ]
>>> o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.jvm,
>>> tag=null
>>> 2020-07-15 07:07:52.175 INFO  (ShutdownMonitor) [   ]
>>> o.a.s.m.r.SolrJmxReporter Closing reporter
>>> [org.apache.solr.metrics.reporters.SolrJmxReporter@69c6161d: rootName =
>>> null, domain = solr.jvm, service url = null, agent id = null] for
>> registry
>>> solr.jvm / com.codahale.metrics.MetricRegistry@1252ce77
>>> 2020-07-15 07:07:52.176 INFO  (ShutdownMonitor) [   ]
>>> o.a.s.m.SolrMetricManager Closing metric reporters for
>> registry=solr.jetty,
>>> tag=null
>>> 2020-07-15 07:07:52.176 INFO  (ShutdownMonitor) [   ]
>>> o.a.s.m.r.SolrJmxReporter Closing reporter
>>> [org.apache.solr.metrics.reporters.SolrJmxReporter@3aefae67: rootName =
>>> null, domain = solr.jetty, service url = null, agent id = null] for
>>> registry solr.jetty / com.codahale.metrics.MetricRegistry@3a538ecd
>> 
>> 



Re: Solr fails to start with G1 GC

2020-07-16 Thread krishan goyal
I figured out the issue by starting Solr with the -f parameter, which
starts Solr in the foreground and prints any errors.

Got an error - "Conflicting collector combinations in option list; please
refer to the release notes for the combinations allowed"

It turns out the bin/solr script starts with CMS by default, and I had to
disable that to resolve the conflict.
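
For anyone hitting the same error, a small sketch of a pre-flight check that
spots this kind of conflict before the JVM is launched. It is a hypothetical
helper, not part of Solr. The flag list is limited to common HotSpot collector
switches, and the sample option string assumes that the script's default
options include the CMS flags.

```python
# HotSpot collector-selection flags; enabling more than one of these
# collectors at once is what triggers the "Conflicting collector
# combinations" error.
COLLECTOR_FLAGS = {
    "-XX:+UseConcMarkSweepGC": "CMS",
    "-XX:+UseG1GC": "G1",
    "-XX:+UseParallelGC": "Parallel",
    "-XX:+UseSerialGC": "Serial",
}


def selected_collectors(jvm_opts):
    """Return the sorted names of all collectors enabled in an option string."""
    return sorted(
        {COLLECTOR_FLAGS[opt] for opt in jvm_opts.split() if opt in COLLECTOR_FLAGS}
    )


# Assumed scenario: CMS defaults from the start script plus G1 passed via -a.
opts = ("-XX:+UseConcMarkSweepGC -XX:+UseParNewGC "
        "-XX:+UseG1GC -XX:MaxGCPauseMillis=500")
found = selected_collectors(opts)
if len(found) > 1:
    print("conflicting collectors:", ", ".join(found))
```

Running a check like this against the final, merged option list is cheaper
than waiting 180 seconds for a startup that silently fails.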


On Wed, Jul 15, 2020 at 10:20 PM Walter Underwood 
wrote:

> I don’t see a heap size specified, so it is probably trying to run with
> a 512 Megabyte heap. That might just not work with the 32M region
> size.
>
> Here are the options we have been using for 3+ years on about 150 hosts.
>
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jul 15, 2020, at 4:24 AM, krishan goyal 
> wrote:
> >
> > Hi,
> >
> > I am using Solr 7.7
> >
> > I am trying to start my solr server with G1 GC instead of the default CMS
> > but the solr service doesn't get up.
> >
> > The command I use to start solr is
> >
> > bin/solr start -p 25280 -a "-Dsolr.solr.home=
> > -Denable.slave=true -Denable.master=false -XX:+UseG1GC
> > -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions
> > -XX:G1MaxNewSizePercent=30 -XX:G1NewSizePercent=5
> -XX:G1HeapRegionSize=32M
> > -XX:InitiatingHeapOccupancyPercent=70"
> >
> > I have tried various permutations of the start command by dropping /
> adding
> > other parameters but it doesn't work. However starts up just fine with
> > just "-Dsolr.solr.home= -Denable.slave=true
> > -Denable.master=false" and starts up with the default CMS collector
> >
> > I don't get any useful error logs too. It waits for default 180 secs and
> > then prints
> >
> > Warning: Available entropy is low. As a result, use of the UUIDField,
> SSL,
> > or any other features that require
> > RNG might not work properly. To check for the amount of available
> entropy,
> > use 'cat /proc/sys/kernel/random/entropy_avail'.
> >
> > Waiting up to 180 seconds to see Solr running on port 25280 [|]  Still
> not
> > seeing Solr listening on 25280 after 180 seconds!
> > 2020-07-15 07:07:52.042 INFO  (coreCloseExecutor-60-thread-6) [
> > x:coreName] o.a.s.c.SolrCore [coreName]  CLOSING SolrCore
> > org.apache.solr.core.SolrCore@7cc638d8
> > 2020-07-15 07:07:52.099 INFO  (coreCloseExecutor-60-thread-6) [
> > x:coreName] o.a.s.m.SolrMetricManager Closing metric reporters for
> > registry=solr.core.coreName, tag=7cc638d8
> > 2020-07-15 07:07:52.100 INFO  (coreCloseExecutor-60-thread-6) [
> > x:coreName] o.a.s.m.r.SolrJmxReporter Closing reporter
> > [org.apache.solr.metrics.reporters.SolrJmxReporter@5216981f: rootName =
> > null, domain = solr.core.coreName, service url = null, agent id = null]
> for
> > registry solr.core.coreName /
> com.codahale.metrics.MetricRegistry@32988ddf
> > 2020-07-15 07:07:52.173 INFO  (ShutdownMonitor) [   ]
> > o.a.s.m.SolrMetricManager Closing metric reporters for
> registry=solr.node,
> > tag=null
> > 2020-07-15 07:07:52.173 INFO  (ShutdownMonitor) [   ]
> > o.a.s.m.r.SolrJmxReporter Closing reporter
> > [org.apache.solr.metrics.reporters.SolrJmxReporter@28952dea: rootName =
> > null, domain = solr.node, service url = null, agent id = null] for
> registry
> > solr.node / com.codahale.metrics.MetricRegistry@655f4a3f
> > 2020-07-15 07:07:52.175 INFO  (ShutdownMonitor) [   ]
> > o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.jvm,
> > tag=null
> > 2020-07-15 07:07:52.175 INFO  (ShutdownMonitor) [   ]
> > o.a.s.m.r.SolrJmxReporter Closing reporter
> > [org.apache.solr.metrics.reporters.SolrJmxReporter@69c6161d: rootName =
> > null, domain = solr.jvm, service url = null, agent id = null] for
> registry
> > solr.jvm / com.codahale.metrics.MetricRegistry@1252ce77
> > 2020-07-15 07:07:52.176 INFO  (ShutdownMonitor) [   ]
> > o.a.s.m.SolrMetricManager Closing metric reporters for
> registry=solr.jetty,
> > tag=null
> > 2020-07-15 07:07:52.176 INFO  (ShutdownMonitor) [   ]
> > o.a.s.m.r.SolrJmxReporter Closing reporter
> > [org.apache.solr.metrics.reporters.SolrJmxReporter@3aefae67: rootName =
> > null, domain = solr.jetty, service url = null, agent id = null] for
> > registry solr.jetty / com.codahale.metrics.MetricRegistry@3a538ecd
>
>


Re: Elevation with distributed search causes NPE

2020-07-16 Thread Erick Erickson
Can you raise a JIRA? If you’re ambitious, you can add a patch too ;)

> On Jul 16, 2020, at 2:52 AM, Marc Linden  
> wrote:
> 
> Thanks Erick for your fast response.
> 
> I tried adding the sort param and yes, that made the problem go away, but 
> it also disables elevation if I'm not mistaken. After adding 
> forceElevation=true to the query, a ClassCastException was thrown:
> 
> http://localhost:9983/solr/core1/select?q=elevatedTerm=false=text_en=edismax=lang:en=localhost:9983/solr/core1,localhost:9983/solr/core2=[elevated],[shard],area,id=10=0=area%20asc=true
> 
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at 
> org.apache.solr.schema.FieldType.unmarshalStringSortValue(FieldType.java:1229)
>  at org.apache.solr.schema.StrField.unmarshalSortValue(StrField.java:122)
>  at 
> org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1092)
>  at 
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:917)
>  at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
>  at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
>  at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
>  at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
>  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
>  ...
> 
> Best regards,
> Marc
> 
> -----Original Message-----
> From: Erick Erickson 
> Sent: Wednesday, July 15, 2020 14:32
> To: solr-user@lucene.apache.org
> Subject: Re: Elevation with distributed search causes NPE
> 
> Hmmm, looking at the code that line looks like this:
> 
> sortSpec.getSort().getSort();
> 
> I’m curious what happens if you specify a sort on the query? If that makes 
> the problem go away, it’s a smoking gun.
> 
> Whether or not adding sorting makes the problem go away, this looks like 
> something that’s a legitimate JIRA, please go ahead and raise one.
> 
> Best,
> Erick
> 
>> On Jul 15, 2020, at 4:34 AM, Marc Linden  
>> wrote:
>> 
>> Hi all,
>> 
>> I'm facing the problem that Solr is throwing a NullPointerException when 
>> performing a distributed search with multiple shards having elevation 
>> configured where one or more shards do have elevated results but others do 
>> not.
>> 
>> We are using Solr 8.2 and have the QueryElevationComponent configured with 
>> "last-components" of the default search handler "/select". But the problem 
>> also occurs when using the explicit "/elevate" search handler.
>>  ...
>> 
>> elevator
>> 
>> 
>> ...
>> >> 
>>  > name="queryFieldType">string
>> elevate.xml
>> 
>> 
>> ### Steps to reproduce:
>> 
>> (1) Add entries to the elevate.xml of each core to elevate a specific 
>> document for the text "searchTerm":
>> 
>>  core1:
>>  
>> ...
>> 
>>  
>>  core2:
>>  
>> ...
>> 
>>  
>> 
>> (2) Execute query (we use port 9983):
>> http://localhost:9983/solr/web/select?q=elevatedTerm
>> s=false=text_en=edismax=lang:en=localhost:9983/so
>> lr/core1,localhost:9983/solr/core2=[elevated],[shard],area,id=
>> 10=0
>> 
>> Now as both shards have elevated documents for the requested "searchTerm" 
>> the search results are as expected:
>> 
>> response: {
>> numFound: 5192,
>> start: 0,
>> maxScore: 1.9032197,
>> docs: [{
>> area: "press",
>> id: "core1docId1",
>> [elevated]: true,
>> [shard]: "localhost:9983/solr/core1"
>> }, {
>> area: "products",
>> id: "core2docId1",
>> [elevated]: true,
>> [shard]: "localhost:9983/solr/core2"
>> }, {
>> area: "press",
>> id: "core1docId2",
>> [elevated]: false,
>> [shard]: "localhost:9983/solr/core1"
>> },
>> ...
>> 
>> (3) Remove the elevation entry for that "searchTerm" from one of the
>> cores, e.g. via comment
>> 
>>  core2:
>>  
>> ...
>> 
>>  
>> 
>> 
>> (4) Reload the modified core:
>> http://localhost:9983/solr/admin/cores?action=RELOAD=core2
>> 
>> (5) Request same query again and you get the NPE:
>> 
>> error: {
>> trace: "java.lang.NullPointerException
>>  at 
>> org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1068)
>>  at 
>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:917)
>>  at 
>> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
>>  at 
>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
>>  at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
>>  at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
>>  at 
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
>>  

Re: Concurrent query execution and Solr

2020-07-16 Thread Mauro Asprea
I would like to know about this too.

—
Mauro Asprea
E-Mail: mauroasp...@gmail.com
Mobile: +34 654 297 582

> On Jul 14, 2020, at 18:33, André Widhani  wrote:
> 
> Hi,
> 
> Does anybody know if work is in progress to make Lucene's concurrent query
> execution accessible through Solr? I am talking about this:
> http://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html
> 
> I find this compelling in particular since the changes in LUCENE-7976 /
> Solr 7.5 where, even after an optimize, you end up with a number of almost
> equally sized segments. And for those who would go to Solr Cloud for
> parallel query execution only because they have other means of redundancy
> in place, this is a nice way to avoid additional complexity with ZooKeeper.
> 
> Thanks,
> André


Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Anshum Gupta
Thanks for the feedback, Colvin. We'll certainly try and do something
around making the deprecations more visible and easier to track for all
users. I'm not sure if 'news' is the right section, but I think it might be
good to have a section on the website for users to look at and get a better
idea about.

The PMC and committers are certainly aware of the importance of not
dropping desirable features but let's not hijack the release announcement
thread for this discussion.

On Thu, Jul 16, 2020 at 1:09 AM Colvin Cowie 
wrote:

> Perhaps the deprecation notices should feature on
> https://lucene.apache.org/solr/news.html ? Because right now, they're not
> *very* visible in the changes.
>
> On Thu, 16 Jul 2020 at 01:18, Aroop Ganguly  .invalid>
> wrote:
>
> > May we ask what in hdfs support is being deprecated? Is Hdfs backup and
> > restore being deprecated ?
> >
> > Sent from my iPhone
> >
> > > On Jul 15, 2020, at 3:41 PM, Houston Putman 
> > wrote:
> > >
> > > To address your concern Bernd,
> > >
> > > The goal of this deprecation is not to remove the functionality
> entirely.
> > > The primary purpose is to remove the code from Solr Core. Before
> > removing a
> > > feature we aim to either:
> > >
> > >   - Move the code to another repository, and have it be loadable via a
> > >   plugin
> > >   - Replace the feature with something more stable and/or scalable.
> > >   (Likely loadable via a plugin or run in a separate application)
> > >
> > > I understand your frustration, but the ultimate goal here is to make
> Solr
> > > more customizable and pluggable (and therefore leaner by default). This
> > way
> > > the base Solr functionality can be as bug-free and performant as
> > possible,
> > > and any extra features can be added on top as needed.
> > >
> > > We would appreciate feedback for how the community would prefer these
> > > features be provided in the future, so that we make the transition
> > smoother
> > > and the end product better.
> > >
> > > - Houston
> > >
> > >> On Wed, Jul 15, 2020 at 5:51 PM Ishan Chattopadhyaya <
> > >> ichattopadhy...@gmail.com> wrote:
> > >>
> > >> On Wed, 15 Jul, 2020, 8:37 pm Bernd Fehling, <
> > >> bernd.fehl...@uni-bielefeld.de>
> > >> wrote:
> > >>
> > >>>
> > >>>
> > >>> On 15.07.20 at 16:07, Ishan Chattopadhyaya wrote:
> >  Dear Solr Users,
> > 
> >  In this release (Solr 8.6), we have deprecated the following:
> > 
> >   1. Data Import Handler
> > 
> >   2. HDFS support
> > 
> >   3. Cross Data Center Replication (CDCR)
> > 
> > >>>
> > >>> Seriously? :-(
> > >>>
> > >>
> > >> Please see SOLR-14022.
> > >>
> > >>
> > >>> So next steps will be kicking out Cloud and go back to single node or
> > >> what?
> > >>>
> > >>
> > >> Not something we've considered yet.
> > >>
> > >>
> > >>> Why don't you just freeze the whole Solr development and switch to
> > >> Elastic?
> > >>>
> > >>
> > >> Not something we've considered yet.
> > >>
> > >>
> > >>>
> > 
> > 
> >  All of these are scheduled to be removed in a future 9.x release.
> > 
> >  It was decided that these components did not meet the standards of
> > >>> quality
> >  and support that we wish to ensure for all components we ship. Some
> of
> >  these also relied on design patterns that we no longer recommend for
> > >> use
> > >>> in
> >  critical production environments.
> > 
> >  If you rely on these features, you are encouraged to try out
> community
> >  supported versions of these, where available [0]. Where such
> community
> >  support is not available, we encourage you to participate in the
> > >>> migration
> >  of these components into community supported packages and help
> > continue
> > >>> the
> >  development. We envision that using packages for these components
> via
> >  package manager will actually make it easier for users to use such
> > >>> features.
> > 
> >  Regards,
> > 
> >  Ishan Chattopadhyaya
> > 
> >  (On behalf of the Apache Lucene/Solr PMC)
> > 
> >  [0] -
> > 
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/SOLR/Community+supported+packages+for+Solr
> > 
> >  On Wed, Jul 15, 2020 at 2:30 PM Bruno Roustant <
> > >> bruno.roust...@gmail.com
> > 
> >  wrote:
> > 
> > > The Lucene PMC is pleased to announce the release of Apache Solr
> > >> 8.6.0.
> > >
> > >
> > > Solr is the popular, blazing fast, open source NoSQL search
> platform
> > >>> from
> > > the Apache Lucene project. Its major features include powerful
> > >> full-text
> > > search, hit highlighting, faceted search, dynamic clustering,
> > database
> > > integration, rich document handling, and geospatial search. Solr is
> > >>> highly
> > > scalable, providing fault tolerant distributed search and indexing,
> > >> and
> > > powers the search and navigation features of many of the world's
> > >> largest
> > > 

Re: [ANNOUNCE] Apache Solr 8.6.0 released

2020-07-16 Thread Colvin Cowie
Perhaps the deprecation notices should feature on
https://lucene.apache.org/solr/news.html ? Because right now, they're not
*very* visible in the changes.

On Thu, 16 Jul 2020 at 01:18, Aroop Ganguly 
wrote:

> May we ask what in hdfs support is being deprecated? Is Hdfs backup and
> restore being deprecated ?
>
> Sent from my iPhone
>
> > On Jul 15, 2020, at 3:41 PM, Houston Putman 
> wrote:
> >
> > To address your concern Bernd,
> >
> > The goal of this deprecation is not to remove the functionality entirely.
> > The primary purpose is to remove the code from Solr Core. Before
> removing a
> > feature we aim to either:
> >
> >   - Move the code to another repository, and have it be loadable via a
> >   plugin
> >   - Replace the feature with something more stable and/or scalable.
> >   (Likely loadable via a plugin or run in a separate application)
> >
> > I understand your frustration, but the ultimate goal here is to make Solr
> > more customizable and pluggable (and therefore leaner by default). This
> way
> > the base Solr functionality can be as bug-free and performant as
> possible,
> > and any extra features can be added on top as needed.
> >
> > We would appreciate feedback for how the community would prefer these
> > features be provided in the future, so that we make the transition
> smoother
> > and the end product better.
> >
> > - Houston
> >
> >> On Wed, Jul 15, 2020 at 5:51 PM Ishan Chattopadhyaya <
> >> ichattopadhy...@gmail.com> wrote:
> >>
> >> On Wed, 15 Jul, 2020, 8:37 pm Bernd Fehling, <
> >> bernd.fehl...@uni-bielefeld.de>
> >> wrote:
> >>
> >>>
> >>>
> >>> On 15.07.20 at 16:07, Ishan Chattopadhyaya wrote:
>  Dear Solr Users,
> 
>  In this release (Solr 8.6), we have deprecated the following:
> 
>   1. Data Import Handler
> 
>   2. HDFS support
> 
>   3. Cross Data Center Replication (CDCR)
> 
> >>>
> >>> Seriously? :-(
> >>>
> >>
> >> Please see SOLR-14022.
> >>
> >>
> >>> So next steps will be kicking out Cloud and go back to single node or
> >> what?
> >>>
> >>
> >> Not something we've considered yet.
> >>
> >>
> >>> Why don't you just freeze the whole Solr development and switch to
> >> Elastic?
> >>>
> >>
> >> Not something we've considered yet.
> >>
> >>
> >>>
> 
> 
>  All of these are scheduled to be removed in a future 9.x release.
> 
>  It was decided that these components did not meet the standards of
> >>> quality
>  and support that we wish to ensure for all components we ship. Some of
>  these also relied on design patterns that we no longer recommend for
> >> use
> >>> in
>  critical production environments.
> 
>  If you rely on these features, you are encouraged to try out community
>  supported versions of these, where available [0]. Where such community
>  support is not available, we encourage you to participate in the
> >>> migration
>  of these components into community supported packages and help
> continue
> >>> the
>  development. We envision that using packages for these components via
>  package manager will actually make it easier for users to use such
> >>> features.
> 
>  Regards,
> 
>  Ishan Chattopadhyaya
> 
>  (On behalf of the Apache Lucene/Solr PMC)
> 
>  [0] -
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/SOLR/Community+supported+packages+for+Solr
> 
>  On Wed, Jul 15, 2020 at 2:30 PM Bruno Roustant <
> >> bruno.roust...@gmail.com
> 
>  wrote:
> 
> > The Lucene PMC is pleased to announce the release of Apache Solr
> >> 8.6.0.
> >
> >
> > Solr is the popular, blazing fast, open source NoSQL search platform
> >>> from
> > the Apache Lucene project. Its major features include powerful
> >> full-text
> > search, hit highlighting, faceted search, dynamic clustering,
> database
> > integration, rich document handling, and geospatial search. Solr is
> >>> highly
> > scalable, providing fault tolerant distributed search and indexing,
> >> and
> > powers the search and navigation features of many of the world's
> >> largest
> > internet sites.
> >
> >
> > Solr 8.6.0 is available for immediate download at:
> >
> >
> >  
> >
> >
> > ### Solr 8.6.0 Release Highlights:
> >
> >
> > * Cross-Collection Join Queries: Join queries can now work
> > cross-collection, even when shared or when spanning nodes.
> >
> > * Search: Performance improvement for some types of queries when
> >> exact
> > hit count isn't needed by using BlockMax WAND algorithm.
> >
> > * Streaming Expression: Percentiles and standard deviation
> >> aggregations
> > added to stats, facet and time series.  Streaming expressions added
> to
> > /export handler.  Drill Streaming Expression for efficient and
> >> accurate
> > high cardinality aggregation.
> >
> 

RE: Disk usage with useDocValuesAsStored

2020-07-16 Thread Gael Jourdan-Weil
Ok, makes sense.
Thanks for your answer Erick.

Gaël


From: Erick Erickson 
Sent: Wednesday, July 15, 2020 22:53
To: solr-user@lucene.apache.org 
Subject: Re: Disk usage with useDocValuesAsStored

You’re off track a bit. useDocValuesAsStored has no effect on the size on disk. 
It’s purely a runtime option that pulls the data to return from either the 
stored or docValues parts of the index. If you change the definition and 
reindex, you should see significant differences in the size of your index, 
particularly the “*.fdt/*.fdx” and “*.dvd/*.dvm” files, where stored and 
docValues are kept respectively.

However, it’s also apples and oranges. Specifically, using docValues as stored 
will _not_ necessarily return the fields the same way they were sent in the 
multiValued case. The docValues data is kept as a SORTED_SET, which means it’s 
both lexically sorted and deduplicated. So input like “a” “z” “h” “a” will 
return “a” “h” “z”.
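
A tiny sketch of that behaviour, for illustration only. It mimics the
SORTED_SET semantics in plain Python rather than calling Solr, and the function
name is made up for this example. Python orders strings by code point, which
matches the lexical order for simple ASCII values like these.

```python
def as_docvalues_view(values):
    """Mimic SORTED_SET semantics: deduplicate, then sort lexically."""
    return sorted(set(values))


sent = ["a", "z", "h", "a"]          # order and duplicates as indexed
returned = as_docvalues_view(sent)   # what a docValues-backed read resembles
print(returned)  # ['a', 'h', 'z']
```

If the original order or the duplicates matter to the client, the field still
needs stored=true.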

Best,
Erick

> On Jul 15, 2020, at 1:35 PM, Gael Jourdan-Weil 
>  wrote:
>
> Hello,
>
> I was wondering if we can expect significant disk usage reduction (index 
> size) if we move from fields defined as "docValues=true + stored=true" to 
> "docValues=true + stored=false" (with useDocValuesAsStored=true as default in 
> both cases)?
>
> Considering the use case we are targeting is only Streaming Expression with 
> /export handler, I also understand that we might also set 
> useDocValuesAsStored=false from what is described at 
> https://lucene.apache.org/solr/guide/8_4/docvalues.html.
> If so, would setting useDocValuesAsStored=false help reduce the index size as 
> well?
>
> We will obviously try it and see by ourselves the results but I was wondering 
> if you already have an idea about it.
> Also if you have any good link to how data are physically stored depending on 
> the fields options (indexed/stored/docValues), this could really be 
> interesting.
>
> Thanks,
> Gaël



AW: Elevation with distributed search causes NPE

2020-07-16 Thread Marc Linden
Thanks Erick for your fast response.

I tried adding the sort param and yes, that made the problem go away, but it 
also disables elevation if I'm not mistaken. After adding 
forceElevation=true to the query, a ClassCastException was thrown:

http://localhost:9983/solr/core1/select?q=elevatedTerm=false=text_en=edismax=lang:en=localhost:9983/solr/core1,localhost:9983/solr/core2=[elevated],[shard],area,id=10=0=area%20asc=true

java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.String
  at 
org.apache.solr.schema.FieldType.unmarshalStringSortValue(FieldType.java:1229)
  at org.apache.solr.schema.StrField.unmarshalSortValue(StrField.java:122)
  at 
org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1092)
  at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:917)
  at 
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
  at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
  at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
  ...

Best regards,
Marc

-----Original Message-----
From: Erick Erickson 
Sent: Wednesday, July 15, 2020 14:32
To: solr-user@lucene.apache.org
Subject: Re: Elevation with distributed search causes NPE

Hmmm, looking at the code that line looks like this:

sortSpec.getSort().getSort();

I’m curious what happens if you specify a sort on the query? If that makes the 
problem go away, it’s a smoking gun.

Whether or not adding sorting makes the problem go away, this looks like 
something that’s a legitimate JIRA, please go ahead and raise one.

Best,
Erick

> On Jul 15, 2020, at 4:34 AM, Marc Linden  
> wrote:
>
> Hi all,
>
> I'm facing the problem that Solr is throwing a NullPointerException when 
> performing a distributed search with multiple shards having elevation 
> configured where one or more shards do have elevated results but others do 
> not.
>
> We are using Solr 8.2 and have the QueryElevationComponent configured with 
> "last-components" of the default search handler "/select". But the problem 
> also occurs when using the explicit "/elevate" search handler.
>   ...
> 
>  elevator
> 
>  
>  ...
>   >
>   name="queryFieldType">string
> elevate.xml
>  
>
> ### Steps to reproduce:
>
> (1) Add entries to the elevate.xml of each core to elevate a specific 
> document for the text "searchTerm":
>
>   core1:
>   
> ...
> 
>   
>   core2:
>   
> ...
> 
>   
>
> (2) Execute query (we use port 9983):
> http://localhost:9983/solr/web/select?q=elevatedTerm
> s=false=text_en=edismax=lang:en=localhost:9983/so
> lr/core1,localhost:9983/solr/core2=[elevated],[shard],area,id=
> 10=0
>
> Now as both shards have elevated documents for the requested "searchTerm" the 
> search results are as expected:
>
> response: {
> numFound: 5192,
> start: 0,
> maxScore: 1.9032197,
> docs: [{
> area: "press",
> id: "core1docId1",
> [elevated]: true,
> [shard]: "localhost:9983/solr/core1"
> }, {
> area: "products",
> id: "core2docId1",
> [elevated]: true,
> [shard]: "localhost:9983/solr/core2"
> }, {
> area: "press",
> id: "core1docId2",
> [elevated]: false,
> [shard]: "localhost:9983/solr/core1"
> },
> ...
>
> (3) Remove the elevation entry for that "searchTerm" from one of the
> cores, e.g. via comment
>
>   core2:
>   
> ...
> 
>   
>
>
> (4) Reload the modified core:
> http://localhost:9983/solr/admin/cores?action=RELOAD=core2
>
> (5) Request same query again and you get the NPE:
>
> error: {
> trace: "java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1068)
>   at 
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:917)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
>   at 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
>   at 
>