SynonymFilterFactory deprecated, documentation and search

2020-07-29 Thread Jayadevan Maymala
Hi all,

We have been using SynonymFilterFactory with Solr 7.3, and it seems to be
working.
Going through the documentation for 8.6, I noticed that it was deprecated a
long time ago, probably before 7.3.
The documentation at this url, for version 8.6 -
https://lucene.apache.org/solr/guide/8_6/field-type-definitions-and-properties.html
does give <filter class="solr.SynonymFilterFactory"/> as an
example.
Two doubts -
Does a deprecated class continue working?
Shouldn't the documentation be updated to modify the example?
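For reference, SynonymFilterFactory was deprecated in favor of SynonymGraphFilterFactory, which handles multi-word synonyms correctly, and it does keep working in the 7.x/8.x lines. A schema sketch of the replacement (the synonyms.txt file name and tokenizer choice are illustrative, not taken from this thread):

```xml
<!-- deprecated but still functional in 7.x/8.x: -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>

<!-- graph-based replacement; at index time the graph filter must be
     followed by FlattenGraphFilterFactory: -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  <filter class="solr.FlattenGraphFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
</analyzer>
```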

A request - if the documentation at the url mentioned above has a search,
that will really help. I could find only a Page Title lookup.

Regards,
Jayadevan


Forwarding a request to a stopped instance

2020-07-29 Thread Taisuke Miyazaki
Current Configuration
Solr Version: 7.5.0
Operating mode: solrcloud
Number of shards: 1

Configuration of nodes:
1 tlog leader (static)
1 tlog replica (static)
Multiple pull replicas (started dynamically by an AWS AutoScalingGroup)


Startup activity:

1. An instance is launched by an AWS AutoScalingGroup.
2. Solr is started by systemd.
3. systemd's ExecStartPost joins the cluster as a pull-type replica by
sending a request such as the following:
curl -sSf "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=${COLLECTION}&shard=shard1&node=${IP}:8983_solr&type=pull"

Stop activity:
1. systemd's ExecStop stops Solr after sending the following request:
curl -Ss "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=${COLLECTION}&shard=shard1&replica=${CORE}"


Problem:
Because a newly started instance cannot serve requests itself until
replication completes, requests sent to it are forwarded to another
replica. (Right?)
In most cases this is not a problem, but one occurs when an instance is
started and another is stopped at the same time.

- the newly started instance forwarding requests to other replicas
- a replica being removed because its instance is stopping
When both happen at the same time, a 500-series error occurs on the newly
started instance. The error is as follows:
org.apache.solr.common.SolrException: Error trying to proxy request for url

This happens because the newly launched instance forwards the request to
the stopped instance before it receives the updated cluster state from
ZooKeeper.

At the moment I'm considering the workaround below, but are there any
other workarounds that look better?  Or would the problem be solved by
upgrading?
- call DELETEREPLICA, then wait a few seconds before stopping Solr
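A sketch of that workaround that polls the cluster state instead of sleeping for a fixed interval (the COLLECTION/CORE variables match the scripts above; the 30-try cap, URL composition, and use of CLUSTERSTATUS are assumptions, not something from this thread):

```shell
#!/bin/sh
# ExecStop helper sketch: delete the replica, then wait until ZooKeeper
# no longer lists it before letting systemd stop the Solr process.

SOLR="http://localhost:8983/solr"

build_delete_url() {
  # Compose the DELETEREPLICA request for a given collection ($1) and
  # replica core name ($2); the shard is fixed at shard1 as above.
  echo "${SOLR}/admin/collections?action=DELETEREPLICA&collection=$1&shard=shard1&replica=$2"
}

wait_until_removed() {
  # Poll CLUSTERSTATUS up to 30 times (roughly 30 seconds) until the
  # core name disappears from the cluster state. Note: if curl itself
  # fails, grep sees empty input and we return success early; a real
  # script would distinguish those cases.
  for i in $(seq 1 30); do
    curl -sS "${SOLR}/admin/collections?action=CLUSTERSTATUS&collection=$1" \
      | grep -q "\"core\":\"$2\"" || return 0
    sleep 1
  done
  return 1
}

# ExecStop would then run something like:
#   curl -sS "$(build_delete_url "$COLLECTION" "$CORE")"
#   wait_until_removed "$COLLECTION" "$CORE"
```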

Thank you.

Translated with www.DeepL.com/Translator (free version)


Re: solr query returns items with spaces removed

2020-07-29 Thread Erick Erickson
In high throughput situations that can be a problem. The entire
packet has to be assembled and transmitted over the network. This
can cause grief in many situations.

 Not to mention that for “regular” queries, say using the /select or /query
handlers, and assuming you’re getting
one or more stored-but-not-docValues fields, that means 2M+ disk seeks,
decompressing 2M+ 16K blocks, and creating the entire 2M+-document
packet in memory before transmitting it to the client.

This doesn’t apply to, say, the /export handler upon which streaming
is built.

For low-query-volume situations where there are just a few simultaneous
queries you can get away with it. But it’s still an anti-pattern.

Again, though, none of that is relevant if you’re using anything built on
the /export handler, which includes almost all of the streaming capabilities.
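For concreteness, a streaming request built on /export can be sketched as below (the collection name, field, and port are illustrative; fields used with /export must have docValues):

```shell
# A search() streaming expression routed through the /export handler,
# which streams sorted results instead of buffering the whole response:
EXPR='search(mycoll, q="*:*", fl="id", sort="id asc", qt="/export")'
URL="http://localhost:8983/solr/mycoll/stream"

# Against a running Solr this would be sent as:
#   curl --data-urlencode "expr=${EXPR}" "${URL}"
echo "${EXPR}"
```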

Best,
Erick

> On Jul 29, 2020, at 4:59 PM, David Hastings  
> wrote:
> 
> "Oh, and returning 100K docs is an anti-pattern, if you really need that
> many docs consider cursorMark and/or Streaming."
> 
> er, i routinely ask for 2+ million records into a single file based on a
> query.  I mean not into a web application or anything, its meant to be
> processed after the fact, but solr has no issue doing this
> 
> 
> 
> On Wed, Jul 29, 2020 at 4:53 PM Erick Erickson 
> wrote:
> 
>> I don’t think there’s really a canned way to do what you’re asking. A
>> custom DocTransformer would probably do the trick though.
>> 
>> You could also create a custom QueryComponent that examined the docs being
>> returned and inserted a blank field for a selected number of fields
>> (possibly configurable in solrconfig.xml).
>> 
>> Oh, and returning 100K docs is an anti-pattern, if you really need that
>> many docs consider cursorMark and/or Streaming.
>> 
>> Best,
>> Erick
>> 
>>> On Jul 29, 2020, at 2:55 PM, Teresa McMains 
>> wrote:
>>> 
>>> Thanks so much.  Is there any other way to return the data value if it
>> exists, otherwise an empty string?  I'm integrating this with a 3rd party
>> app which I can't change. When the field is null it isn't showing up in the
>> output.
>>> 
>>> -Original Message-
>>> From: Erick Erickson 
>>> Sent: Wednesday, July 29, 2020 12:49 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: solr query returns items with spaces removed
>>> 
>>> The “def” function goes after the _indexed_ value, so that’s what you’re
>> getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is
>> stored that should return the original field value before any analysis is
>> done.
>>> 
>>> Why are you using the def function? If the field is absent from the doc,
>> nothing will be returned for that field, not even the name. Are you trying
>> to insure that a blank field is returned if the field isn’t in the
>> document? You can handle that on the client side if so…
>>> 
>>> Best,
>>> Erick
>>> 
 On Jul 29, 2020, at 10:34 AM, Teresa McMains 
>> wrote:
 
 _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)
>>> 
>> 
>> 



Re: solr query returns items with spaces removed

2020-07-29 Thread David Hastings
"Oh, and returning 100K docs is an anti-pattern, if you really need that
many docs consider cursorMark and/or Streaming."

er, i routinely ask for 2+ million records into a single file based on a
query.  I mean not into a web application or anything, its meant to be
processed after the fact, but solr has no issue doing this



On Wed, Jul 29, 2020 at 4:53 PM Erick Erickson 
wrote:

> I don’t think there’s really a canned way to do what you’re asking. A
> custom DocTransformer would probably do the trick though.
>
> You could also create a custom QueryComponent that examined the docs being
> returned and inserted a blank field for a selected number of fields
> (possibly configurable in solrconfig.xml).
>
> Oh, and returning 100K docs is an anti-pattern, if you really need that
> many docs consider cursorMark and/or Streaming.
>
> Best,
> Erick
>
> > On Jul 29, 2020, at 2:55 PM, Teresa McMains 
> wrote:
> >
> > Thanks so much.  Is there any other way to return the data value if it
> exists, otherwise an empty string?  I'm integrating this with a 3rd party
> app which I can't change. When the field is null it isn't showing up in the
> output.
> >
> > -Original Message-
> > From: Erick Erickson 
> > Sent: Wednesday, July 29, 2020 12:49 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr query returns items with spaces removed
> >
> > The “def” function goes after the _indexed_ value, so that’s what you’re
> getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is
> stored that should return the original field value before any analysis is
> done.
> >
> > Why are you using the def function? If the field is absent from the doc,
> nothing will be returned for that field, not even the name. Are you trying
> to insure that a blank field is returned if the field isn’t in the
> document? You can handle that on the client side if so…
> >
> > Best,
> > Erick
> >
> >> On Jul 29, 2020, at 10:34 AM, Teresa McMains 
> wrote:
> >>
> >> _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)
> >
>
>


Re: solr query returns items with spaces removed

2020-07-29 Thread Erick Erickson
I don’t think there’s really a canned way to do what you’re asking. A custom 
DocTransformer would probably do the trick though.

You could also create a custom QueryComponent that examined the docs being 
returned and inserted a blank field for a selected number of fields (possibly 
configurable in solrconfig.xml).

Oh, and returning 100K docs is an anti-pattern, if you really need that many 
docs consider cursorMark and/or Streaming.
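The cursorMark approach mentioned above can be sketched as follows (the collection name 'mycoll', page size, and uniqueKey field 'id' are illustrative assumptions):

```shell
# cursorMark paging: each request must sort on a unique field (the
# uniqueKey) and pass the nextCursorMark value returned by the previous
# response; iteration ends when that value stops changing.
BASE="http://localhost:8983/solr/mycoll/select"

page_url() {
  # $1 is the current cursorMark ('*' for the first request)
  echo "${BASE}?q=*:*&rows=1000&sort=id+asc&cursorMark=$1"
}

# Loop sketch (needs a running Solr and jq, so left as comments):
#   cursor='*'
#   while :; do
#     resp=$(curl -sS "$(page_url "$cursor")")
#     next=$(echo "$resp" | jq -r .nextCursorMark)
#     [ "$next" = "$cursor" ] && break
#     cursor=$next
#   done
```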

Best,
Erick

> On Jul 29, 2020, at 2:55 PM, Teresa McMains  wrote:
> 
> Thanks so much.  Is there any other way to return the data value if it 
> exists, otherwise an empty string?  I'm integrating this with a 3rd party app 
> which I can't change. When the field is null it isn't showing up in the 
> output.
> 
> -Original Message-
> From: Erick Erickson  
> Sent: Wednesday, July 29, 2020 12:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: solr query returns items with spaces removed
> 
> The “def” function goes after the _indexed_ value, so that’s what you’re 
> getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is 
> stored that should return the original field value before any analysis is 
> done.
> 
> Why are you using the def function? If the field is absent from the doc, 
> nothing will be returned for that field, not even the name. Are you trying to 
> insure that a blank field is returned if the field isn’t in the document? You 
> can handle that on the client side if so…
> 
> Best,
> Erick
> 
>> On Jul 29, 2020, at 10:34 AM, Teresa McMains  
>> wrote:
>> 
>> _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)
> 



RE: solr query returns items with spaces removed

2020-07-29 Thread Teresa McMains
Thanks so much.  Is there any other way to return the data value if it exists, 
otherwise an empty string?  I'm integrating this with a 3rd party app which I 
can't change. When the field is null it isn't showing up in the output.

-Original Message-
From: Erick Erickson  
Sent: Wednesday, July 29, 2020 12:49 PM
To: solr-user@lucene.apache.org
Subject: Re: solr query returns items with spaces removed

The “def” function goes after the _indexed_ value, so that’s what you’re 
getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is stored 
that should return the original field value before any analysis is done.

Why are you using the def function? If the field is absent from the doc, 
nothing will be returned for that field, not even the name. Are you trying to 
insure that a blank field is returned if the field isn’t in the document? You 
can handle that on the client side if so…

Best,
Erick

> On Jul 29, 2020, at 10:34 AM, Teresa McMains  
> wrote:
> 
> _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)



Re: solr query returns items with spaces removed

2020-07-29 Thread Erick Erickson
The “def” function goes after the _indexed_ value, so that’s what you’re 
getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is stored 
that should return the original field value before any analysis is done.
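Put concretely (the aml collection and field names come from the earlier message; only the fl parameter differs between the two requests):

```shell
BASE="http://localhost:8983/solr/aml/select"

# def() is a function query, so it is evaluated against indexed/docValues
# data and returns the post-analysis value (spaces etc. stripped):
with_def="${BASE}?q=*:*&fl=_20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)"

# Naming the stored field directly returns the original input text,
# untouched by the analysis chain (requires stored="true" on the field):
plain="${BASE}?q=*:*&fl=INSTRUCTIONS"

echo "$with_def"
echo "$plain"
```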

Why are you using the def function? If the field is absent from the doc, 
nothing will be returned for that field, not even the name. Are you trying to 
insure that a blank field is returned if the field isn’t in the document? You 
can handle that on the client side if so…

Best,
Erick

> On Jul 29, 2020, at 10:34 AM, Teresa McMains  
> wrote:
> 
> _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)



solr query returns items with spaces removed

2020-07-29 Thread Teresa McMains
I am sure I'm doing something silly. Basically it looks like my data is being 
altered upon search.

This is my fieldType:

[the fieldType XML definition was stripped from the archived message]

I have a string field called "INSTRUCTIONS" using this field type that looks 
like this:

ABC_D= PAYMENT FOR CONTRACT AX3764-MP-000-37

With a URL like the one below, I return a bunch of columns of data:

/solr/aml/select?q=TRANSACTION_REFERENCE_NUMBER%3A%22${transactionReferenceNumber}%22&fq=doc_type%3Atrxn&wt=json&fl=_1_Trigger:def(TRIGGER_IND,%22N%22),_2_Transaction_No:TRANSACTION_REFERENCE_NUMBER,_3_Date:TRANSACTION_DATE,_4_Amount:CURRENCY_AMOUNT,_20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22),_21_Transaction_Type:def(TRANSACTION_CDI_DESC,%22%22)&rows=10

But the data being returned for "INSTRUCTIONS" looks like this:
ABCDPAYMENTFORCONTRACTAX3764MP00037

All spaces and special characters removed.  I thought the fieldType's
filters would affect indexing and query analysis, but not the stored data
that is returned.
What's even weirder is that other fields that also use this field type (like 
transaction reference number) do not show the same behavior.

For example a transaction_reference_number like 123-456-7890 is returned 
correctly.


Can anyone please help me understand or troubleshoot?

Thank you so much,
Teresa




Re: Production sizing and scaling guidelines -- Solr

2020-07-29 Thread Prashant Jyoti
Thanks for that, Colvin. Even though it's a bit dated, it sure does help
in getting an idea.

I definitely remember seeing a list of these sorts of blogs somewhere a
> long time ago... don't know where though
>
If by any chance you stumble upon it, please feel free to share, even at a
later date.

On Tue, Jul 28, 2020 at 8:32 PM Colvin Cowie 
wrote:

> Maybe not the most up to date or relevant example for your usage but
>
> https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/
> is one that sticks in my mind
> I definitely remember seeing a list of these sorts of blogs somewhere a
> long time ago... don't know where though
>
> On Tue, 28 Jul 2020 at 13:50, Prashant Jyoti  wrote:
>
> > Thanks Erick.
> >
> > 1> does Solr do what you want? You’re talking about reporting, and Solr
> is
> > > primarily a search engine. That said, it has tons of analytics
> > capabilities
> > > built in. Depends on what “reporting” means in your situation.
> > >
> > There is a reporting UI which has various criteria the user can filter
> on,
> > the data for this UI will be indexed to and fetched from Solr. These are
> > basically call logs of the user's interaction with tech support. The
> > documents would be at max a few MBs in size.
> >
> > > 2> how expensive is it?
> >
> > I am looking at what kind of a setup is considered okay to handle, let's
> > say, average loads to start with (I am not considering billions of
> > documents/day to be an average load at my place, that would be
> higher-end),
> > with the scope of scaling as and when the load increases.
> >
> > I went through the linked article in your answer and understand your
> > viewpoint, but that said, I am still looking for some averages ;)
> > I have been unable to find any authoritative blogs detailing their
> > production usage of Solr.
> >
> > On Tue, Jul 28, 2020 at 5:19 PM Erick Erickson 
> > wrote:
> >
> > > Here’s a list of some sites using Solr:
> > > https://cwiki.apache.org/confluence/display/solr/PublicServers
> > >
> > > It’s not really what you’re looking for though, it doesn’t really have
> > the
> > > details you’d like.
> > >
> > > There are two dimensions here:
> > >
> > > 1> does Solr do what you want? You’re talking about reporting, and Solr
> > is
> > > primarily a search engine. That said, it has tons of analytics
> > capabilities
> > > built in. Depends on what “reporting” means in your situation.
> > >
> > > 2> how expensive is it? Here “expensive” means hardware and support.
> > > Unfortunately that’s un-answerable. This is really “the sizing
> question”,
> > > and there are too many variables to work with. If you want some backup
> > for
> > > why this is an unfair question to answer in the abstract, see:
> > >
> >
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > >
> > > What I’d recommend is to ask for enough resources to create a PoC on an
> > > existing bit of hardware, your workstation/laptop would do. For a PoC,
> > > there’s no reason to even have 3 Zookeepers, I routinely run with just
> > one
> > > (although I do use an external-to-Solr zookeeper). I’d start with two
> > > shards, leader-only, just to be sure you take into account how
> SolrCloud
> > > works. I wouldn’t get fancy here, just take your first guess at how it
> > will
> > > all work and index a bunch of documents (say 10,000,000) and see if you
> > can
> > > get Solr to create the data for your reports. At that point, you have
> > some
> > > data to work with, i.e. how big your indexes are, whether Solr’s
> > > capabilities meet your functional requirements etc.
> > >
> > > You can infer that I consider 10,000,000 documents a small Solr
> > > installation, with the caveat that if the docs are each gigabytes in
> > length
> > > all bets are off. I’ve worked with clients who index billions of
> > > documents/day (yes billion) admittedly they had a very large hardware
> > > budget ;).  I’ve seen 300M docs (each reasonably complex and a few K
> > each)
> > > fit comfortably on a machine with 12G allocated to Solr (64G total
> > physical
> > > memory IIRC).
> > >
> > > So, It Depends (tm)...
> > >
> > > Good luck!
> > > Erick
> > >
> > > > On Jul 28, 2020, at 7:26 AM, Prashant Jyoti 
> > > wrote:
> > > >
> > > > Hi,
> > > > I wanted to check if anybody has any references for tech companies'
> > blogs
> > > > detailing their Solr setup in production. I am more interested in
> > storage
> > > > and scaling guidelines. I intend to use Solr for one of my projects
> at
> > > > work(back-end for a reporting tool) and need to convince higher
> > > management
> > > > that it is indeed the right solution. I have gone through the
> material
> > > > available in the Solr reference guide, I am looking for some details
> > > from a
> > > > working production setup.
> > > >
> > > > Thanks!
> > > > --
> > > > Regards,
> > > > Prashant.
> > >
> > >
> >
> > --
> > Regards,
> > Prashant.
> >
>


-- 
Regards,
Prashant.


Re: JsonLayout breaks logging?

2020-07-29 Thread t spam
>
>
> Hi Naz and other solr-users (now with solr-user in To:),
>
> Excuse my ignorance here (just getting started) but let's take the
> techproducts example. As you proposed I included the latest jackson-core
> and jackson-databind jars in the "solr install dir/lib/" directory:
>
> [tijmen@solr-1 solr-7.7.3]$ ls -la lib/
> total 1732
> drwxrwxr-x.  2 tijmen tijmen  72 Jul 25 08:06 .
> drwxr-xr-x. 10 tijmen tijmen 212 Jul 25 07:45 ..
> -rw-rw-r--.  1 tijmen tijmen  351575 Jul 25 07:48 jackson-core-2.11.1.jar
> -rw-rw-r--.  1 tijmen tijmen 1419800 Jul 25 07:48
> jackson-databind-2.11.1.jar
>
> I then added a lib directive to include the jackson jars
> in: example/techproducts/solr/techproducts/conf/solrconfig.xml
>
>   [the <lib dir="..." regex="..."/> directives were stripped from the
>   archived message; the surviving regex attributes match the stock
>   techproducts entries (solr-cell, solr-clustering, solr-langid,
>   solr-ltr, solr-velocity) plus the newly added jackson directive]
>
> I start solr using:
>
> [tijmen@solr-1 solr-7.7.3]$ ./bin/solr start -e techproducts
>
> Unfortunately I get the same result. Solr starts but no logging.
>
> Whenever I remove the JsonLayout from the log4j2.xml it starts logging as
> expected.
>
> Thanks,
>
> Tijmen
>
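For reference, a minimal JsonLayout appender looks like the sketch below (compact/eventEol are standard log4j2 JsonLayout attributes). One assumption worth checking, not confirmed in this thread: log4j2 initializes before any core is loaded, so <lib> directives in solrconfig.xml cannot supply its dependencies — the jackson jars would need to be on the server classpath (e.g. server/lib/ext, where Solr keeps its logging jars).

```xml
<Console name="STDOUT" target="SYSTEM_OUT">
  <JsonLayout compact="true" eventEol="true"/>
</Console>
```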
> On Fri, Jul 24, 2020 at 6:48 PM Naz S  wrote:
>
>> Hi Tijmen,
>>
>> If you use maven, for example, you can add dependencies in pom.xml.
>>
>> For example,
>> <dependencies>
>>   <dependency>
>>     <groupId>com.fasterxml.jackson.core</groupId>
>>     <artifactId>jackson-core</artifactId>
>>     <version>2.11.1</version>
>>   </dependency>
>>   <dependency>
>>     <groupId>com.fasterxml.jackson.core</groupId>
>>     <artifactId>jackson-databind</artifactId>
>>     <version>2.11.1</version>
>>   </dependency>
>> </dependencies>
>>
>> On Fri, Jul 24, 2020 at 1:37 PM t spam  wrote:
>>
>>> Hi Naz,
>>>
>>> Could you give me some directions in where or how I should provide these
>>> dependencies? I can see these dependencies are already in various places by
>>> default:
>>>
>>> [tijmen@solr-1 solr-7.7.3]$ find . -name jackson*
>>> ./contrib/clustering/lib/jackson-annotations-2.9.8.jar
>>> ./contrib/clustering/lib/jackson-databind-2.9.8.jar
>>> ./contrib/prometheus-exporter/lib/jackson-annotations-2.9.8.jar
>>> ./contrib/prometheus-exporter/lib/jackson-core-2.9.8.jar
>>> ./contrib/prometheus-exporter/lib/jackson-databind-2.9.8.jar
>>> ./contrib/prometheus-exporter/lib/jackson-jq-0.0.8.jar
>>> ./licenses/jackson-annotations-2.9.8.jar.sha1
>>> ./licenses/jackson-annotations-LICENSE-ASL.txt
>>> ./licenses/jackson-annotations-NOTICE.txt
>>> ./licenses/jackson-core-2.9.8.jar.sha1
>>> ./licenses/jackson-core-LICENSE-ASL.txt
>>> ./licenses/jackson-core-NOTICE.txt
>>> ./licenses/jackson-core-asl-1.9.13.jar.sha1
>>> ./licenses/jackson-core-asl-LICENSE-ASL.txt
>>> ./licenses/jackson-core-asl-NOTICE.txt
>>> ./licenses/jackson-databind-2.9.8.jar.sha1
>>> ./licenses/jackson-databind-LICENSE-ASL.txt
>>> ./licenses/jackson-databind-NOTICE.txt
>>> ./licenses/jackson-dataformat-smile-2.9.8.jar.sha1
>>> ./licenses/jackson-dataformat-smile-LICENSE-ASL.txt
>>> ./licenses/jackson-dataformat-smile-NOTICE.txt
>>> ./licenses/jackson-jq-0.0.8.jar.sha1
>>> ./licenses/jackson-jq-LICENSE-ASL.txt
>>> ./licenses/jackson-jq-NOTICE.txt
>>> ./licenses/jackson-mapper-asl-1.9.13.jar.sha1
>>> ./licenses/jackson-mapper-asl-LICENSE-ASL.txt
>>> ./licenses/jackson-mapper-asl-NOTICE.txt
>>> ./server/solr-webapp/webapp/WEB-INF/lib/jackson-annotations-2.9.8.jar
>>> ./server/solr-webapp/webapp/WEB-INF/lib/jackson-core-2.9.8.jar
>>> ./server/solr-webapp/webapp/WEB-INF/lib/jackson-core-asl-1.9.13.jar
>>> ./server/solr-webapp/webapp/WEB-INF/lib/jackson-databind-2.9.8.jar
>>>
>>> ./server/solr-webapp/webapp/WEB-INF/lib/jackson-dataformat-smile-2.9.8.jar
>>> ./server/solr-webapp/webapp/WEB-INF/lib/jackson-mapper-asl-1.9.13.jar
>>>
>>> Thanks for your time.
>>>
>>> Tijmen
>>>
>>> On Fri, Jul 24, 2020 at 1:16 PM Naz S  wrote:
>>>

 You should explicitly provide the jackson dependencies: jackson-core,
 jackson-databind and/or jackson-annotations.

 On Fri, Jul 24, 2020 at 8:24 AM t spam  wrote:

> Hi,
>
> I'm having difficulty configuring JsonLayout for appenders. I have the
> following in my log4j2.xml:
>
> [the log4j2.xml was largely stripped from the archived message; the
> surviving fragments show a Console appender whose PatternLayout is the
> stock "%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard}
> %X{replica} %X{core}] %c{1.} %m%n" pattern, plus a RollingFile appender
> (fileName="${sys:solr.log.dir}/solr.log",
> filePattern="${sys:solr.log.dir}/solr.log.%i") and a SlowFile appender
> (fileName="${sys:solr.log.dir}/solr_slow_requests.log",
> filePattern="${sys:solr.log.dir}/solr_slow_requests.log.%i"); the
> JsonLayout elements themselves did not survive]