Re: Mismatch between replication API & index.properties

2019-07-31 Thread Aman Tandon
Yes, that is my understanding too, but the replication handler response says
it is referring to the index folder, not the one shown in index.properties.
Because of that confusion I am not able to delete the folder.

Is this a bug, or default behavior where, irrespective of index.properties,
it always shows the index folder only?

Solr version - 6.6.2

On Wed, Jul 31, 2019, 21:17 jai dutt  wrote:

> It's correct behaviour: Solr puts the replica index files in this format, and
> you can find which index dir is currently in use in the index.properties file.
> Usually, after a successful full replication, Solr removes the old timestamped dir.
>
> On Wed, 31 Jul, 2019, 8:02 PM Aman Tandon, 
> wrote:
>
> > Hi,
> >
> > We are in a situation where the whole disk is full on a server where we are
> > seeing multiple index directories ending with a timestamp. Upon checking the
> > index.properties file for a particular shard replica, it does not refer to the
> > folder named *index*, but when I use the replication API I see it pointing to
> > the *index* folder. Am I missing something? Kindly advise.
> >
> > *directory*
> >
> >
> >
> > drwxrwxr-x. 2 fusion fusion 69632 Jul 30 23:24 index
> > drwxrwxr-x. 2 fusion fusion 28672 Jul 31 03:02 index.20190731005047763
> > drwxrwxr-x. 2 fusion fusion  4096 Jul 31 10:20 index.20190731095757917
> > -rw-rw-r--. 1 fusion fusion    78 Jul 31 03:02 index.properties
> > -rw-rw-r--. 1 fusion fusion   296 Jul 31 09:56 replication.properties
> > drwxrwxr-x. 2 fusion fusion  4096 Jan 16  2019 snapshot_metadata
> > drwxrwxr-x. 2 fusion fusion  4096 Jul 30 23:24 tlog
> >
> > *index.properties*
> >
> > #index.properties
> > #Wed Jul 31 03:02:12 EDT 2019
> > index=index.20190731005047763
> >
> > *REPLICATION API STATUS*
> >
> > 
> > 280.56 GB
> > 
> > */opt/solr/x_shard4_replica3/data/index/*
> > 
> > ...
> > true
> > false
> > 1564543395563
> > 98884
> > ...
> > ...
> >
> > Regards,
> > Aman
> >
>


Indexing information on number of attachments and their names in EML file

2019-07-31 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check: is there any way we can detect the number of
attachments and their names during indexing of EML files in Solr, and index
that information into Solr?

Currently, Solr is able to use Tika and Tesseract OCR to extract the
contents of the attachments. However, I could not find information about
the number of attachments in the EML file or what their filenames are.
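
For reference, one way to get at this outside of Solr's extracting handler is to
parse the EML with Tika's RecursiveParserWrapper and then index the attachment
count and names as ordinary fields yourself. A rough sketch (assuming a
reasonably recent Tika 1.x on the classpath; the file name is illustrative):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;

public class EmlAttachments {
  public static void main(String[] args) throws Exception {
    RecursiveParserWrapper parser = new RecursiveParserWrapper(new AutoDetectParser());
    RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(
        new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
    try (InputStream in = new FileInputStream("message.eml")) {
      parser.parse(in, handler, new Metadata(), new ParseContext());
    }
    // one Metadata entry per parsed document; index 0 is the e-mail itself,
    // the rest are the embedded attachments
    List<Metadata> docs = handler.getMetadataList();
    System.out.println("attachment count: " + (docs.size() - 1));
    for (int i = 1; i < docs.size(); i++) {
      System.out.println("attachment name: " + docs.get(i).get(Metadata.RESOURCE_NAME_KEY));
    }
  }
}

The count and the collected names could then be added to the Solr document as
normal fields (for example an int field and a multi-valued string field).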

I am using Solr 7.6.0 in production, and am also trying out the new Solr
8.2.0.

Regards,
Edwin


Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-07-31 Thread Zheng Lin Edwin Yeo
Yes. You can get my full solr.log from the link below. The error is there
when I tried to create collection1 (around lines 170 to 300).

https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN

Regards,
Edwin


On Wed, 31 Jul 2019 at 18:39, Jan Høydahl  wrote:

> Please look for the full log file solr.log in your Solr server, and share
> it via some file sharing service or gist or similar for us to be able to
> decipher the collection create error.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 31. jul. 2019 kl. 08:33 skrev Zheng Lin Edwin Yeo  >:
> >
> > Hi,
> >
> > Regarding the issue, I have tried to put the following in zoo.cfg under
> > ZooKeeper:
> > 4lw.commands.whitelist=mntr,conf,ruok
> >
> > But it is still showing this error.
> > *"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
> > configuration file."*
> >
> > As I am using SolrCloud, the collection config can still be loaded to
> > ZooKeeper as per normal. But if I tried to create a collection, I will
> get
> > the following error:
> >
> > {
> >  "responseHeader":{
> >"status":400,
> >"QTime":686},
> >  "failure":{
> >"192.168.1.2:8983
> _solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> > occurred when talking to server at:http://192.168.1.2:8983/solr",
> >"192.168.1.2:8984
> _solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> > occurred when talking to server at:http://192.168.1.2:8984/solr"},
> >  "Operation create caused
> >
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Underlying core creation failed while creating collection: collection1",
> >  "exception":{f
> >"msg":"Underlying core creation failed while creating collection:
> > collection1",
> >"rspCode":400},
> >  "error":{
> >"metadata":[
> >  "error-class","org.apache.solr.common.SolrException",
> >  "root-error-class","org.apache.solr.common.SolrException"],
> >"msg":"Underlying core creation failed while creating collection:
> > collection1",
> >"code":400}}
> >
> > Is there anything which I may have missed out?
> >
> > Regards,
> > Edwin
> >
> > On Tue, 30 Jul 2019 at 10:05, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi,
> >>
> >> I am using the new Solr 8.2.0 with SolrCloud and external ZooKeeper
> 3.5.5.
> >>
> >> However, after adding in the line under zoo.cfg
> >> *4lw.commands.whitelist=**
> >>
> >> I get the error under Cloud -> ZK Status in Solr
> >> *"Errors: - membership: Check 4lq.commands.whitelist setting in
> zookeeper
> >> configuration file."*
> >>
> >> I have noticed that the issue is caused by adding "conf" to the
> >> whitelist. But if I do not add "conf" to the whitelist, I will get the
> >> following error:
> >> *"Errors: - conf is not executed because it is not in the whitelist.
> >> Check 4lw.commands.whitelist setting in zookeeper configuration file."*
> >>
> >> What could be the issue that is causing this error, and how can we
> >> resolve it?
> >>
> >> Thank you.
> >>
> >> Regards,
> >> Edwin
> >>
>
>


Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-07-31 Thread Zheng Lin Edwin Yeo
Yes, I have restarted both Solr and ZooKeeper after the changes. In fact I
have tried to restart the whole system, but the problem still persists.

Below is my configuration for zoo.cfg.

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=C:\\solr\\zookeeper-3.5.5\\zookeeper1\\dataDir
# the port at which the clients will connect
clientPort=2181

server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

4lw.commands.whitelist=mntr,conf,ruok
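
For reference, the whitelist can also be checked directly against ZooKeeper
(outside of Solr's ZK Status page) by sending the four-letter-word commands
with netcat, e.g.:

echo ruok | nc localhost 2181
echo conf | nc localhost 2181
echo mntr | nc localhost 2181

If "conf" is whitelisted correctly, the second command should print the server
configuration instead of a whitelist error.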


Regards,
Edwin

On Wed, 31 Jul 2019 at 19:04, Jörn Franke  wrote:

> Updated correct zoo.cfg? Did you restart zookeeper after config change ?
>
> > Am 30.07.2019 um 04:05 schrieb Zheng Lin Edwin Yeo  >:
> >
> > Hi,
> >
> > I am using the new Solr 8.2.0 with SolrCloud and external ZooKeeper
> 3.5.5.
> >
> > However, after adding in the line under zoo.cfg
> > *4lw.commands.whitelist=**
> >
> > I get the error under Cloud -> ZK Status in Solr
> > *"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
> > configuration file."*
> >
> > I have noticed that the issue is caused by adding "conf" to the
> > whitelist. But if I do not add "conf" to the whitelist, I will get the
> > following error:
> > *"Errors: - conf is not executed because it is not in the whitelist.
> > Check 4lw.commands.whitelist setting in zookeeper configuration file."*
> >
> > What could be the issue that is causing this error, and how can we
> > resolve it?
> >
> > Thank you.
> >
> > Regards,
> > Edwin
>


[CVE-2019-0193] Apache Solr, Remote Code Execution via DataImportHandler

2019-07-31 Thread David Smiley
The DataImportHandler, an optional but popular module to pull in data from
databases and other sources, has a feature in which the whole DIH
configuration can come from a request's "dataConfig" parameter. The debug
mode of the DIH admin screen uses this to allow convenient debugging /
development of a DIH config. Since a DIH config can contain scripts, this
parameter is a security risk. Starting with version 8.2.0 of Solr, use of
this parameter requires setting the Java System property
"enable.dih.dataConfigParam" to true.

Mitigations:
* Upgrade to 8.2.0 or later, which is secure by default.
* or, edit solrconfig.xml to configure all DataImportHandler usages with an
"invariants" section listing the "dataConfig" parameter set to an empty
string (see the sketch after this list).
* Ensure your network settings are configured so that only trusted traffic
communicates with Solr, especially to the DIH request handler.  This is a
best practice for all of Solr.
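
A minimal sketch of such a handler definition (the handler path and config file
name are illustrative; the point is the empty "dataConfig" invariant):

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
  <lst name="invariants">
    <!-- pin dataConfig to an empty string so it cannot be supplied per request -->
    <str name="dataConfig"></str>
  </lst>
</requestHandler>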

Credits:
* Michael Stepankin (JPMorgan Chase)

References:
* https://issues.apache.org/jira/browse/SOLR-13669
* https://cwiki.apache.org/confluence/display/solr/SolrSecurity

Please direct any replies as either comments in the JIRA issue above or to
solr-user@lucene.apache.org


Re: Mismatch between replication API & index.properties

2019-07-31 Thread jai dutt
It's correct behaviour: Solr puts the replica index files in this format, and
you can find which index dir is currently in use in the index.properties file.
Usually, after a successful full replication, Solr removes the old timestamped dir.
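
If it helps, which directory a core is actually using can be cross-checked by
comparing index.properties with the replication handler's details response,
e.g. (host and port are illustrative, the core name is taken from the listing
below):

cat /opt/solr/x_shard4_replica3/data/index.properties
curl "http://localhost:8983/solr/x_shard4_replica3/replication?command=details&wt=json"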

On Wed, 31 Jul, 2019, 8:02 PM Aman Tandon,  wrote:

> Hi,
>
> We are in a situation where the whole disk is full on a server where we are
> seeing multiple index directories ending with a timestamp. Upon checking the
> index.properties file for a particular shard replica, it does not refer to the
> folder named *index*, but when I use the replication API I see it pointing to
> the *index* folder. Am I missing something? Kindly advise.
>
> *directory*
>
>
>
> drwxrwxr-x. 2 fusion fusion 69632 Jul 30 23:24 index
> drwxrwxr-x. 2 fusion fusion 28672 Jul 31 03:02 index.20190731005047763
> drwxrwxr-x. 2 fusion fusion  4096 Jul 31 10:20 index.20190731095757917
> -rw-rw-r--. 1 fusion fusion    78 Jul 31 03:02 index.properties
> -rw-rw-r--. 1 fusion fusion   296 Jul 31 09:56 replication.properties
> drwxrwxr-x. 2 fusion fusion  4096 Jan 16  2019 snapshot_metadata
> drwxrwxr-x. 2 fusion fusion  4096 Jul 30 23:24 tlog
>
> *index.properties*
>
> #index.properties
> #Wed Jul 31 03:02:12 EDT 2019
> index=index.20190731005047763
>
> *REPLICATION API STATUS*
>
> 
> 280.56 GB
> 
> */opt/solr/x_shard4_replica3/data/index/*
> 
> ...
> true
> false
> 1564543395563
> 98884
> ...
> ...
>
> Regards,
> Aman
>


Re: Dataimport problem

2019-07-31 Thread Alexandre Rafalovitch
I wonder if you have some sort of JDBC pool enabled and/or the number
of worker threads is configured differently. Compare the Tomcat-level
configuration and/or take a thread dump of the Java runtime when you are
stuck.
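
For example, a thread dump can be grabbed with jstack against the Solr/Tomcat
JVM while the import is hung (the PID is illustrative):

jstack 12345 > solr-threads.txt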

Or maybe something similar on the Postgres side.

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:36, Srinivas Kashyap  wrote:
>
> Hi,
> Hi,
>
> 1)Have you tried running _just_ your SQL queries to see how long they take to 
> respond and whether it responds with the full result set of batches
>
> The 9th request returns only 2 rows. This behaviour is happening for all the 
> cores which have more than 8 SQL requests. But the same is working fine with 
> AWS hosting. Really baffled.
>
> Thanks and Regards,
> Srinivas Kashyap
>
> -Original Message-
> From: Erick Erickson 
> Sent: 31 July 2019 08:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Dataimport problem
>
> This code is a little old, but should give you a place to start:
>
> https://lucidworks.com/post/indexing-with-solrj/
>
> As for DIH, my guess is that when you moved to Azure, your connectivity to 
> the DB changed, possibly the driver Solr uses etc., and your SQL query in 
> step 9 went from, maybe, batching rows to returning the entire result set or 
> similar weirdness. Have you tried running _just_ your SQL queries to see how 
> long they take to respond and whether it responds with the full result set of 
> batches?
>
> Best,
> Erick
>
> > On Jul 31, 2019, at 10:18 AM, Srinivas Kashyap  
> > wrote:
> >
> > Hi,
> >
> > 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> > running an old version of Solr. Which one?
> >
> > We are using Solr 5.2.1(WAR based deployment so)
> >
> >
> > 5) DIH is not actually recommended for production, more for exploration; 
> > you may want to consider moving to a stronger architecture given the 
> > complexity of your needs
> >
> > Can you please give pointers to look into, We are using DIH for production 
> > and facing few issues. We need to start phasing out
> >
> >
> > Thanks and Regards,
> > Srinivas Kashyap
> >
> > -Original Message-
> > From: Alexandre Rafalovitch 
> > Sent: 31 July 2019 07:41 PM
> > To: solr-user 
> > Subject: Re: Dataimport problem
> >
> > A couple of things:
> > 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> > running an old version of Solr. Which one?
> > 2) Compare that you have the same Solr config. In Admin UI, there will be 
> > all O/S variables passed to the Java runtime, I would check them 
> > side-by-side
> > 3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run 
> > a subset (1?) of the queries and see the difference
> > 4) Worst case, you may want to track this in between Solr and DB by using 
> > network analyzer (e.g. Wireshark). That may show you the actual queries, 
> > timing, connection issues, etc
> > 5) DIH is not actually recommended for production, more for exploration; 
> > you may want to consider moving to a stronger architecture given the 
> > complexity of your needs
> >
> > Regards,
> >   Alex.
> >
> > On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  
> > wrote:
> >>
> >> Hello,
> >>
> >> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> >> DB. When I run full import(my core has 18 SQL queries), for some reason, 
> >> the requests will go till 9 and it gets hung for eternity.
> >>
> >> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> >> hosting.
> >>
> >> Am I missing some configuration? Please let me know.
> >>
> >> Thanks and Regards,
> >> Srinivas Kashyap
> >> 
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


RE: Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hi,
Hi,

1)Have you tried running _just_ your SQL queries to see how long they take to 
respond and whether it responds with the full result set of batches

The 9th request returns only 2 rows. This behaviour is happening for all the 
cores which have more than 8 SQL requests. But the same is working fine with 
AWS hosting. Really baffled.

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Erick Erickson 
Sent: 31 July 2019 08:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Dataimport problem

This code is a little old, but should give you a place to start:

https://lucidworks.com/post/indexing-with-solrj/

As for DIH, my guess is that when you moved to Azure, your connectivity to the 
DB changed, possibly the driver Solr uses etc., and your SQL query in step 9 
went from, maybe, batching rows to returning the entire result set or similar 
weirdness. Have you tried running _just_ your SQL queries to see how long they 
take to respond and whether it responds with the full result set of batches?

Best,
Erick

> On Jul 31, 2019, at 10:18 AM, Srinivas Kashyap  
> wrote:
>
> Hi,
>
> 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> running an old version of Solr. Which one?
>
> We are using Solr 5.2.1(WAR based deployment so)
>
>
> 5) DIH is not actually recommended for production, more for exploration; you 
> may want to consider moving to a stronger architecture given the complexity 
> of your needs
>
> Can you please give pointers to look into, We are using DIH for production 
> and facing few issues. We need to start phasing out
>
>
> Thanks and Regards,
> Srinivas Kashyap
>
> -Original Message-
> From: Alexandre Rafalovitch 
> Sent: 31 July 2019 07:41 PM
> To: solr-user 
> Subject: Re: Dataimport problem
>
> A couple of things:
> 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> running an old version of Solr. Which one?
> 2) Compare that you have the same Solr config. In Admin UI, there will be all 
> O/S variables passed to the Java runtime, I would check them side-by-side
> 3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run a 
> subset (1?) of the queries and see the difference
> 4) Worst case, you may want to track this in between Solr and DB by using 
> network analyzer (e.g. Wireshark). That may show you the actual queries, 
> timing, connection issues, etc
> 5) DIH is not actually recommended for production, more for exploration; you 
> may want to consider moving to a stronger architecture given the complexity 
> of your needs
>
> Regards,
>   Alex.
>
> On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  
> wrote:
>>
>> Hello,
>>
>> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
>> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
>> requests will go till 9 and it gets hung for eternity.
>>
>> But the same setup, solr(tomcat) and postgres database works fine with AWS 
>> hosting.
>>
>> Am I missing some configuration? Please let me know.
>>
>> Thanks and Regards,
>> Srinivas Kashyap
>> 

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


Mismatch between replication API & index.properties

2019-07-31 Thread Aman Tandon
Hi,

We are in a situation where the whole disk is full on a server where we are
seeing multiple index directories ending with a timestamp. Upon checking the
index.properties file for a particular shard replica, it does not refer to the
folder named *index*, but when I use the replication API I see it pointing to
the *index* folder. Am I missing something? Kindly advise.

*directory*



drwxrwxr-x. 2 fusion fusion 69632 Jul 30 23:24 index
drwxrwxr-x. 2 fusion fusion 28672 Jul 31 03:02 index.20190731005047763
drwxrwxr-x. 2 fusion fusion  4096 Jul 31 10:20 index.20190731095757917
-rw-rw-r--. 1 fusion fusion    78 Jul 31 03:02 index.properties
-rw-rw-r--. 1 fusion fusion   296 Jul 31 09:56 replication.properties
drwxrwxr-x. 2 fusion fusion  4096 Jan 16  2019 snapshot_metadata
drwxrwxr-x. 2 fusion fusion  4096 Jul 30 23:24 tlog

*index.properties*

#index.properties
#Wed Jul 31 03:02:12 EDT 2019
index=index.20190731005047763

*REPLICATION API STATUS*


280.56 GB

*/opt/solr/x_shard4_replica3/data/index/*

...
true
false
1564543395563
98884
...
...

Regards,
Aman


RE: Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hi,

1) Solr on Tomcat has not been an option for quite a while. So, you must be 
running an old version of Solr. Which one?

We are using Solr 5.2.1 (WAR-based deployment).


5) DIH is not actually recommended for production, more for exploration; you 
may want to consider moving to a stronger architecture given the complexity of 
your needs

Can you please give pointers to look into? We are using DIH in production and 
are facing a few issues. We need to start phasing it out.


Thanks and Regards,
Srinivas Kashyap
            
-Original Message-
From: Alexandre Rafalovitch  
Sent: 31 July 2019 07:41 PM
To: solr-user 
Subject: Re: Dataimport problem

A couple of things:
1) Solr on Tomcat has not been an option for quite a while. So, you must be 
running an old version of Solr. Which one?
2) Compare that you have the same Solr config. In Admin UI, there will be all 
O/S variables passed to the Java runtime, I would check them side-by-side
3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run a 
subset (1?) of the queries and see the difference
4) Worst case, you may want to track this in between Solr and DB by using 
network analyzer (e.g. Wireshark). That may show you the actual queries, 
timing, connection issues, etc
5) DIH is not actually recommended for production, more for exploration; you 
may want to consider moving to a stronger architecture given the complexity of 
your needs

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  wrote:
>
> Hello,
>
> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
> requests will go till 9 and it gets hung for eternity.
>
> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> hosting.
>
> Am I missing some configuration? Please let me know.
>
> Thanks and Regards,
> Srinivas Kashyap
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


Re: Dataimport problem

2019-07-31 Thread Alexandre Rafalovitch
A couple of things:
1) Solr on Tomcat has not been an option for quite a while. So, you
must be running an old version of Solr. Which one?
2) Check that you have the same Solr config. In the Admin UI you can see
all the O/S variables passed to the Java runtime; I would check them
side-by-side
3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you
can run a subset (1?) of the queries and see the difference
4) Worst case, you may want to track this in between Solr and DB by
using network analyzer (e.g. Wireshark). That may show you the actual
queries, timing, connection issues, etc
5) DIH is not actually recommended for production, more for
exploration; you may want to consider moving to a stronger
architecture given the complexity of your needs (see the SolrJ sketch
below)
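
As one possible direction for that, here is a rough SolrJ sketch (the Solr URL,
collection, JDBC connection details and field names are all illustrative, and
it assumes a reasonably recent SolrJ) that pulls rows over JDBC and pushes them
to Solr in batches, which is essentially what DIH does for you:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class JdbcToSolr {
  public static void main(String[] args) throws Exception {
    try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
         Connection con = DriverManager.getConnection("jdbc:postgresql://dbhost/mydb", "user", "pass");
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, title FROM items")) {
      List<SolrInputDocument> batch = new ArrayList<>();
      while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", rs.getString("id"));
        doc.addField("title", rs.getString("title"));
        batch.add(doc);
        if (batch.size() == 1000) {   // send documents in batches, not one at a time
          solr.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        solr.add(batch);
      }
      solr.commit();
    }
  }
}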

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  wrote:
>
> Hello,
>
> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
> requests will go till 9 and it gets hung for eternity.
>
> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> hosting.
>
> Am I missing some configuration? Please let me know.
>
> Thanks and Regards,
> Srinivas Kashyap
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hello,

We are trying to run Solr (Tomcat) on an Azure instance with Postgres as the DB. 
When I run a full import (my core has 18 SQL queries), for some reason the 
requests go up to request 9 and then it hangs for eternity.

But the same setup, Solr (Tomcat) and a Postgres database, works fine with AWS 
hosting.

Am I missing some configuration? Please let me know.

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


Re: SOLR 8.1.1 EdgeNGramFilterFactory parsing query

2019-07-31 Thread Erick Erickson
This works fine for me. Are you completely sure that

1> you pushed the changed config to the right place
2> you reloaded your server?

One thing I do is go to the admin UI and check for the collection (core) and 
bring up the schema file just to be sure that I’m using the schema I think I am.

I’d also check the admin/analysis page to see what that shows, sometimes I can 
get hints there.

And I’m assuming you’ve completely re-indexed your data, although for query 
parsing that shouldn’t be relevant.

Best,
Erick

Re: Single field in "qf" vs multiple

2019-07-31 Thread Erick Erickson
The short answer is “yes, ranking will be different”. This is inevitable since
the stats are different in your X field, there are more terms, the frequency of
any given term is different, etc.

I’d argue, though, that using qf with a list of fields can be tweaked to give
you better results. For instance you can boost the fields individually with
different weights etc. The canonical example is fields like title, summary
and body where you can assume that matches in title are more important
than summary which is in turn more important than body and do
something like:
qf=title^5 summary^2 body
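
As a concrete illustration (collection name and host are made up), that
translates into a request like:

curl "http://localhost:8983/solr/mycollection/select?defType=edismax&q=apache+solr&qf=title%5E5%20summary%5E2%20body"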

Best,
Erick

> On Jul 31, 2019, at 8:51 AM, Steven White  wrote:
> 
> Hi everyone,
> 
> I'm indexing my data into multiple Solr fields, such as A, B, C and I'm
> also copying all the data of those fields into a master field such as X.
> 
> By default, my "qf" is set to X so anytime a user is searching they are
> searching across the data that also exist in fields A, B and C.
> 
> In some use cases, I need to narrow down a user's search to just A and C or
> A only, etc.  When that happens, I dynamically, at run time set "qf" to "A
> C" or just "A".
> 
> My question is this, will the search quality and ranking be different if I
> simply set "qf" to "A B C" and avoid the copy operation to "X" (it will
> save me disk space)?  Will there be a performance impact if I do this?  Is
> there a limit at which point I should not list more than N fields in "qf"?
> 
> Thanks,
> 
> Steven



Single field in "qf" vs multiple

2019-07-31 Thread Steven White
Hi everyone,

I'm indexing my data into multiple Solr fields, such as A, B, C and I'm
also copying all the data of those fields into a master field such as X.

By default, my "qf" is set to X so anytime a user is searching they are
searching across the data that also exist in fields A, B and C.

In some use cases, I need to narrow down a user's search to just A and C or
A only, etc.  When that happens, I dynamically, at run time set "qf" to "A
C" or just "A".

My question is this, will the search quality and ranking be different if I
simply set "qf" to "A B C" and avoid the copy operation to "X" (it will
save me disk space)?  Will there be a performance impact if I do this?  Is
there a limit at which point I should not list more than N fields in "qf"?

Thanks,

Steven


NRT for new items in index

2019-07-31 Thread profiuser
Hi,

we have something about 400 000 000 items in a solr collection.
We have set up auto commit property for this collection to 15 minutes.
Is a big collection and we using some caches etc. Therefore we have big
autocommit value.

This have disadvantage that we haven't NRT searches.

We would like to have NRT at least for searching for the newly added items.

We read about new functionality "Category routed alilases" in a solr version
8.1. 

And we got an idea, that we could add to our collection schema field for
routing. 
And at the time of indexing we check if item is new and to routing field we
set up value "new", or the item is older than some time period we set up
value to "old".
And we will have one category routed alias routedCollection, and there will
be 2 collections old and new.

If we index new item, router choose new collection and this item is inserted
to it. After some period we reindex item and we decide that this item is old
and to routing field we set up value "old". Router decide to update (insert)
item to collection old. But we expect that solr automatically check
uniqueness in all routed collections. And if solr found item in other
collection, than will be automatically deleted. But not !!!

Is this expected behaviour? 

Could be used this functionality for issue we have? Or could someone suggest
another solution, which ensure that we have all new items ready for NRT
searches?

Thanks for your help






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Problem with solr suggester in case of non-ASCII characters

2019-07-31 Thread Szűcs Roland
Hi Erick,

Thanks for your advice.
I already removed it from the field definition used by the suggester and it
works great. I will consider taking it out of the processing of the other
fields entirely. I have only 7000 docs with an index size of 18MB so far, so
the memory footprint is not a key issue for me.
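
For reference, the relevant part of the suggester definition then looks roughly
like this (a sketch; the parameter names are the standard SuggestComponent
ones, the values follow this thread):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">author</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">BOOK_productAuthor</str>
    <!-- field type whose analyzer does NOT remove stop words -->
    <str name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal</str>
  </lst>
</searchComponent>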

Best,
Roland

Erick Erickson  ezt írta (időpont: 2019. júl. 31.,
Sze, 14:24):

> Roland:
>
> Have you considered just not using stopwords anywhere? Largely they’re a
> holdover
> from a long time ago when every byte counted. Plus using stopwords has
> “interesting”
> issues with things like highlighting and phrase queries and the like.
>
> Sure, not using stopwords will make your index larger, but so will a
> copyfield…
>
> Your call of course, but stopwords are over-used IMO.
>
> I’m stealing Walter Underwood’s thunder here ;)
>
> Best,
> Erick
>
> > On Jul 30, 2019, at 2:11 PM, Szűcs Roland 
> wrote:
> >
> > Hi Furkan,
> >
> > Thanks the suggestion, I always forget the most effective debugging tool
> > the analysis page.
> >
> > It turned out that "Jó" was a stop word and it was eliminated during the
> > text analysis. What I will do is to create a new field type but without
> > stop word removal and I will use it like this:
> >  > name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal
> >
> > Thanks again
> >
> > Roland
> >
> > Furkan KAMACI  ezt írta (időpont: 2019. júl.
> 30.,
> > K, 16:17):
> >
> >> Hi Roland,
> >>
> >> Could you check Analysis tab (
> >> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell
> >> how
> >> the term is analyzed for both query and index?
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland <
> szucs.rol...@bookandwalk.hu>
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I have an author suggester (searchcomponent and the related request
> >>> handler) defined in solrconfig:
> >>> 
> >>>>
> >>>
> >>>  author
> >>>  AnalyzingInfixLookupFactory
> >>>  DocumentDictionaryFactory
> >>>  BOOK_productAuthor
> >>>  short_text_hu
> >>>  suggester_infix_author
> >>>  false
> >>>  false
> >>>  2
> >>>
> >>> 
> >>>
> >>>  >>> startup="lazy" >
> >>> 
> >>>  true
> >>>  10
> >>>  author
> >>> 
> >>> 
> >>>  suggest
> >>> 
> >>> 
> >>>
> >>> Author field has just a minimal text processing in query and index time
> >>> based on the following definition:
> >>>  >>> positionIncrementGap="100" multiValued="true">
> >>>
> >>>  
> >>>  
> >>>   >>> ignoreCase="true"/>
> >>>  
> >>>
> >>>
> >>>  
> >>>   >>> ignoreCase="true"/>
> >>>  
> >>>
> >>>  
> >>>   >>> docValues="true"/>
> >>>   >>> docValues="true" multiValued="true"/>
> >>>   >>> positionIncrementGap="100">
> >>>
> >>>  
> >>>  
> >>>   >> words="lang/stopwords_ar.txt"
> >>> ignoreCase="true"/>
> >>>  
> >>>  
> >>>
> >>>  
> >>>
> >>> When I use queries with only ASCII characters, the results are correct:
> >>> "Al":{
> >>> "term":"Alexandre Dumas", "weight":0, "payload":""}
> >>>
> >>> When I try it with Hungarian authorname with special character:
> >>> "Jó":"author":{
> >>> "Jó":{ "numFound":0, "suggestions":[]}}
> >>>
> >>> When I try it with three letters, it works again:
> >>> "Józ":"author":{
> >>> "Józ":{ "numFound":10, "suggestions":[{ "term":"Bajza József", "
> >>> weight":0, "payload":""}, { "term":"Eötvös József", "weight":0,
> "
> >>> payload":""}, { "term":"Eötvös József", "weight":0,
> >> "payload":""}, {
> >>> "term":"Eötvös József", "weight":0, "payload":""}, {
> >>> "term":"József
> >>> Attila", "weight":0, "payload":""}..
> >>>
> >>> Any idea how can it happen that a longer string has more matches than a
> >>> shorter one. It is inconsistent. What can I do to fix it as it would
> >>> results poor customer experience.
> >>> They would feel that sometimes they need 2 sometimes 3 characters to
> get
> >>> suggestions.
> >>>
> >>> Thanks in advance,
> >>> Roland
> >>>
> >>
>
>


Re: Contact for Wiki / Support page maintainer

2019-07-31 Thread Jan Høydahl
I tried to add Jaroslaw as an editor of that one page by adding him under the 
"Restrictions" tab of the page. But it does not work.
Is there anyone with higher Confluence skills who can tell me how to give the 
edit bit for a single page to individuals? I know how to add edit permission 
for the whole WIKI space to individuals, but that was not what I intended.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 29. jul. 2019 kl. 23:01 skrev Jan Høydahl :
> 
> All PMC members can add indivitual contributors in Confluence. Even for 
> specific pages I think.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com 
> 
>> 29. jul. 2019 kl. 16:47 skrev Jason Gerlowski > >:
>> 
>> I was under the impression that non-committers could also edit the
>> wiki pages if they requested the appropriate karma on the mailing list.
>> 
>> Though maybe that changed with the move to cwiki, or maybe that's
>> never been the case
>> 
>> On Thu, Jul 25, 2019 at 4:10 PM Jan Høydahl > > wrote:
>>> 
>>> All committers can edit. What would you like to change/add?
>>> 
>>> Jan Høydahl
>>> 
 25. jul. 2019 kl. 09:11 skrev Jaroslaw Rozanski >>> >:
 
 Hi folks!
 
 Who is the maintainer of Solr Support page in the Apache Solr Wiki 
 (https://cwiki.apache.org/confluence/display/solr/Support 
 )?
 
 Thanks,
 Jaroslaw
 
 --
 Jaroslaw Rozanski | m...@jarekrozanski.eu 
> 



Re: Problem with solr suggester in case of non-ASCII characters

2019-07-31 Thread Erick Erickson
Roland:

Have you considered just not using stopwords anywhere? Largely they’re a 
holdover
from a long time ago when every byte counted. Plus using stopwords has 
“interesting”
issues with things like highlighting and phrase queries and the like.

Sure, not using stopwords will make your index larger, but so will a copyfield…

Your call of course, but stopwords are over-used IMO.

I’m stealing Walter Underwood’s thunder here ;)

Best,
Erick

> On Jul 30, 2019, at 2:11 PM, Szűcs Roland  wrote:
> 
> Hi Furkan,
> 
> Thanks the suggestion, I always forget the most effective debugging tool
> the analysis page.
> 
> It turned out that "Jó" was a stop word and it was eliminated during the
> text analysis. What I will do is to create a new field type but without
> stop word removal and I will use it like this:
>  name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal
> 
> Thanks again
> 
> Roland
> 
> Furkan KAMACI  ezt írta (időpont: 2019. júl. 30.,
> K, 16:17):
> 
>> Hi Roland,
>> 
>> Could you check Analysis tab (
>> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell
>> how
>> the term is analyzed for both query and index?
>> 
>> Kind Regards,
>> Furkan KAMACI
>> 
>> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland 
>> wrote:
>> 
>>> Hi All,
>>> 
>>> I have an author suggester (searchcomponent and the related request
>>> handler) defined in solrconfig:
>>> 
>>>>
>>>
>>>  author
>>>  AnalyzingInfixLookupFactory
>>>  DocumentDictionaryFactory
>>>  BOOK_productAuthor
>>>  short_text_hu
>>>  suggester_infix_author
>>>  false
>>>  false
>>>  2
>>>
>>> 
>>> 
>>> >> startup="lazy" >
>>> 
>>>  true
>>>  10
>>>  author
>>> 
>>> 
>>>  suggest
>>> 
>>> 
>>> 
>>> Author field has just a minimal text processing in query and index time
>>> based on the following definition:
>>> >> positionIncrementGap="100" multiValued="true">
>>>
>>>  
>>>  
>>>  >> ignoreCase="true"/>
>>>  
>>>
>>>
>>>  
>>>  >> ignoreCase="true"/>
>>>  
>>>
>>>  
>>>  >> docValues="true"/>
>>>  >> docValues="true" multiValued="true"/>
>>>  >> positionIncrementGap="100">
>>>
>>>  
>>>  
>>>  > words="lang/stopwords_ar.txt"
>>> ignoreCase="true"/>
>>>  
>>>  
>>>
>>>  
>>> 
>>> When I use queries with only ASCII characters, the results are correct:
>>> "Al":{
>>> "term":"Alexandre Dumas", "weight":0, "payload":""}
>>> 
>>> When I try it with Hungarian authorname with special character:
>>> "Jó":"author":{
>>> "Jó":{ "numFound":0, "suggestions":[]}}
>>> 
>>> When I try it with three letters, it works again:
>>> "Józ":"author":{
>>> "Józ":{ "numFound":10, "suggestions":[{ "term":"Bajza József", "
>>> weight":0, "payload":""}, { "term":"Eötvös József", "weight":0, "
>>> payload":""}, { "term":"Eötvös József", "weight":0,
>> "payload":""}, {
>>> "term":"Eötvös József", "weight":0, "payload":""}, {
>>> "term":"József
>>> Attila", "weight":0, "payload":""}..
>>> 
>>> Any idea how can it happen that a longer string has more matches than a
>>> shorter one. It is inconsistent. What can I do to fix it as it would
>>> results poor customer experience.
>>> They would feel that sometimes they need 2 sometimes 3 characters to get
>>> suggestions.
>>> 
>>> Thanks in advance,
>>> Roland
>>> 
>> 



Re: Solr 7.7.2 vs Solr 8.2.0

2019-07-31 Thread Erick Erickson
Do be aware that if you are using indexes created with 6x you will be required 
to completely re-index when you upgrade to Solr 8. IndexUpgraderTool doesn’t 
help with this, i.e. you _cannot_ go from 6x->7x with 7x's IndexUpgraderTool and 
then go from 7x->8x.

Best,
Erick

> On Jul 31, 2019, at 6:35 AM, Jan Høydahl  wrote:
> 
> Hi
> 
> Go for 8.2, as 7.x will be end of life later this year. If you find any known 
> bugs in 8.2.0 that you cannot live with, wait for 8.2.1 which would maximize 
> stability.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 30. jul. 2019 kl. 22:53 skrev Arnold Bronley :
>> 
>> Hi,
>> 
>> We are trying to decide whether we should upgrade to Solr 7.7.2 version or
>> Solr 8.2.0 version. We are currently on Solr 6.3.0 version.
>> 
>> On one hand 8.2.0 version feels like a good choice because it is the latest
>> version. But then experience tells that initial versions usually have a lot
>> of bugs compared to the later LTS versions.
>> 
>> Also, there is one more issue. There is this major JIRA bug
>> https://issues.apache.org/jira/browse/SOLR-13336 which mostly won't get
>> fixed in any 7.x version, but is fixed in Solr 8.1. I checked and our Solr
>> configuration is vulnerable to it. Do you have any recommendation as to
>> which Solr version one should move to given these facts?
> 



Issue with inplace update when TimeStampUpdateProcessor is added in updateRequestProcessorChain in solrconfig

2019-07-31 Thread Dominic Dsouza
Hello,

I have found a strange issue with in-place updates. When I have 
TimestampUpdateProcessorFactory configured in my updateRequestProcessorChain, 
like the following:

  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">mydate</str>
  </processor>

and I do an in-place update on a pint field, the in-place update itself runs 
fine, but the indexed fields (fieldtype text_general) are removed; this can be 
checked by searching on such a field. The updated document no longer shows up 
in the results (it was showing up before the in-place update).

When I remove the TimestampUpdateProcessorFactory from solrconfig, everything 
is fine. Is there some issue here?



Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-07-31 Thread Jörn Franke
Did you update the correct zoo.cfg? Did you restart ZooKeeper after the config change?

> Am 30.07.2019 um 04:05 schrieb Zheng Lin Edwin Yeo :
> 
> Hi,
> 
> I am using the new Solr 8.2.0 with SolrCloud and external ZooKeeper 3.5.5.
> 
> However, after adding in the line under zoo.cfg
> *4lw.commands.whitelist=**
> 
> I get the error under Cloud -> ZK Status in Solr
> *"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
> configuration file."*
> 
> I have noticed that the issue is caused by adding "conf" to the
> whitelist. But if I do not add "conf" to the whitelist, I will get the
> following error:
> *"Errors: - conf is not executed because it is not in the whitelist. Check
> 4lw.commands.whitelist setting in zookeeper configuration file."*
> 
> What could be the issue that is causing this error, and how can we resolve it?
> 
> Thank you.
> 
> Regards,
> Edwin


Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-07-31 Thread Jan Høydahl
Please look for the full log file solr.log in your Solr server, and share it 
via some file sharing service or gist or similar for us to be able to decipher 
the collection create error.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 31. jul. 2019 kl. 08:33 skrev Zheng Lin Edwin Yeo :
> 
> Hi,
> 
> Regarding the issue, I have tried to put the following in zoo.cfg under
> ZooKeeper:
> 4lw.commands.whitelist=mntr,conf,ruok
> 
> But it is still showing this error.
> *"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
> configuration file."*
> 
> As I am using SolrCloud, the collection config can still be loaded to
> ZooKeeper as per normal. But if I tried to create a collection, I will get
> the following error:
> 
> {
>  "responseHeader":{
>"status":400,
>"QTime":686},
>  "failure":{
>
> "192.168.1.2:8983_solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> occurred when talking to server at:http://192.168.1.2:8983/solr",
>
> "192.168.1.2:8984_solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> occurred when talking to server at:http://192.168.1.2:8984/solr"},
>  "Operation create caused
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Underlying core creation failed while creating collection: collection1",
>  "exception":{f
>"msg":"Underlying core creation failed while creating collection:
> collection1",
>"rspCode":400},
>  "error":{
>"metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","org.apache.solr.common.SolrException"],
>"msg":"Underlying core creation failed while creating collection:
> collection1",
>"code":400}}
> 
> Is there anything which I may have missed out?
> 
> Regards,
> Edwin
> 
> On Tue, 30 Jul 2019 at 10:05, Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi,
>> 
>> I am using the new Solr 8.2.0 with SolrCloud and external ZooKeeper 3.5.5.
>> 
>> However, after adding in the line under zoo.cfg
>> *4lw.commands.whitelist=**
>> 
>> I get the error under Cloud -> ZK Status in Solr
>> *"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
>> configuration file."*
>> 
>> I have noticed that the issue is caused by adding "conf" to the
>> whitelist. But if I do not add "conf" to the whitelist, I will get the
>> following error:
>> *"Errors: - conf is not executed because it is not in the whitelist. Check
>> 4lw.commands.whitelist setting in zookeeper configuration file."*
>> 
>> What could be the issue that is causing this error, and how can we resolve it?
>> 
>> Thank you.
>> 
>> Regards,
>> Edwin
>> 



Re: Solr Backup

2019-07-31 Thread Jayadevan Maymala
On Tue, Jul 30, 2019 at 7:54 PM Jan Høydahl  wrote:

> The FS backup feature requires a shared drive as you say, and this is
> clearly documented. No way around it. Cloud Filestore would likely fix it.
>
> Or you could write a new backup repo plugin for backup directly to Google
> Cloud Storage?
>
I created a filestore, mounted it on all 3 nodes, and the backup
worked. The minimum size of a Google Cloud Filestore is 1 TB, while our
backup would be well under 100 GB. I was just checking to see if there is a
way to avoid paying for 1 TB, especially since we do have spare local
storage. Anyway, I guess this is the only way.
Thanks,
Jayadevan


Re: Solr 7.7.2 vs Solr 8.2.0

2019-07-31 Thread Jan Høydahl
Hi

Go for 8.2, as 7.x will be end of life later this year. If you find any known 
bugs in 8.2.0 that you cannot live with, wait for 8.2.1 which would maximize 
stability.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 30. jul. 2019 kl. 22:53 skrev Arnold Bronley :
> 
> Hi,
> 
> We are trying to decide whether we should upgrade to Solr 7.7.2 version or
> Solr 8.2.0 version. We are currently on Solr 6.3.0 version.
> 
> On one hand 8.2.0 version feels like a good choice because it is the latest
> version. But then experience tells that initial versions usually have a lot
> of bugs compared to the later LTS versions.
> 
> Also, there is one more issue. There is this major JIRA bug
> https://issues.apache.org/jira/browse/SOLR-13336 which mostly won't get
> fixed in any 7.x version, but is fixed in Solr 8.1. I checked and our Solr
> configuration is vulnerable to it. Do you have any recommendation as to
> which Solr version one should move to given these facts?



Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-07-31 Thread Zheng Lin Edwin Yeo
Hi,

Regarding the issue, I have tried to put the following in zoo.cfg under
ZooKeeper:
4lw.commands.whitelist=mntr,conf,ruok

But it is still showing this error.
*"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
configuration file."*

As I am using SolrCloud, the collection config can still be loaded to
ZooKeeper as per normal. But if I try to create a collection, I will get
the following error:

{
  "responseHeader":{
    "status":400,
    "QTime":686},
  "failure":{
    "192.168.1.2:8983_solr":"org.apache.solr.client.solrj.SolrServerException:IOException occurred when talking to server at:http://192.168.1.2:8983/solr",
    "192.168.1.2:8984_solr":"org.apache.solr.client.solrj.SolrServerException:IOException occurred when talking to server at:http://192.168.1.2:8984/solr"},
  "Operation create caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Underlying core creation failed while creating collection: collection1",
  "exception":{
    "msg":"Underlying core creation failed while creating collection: collection1",
    "rspCode":400},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Underlying core creation failed while creating collection: collection1",
    "code":400}}

Is there anything which I may have missed out?

Regards,
Edwin

On Tue, 30 Jul 2019 at 10:05, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I am using the new Solr 8.2.0 with SolrCloud and external ZooKeeper 3.5.5.
>
> However, after adding in the line under zoo.cfg
> *4lw.commands.whitelist=**
>
> I get the error under Cloud -> ZK Status in Solr
> *"Errors: - membership: Check 4lq.commands.whitelist setting in zookeeper
> configuration file."*
>
> I have noticed that the issue is caused by adding "conf" to the
> whitelist. But if I do not add "conf" to the whitelist, I will get the
> following error:
> *"Errors: - conf is not executed because it is not in the whitelist. Check
> 4lw.commands.whitelist setting in zookeeper configuration file."*
>
> What could be the issue that is causing this error, and how can we resolve it?
>
> Thank you.
>
> Regards,
> Edwin
>


Re: Problem with uploading Large synonym files in cloud mode

2019-07-31 Thread Jörn Franke
The idea of using an external program could be good. 

> Am 31.07.2019 um 08:06 schrieb Salmaan Rashid Syed 
> :
> 
> Hi all,
> 
> Thanks for your invaluable and helpful answers.
> 
> I currently don't have an external zookeeper loaded. I am working as per
> the documentation for solr cloud without external zookeeper. I will later
> add the external zookeeper once the changes works as expected.
> 
> *1) Will I still need to make changes to zookeeper-env.sh? Or the changes
> to solr.in.sh  will suffice?*
> 
> I have an additional query that is slightly off topic but related to
> synonyms.
> My synonyms file will be updated with new words with time. What is the
> procedure to update the synonyms file without shutting down the solr in
> production?
> 
> What I am thinking is to replace all the similar words in a documents using
> an external program before I index them to Solr. This way I don't have to
> worry about the synonyms file size and updation.
> 
> *2) Do you think this is better way forward?*
> 
> Thanks for all you help.
> 
> Regards,
> Salmaan
> 
> 
> 
> 
> On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> 
>> You have to increase the -Djute.maxbuffer for large configs.
>> 
>> In Solr bin/solr/solr.in.sh use e.g.
>> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
>> This will increase maxbuffer for zookeeper on solr side to 10MB.
>> 
>> In Zookeeper zookeeper/conf/zookeeper-env.sh
>> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"
>> 
>> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect.
>> 
>> Regards
>> 
>> 
>>> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:
>>> Hi Solr Users,
>>> 
>>> I have a very big synonym file (>5MB). I am unable to start Solr in cloud
>>> mode as it throws an error message stating that the synonyms file is
>>> too large. I figured out that the zookeeper doesn't take a file greater
>>> than 1MB size.
>>> 
>>> I tried to break down my synonyms file to smaller chunks less than 1MB
>>> each. But, I am not sure about how to include all the filenames into the
>>> Solr schema.
>>> 
>>> Should it be separated by commas like synonyms = "__1_synonyms.txt,
>>> __2_synonyms.txt, __3synonyms.txt"
>>> 
>>> Or is there a better way of doing that? Will the bigger file when broken
>>> down to smaller chunks will be uploaded to zookeeper as well.
>>> 
>>> Please help or please guide me to relevant documentation regarding this.
>>> 
>>> Thank you.
>>> 
>>> Regards.
>>> Salmaan.
>>> 
>> 


Re: Problem with uploading Large synonym files in cloud mode

2019-07-31 Thread Jörn Franke
Ad 1) it needs to be configured on the ZooKeeper server and in Solr and all 
other ZK clients.

Ad 2) you never need to shut Solr down in production to update synonym files.
Use the Configsets API to re-upload the full configuration, including the 
updated synonyms:
https://lucene.apache.org/solr/guide/7_4/configsets-api.html
Then reload the collection and optionally reindex (if you use synonyms at 
index time).
Alternatively, use managed synonyms:

https://lucene.apache.org/solr/guide/6_6/managed-resources.html

And optionally reindex
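
A rough sketch of that flow (the configset name, path, ZooKeeper address and
collection name are illustrative):

# 1) re-upload the configset containing the updated synonyms file
bin/solr zk upconfig -n my_configset -d /path/to/my_configset/conf -z localhost:2181
# 2) reload the collection so the change takes effect
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=my_collection"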

> Am 31.07.2019 um 08:06 schrieb Salmaan Rashid Syed 
> :
> 
> Hi all,
> 
> Thanks for your invaluable and helpful answers.
> 
> I currently don't have an external zookeeper loaded. I am working as per
> the documentation for solr cloud without external zookeeper. I will later
> add the external zookeeper once the changes works as expected.
> 
> *1) Will I still need to make changes to zookeeper-env.sh? Or the changes
> to solr.in.sh  will suffice?*
> 
> I have an additional query that is slightly off topic but related to
> synonyms.
> My synonyms file will be updated with new words with time. What is the
> procedure to update the synonyms file without shutting down the solr in
> production?
> 
> What I am thinking is to replace all the similar words in a documents using
> an external program before I index them to Solr. This way I don't have to
> worry about the synonyms file size and updation.
> 
> *2) Do you think this is better way forward?*
> 
> Thanks for all you help.
> 
> Regards,
> Salmaan
> 
> 
> 
> 
> On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> 
>> You have to increase the -Djute.maxbuffer for large configs.
>> 
>> In Solr bin/solr/solr.in.sh use e.g.
>> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
>> This will increase maxbuffer for zookeeper on solr side to 10MB.
>> 
>> In Zookeeper zookeeper/conf/zookeeper-env.sh
>> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"
>> 
>> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect.
>> 
>> Regards
>> 
>> 
>>> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:
>>> Hi Solr Users,
>>> 
>>> I have a very big synonym file (>5MB). I am unable to start Solr in cloud
>>> mode as it throws an error message stating that the synonyms file is
>>> too large. I figured out that the zookeeper doesn't take a file greater
>>> than 1MB size.
>>> 
>>> I tried to break down my synonyms file to smaller chunks less than 1MB
>>> each. But, I am not sure about how to include all the filenames into the
>>> Solr schema.
>>> 
>>> Should it be separated by commas like synonyms = "__1_synonyms.txt,
>>> __2_synonyms.txt, __3synonyms.txt"
>>> 
>>> Or is there a better way of doing that? Will the bigger file when broken
>>> down to smaller chunks will be uploaded to zookeeper as well.
>>> 
>>> Please help or please guide me to relevant documentation regarding this.
>>> 
>>> Thank you.
>>> 
>>> Regards.
>>> Salmaan.
>>> 
>> 


Re: Problem with uploading Large synonym files in cloud mode

2019-07-31 Thread Salmaan Rashid Syed
Hi all,

Thanks for your invaluable and helpful answers.

I currently don't have an external ZooKeeper loaded. I am working as per
the documentation for SolrCloud without an external ZooKeeper. I will later
add the external ZooKeeper once the changes work as expected.

*1) Will I still need to make changes to zookeeper-env.sh? Or will the changes
to solr.in.sh suffice?*

I have an additional query that is slightly off topic but related to
synonyms.
My synonyms file will be updated with new words over time. What is the
procedure for updating the synonyms file without shutting down Solr in
production?

What I am thinking is to replace all the similar words in the documents using
an external program before I index them into Solr. This way I don't have to
worry about the synonyms file size and updates.

*2) Do you think this is a better way forward?*

Thanks for all your help.

Regards,
Salmaan




On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> You have to increase the -Djute.maxbuffer for large configs.
>
> In Solr bin/solr/solr.in.sh use e.g.
> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
> This will increase maxbuffer for zookeeper on solr side to 10MB.
>
> In Zookeeper zookeeper/conf/zookeeper-env.sh
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"
>
> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect.
>
> Regards
>
>
> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:
> > Hi Solr Users,
> >
> > I have a very big synonym file (>5MB). I am unable to start Solr in cloud
> > mode as it throws an error message stating that the synonyms file is
> > too large. I figured out that the zookeeper doesn't take a file greater
> > than 1MB size.
> >
> > I tried to break down my synonyms file to smaller chunks less than 1MB
> > each. But, I am not sure about how to include all the filenames into the
> > Solr schema.
> >
> > Should it be separated by commas like synonyms = "__1_synonyms.txt,
> > __2_synonyms.txt, __3synonyms.txt"
> >
> > Or is there a better way of doing that? Will the bigger file when broken
> > down to smaller chunks will be uploaded to zookeeper as well.
> >
> > Please help or please guide me to relevant documentation regarding this.
> >
> > Thank you.
> >
> > Regards.
> > Salmaan.
> >
>